From nobody Thu Dec 18 08:59:17 2025
From: Anthony Yznaga
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org,
 hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org,
 peterz@infradead.org, rppt@kernel.org, akpm@linux-foundation.org,
 ebiederm@xmission.com, keescook@chromium.org, graf@amazon.com,
 jason.zeng@intel.com, lei.l.li@intel.com, steven.sistare@oracle.com,
 fam.zheng@bytedance.com, mgalaxy@akamai.com, kexec@lists.infradead.org
Subject: [RFC v3 01/21] mm: add PKRAM API stubs and Kconfig
Date: Wed, 26 Apr 2023 17:08:37 -0700
Message-Id: <1682554137-13938-2-git-send-email-anthony.yznaga@oracle.com>
X-Mailer: git-send-email 1.9.4
In-Reply-To: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com>
References: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Preserved-across-kexec memory, or PKRAM, is a method for saving memory pages
of the currently executing kernel and restoring them after a kexec boot into
a new one. This can be utilized for preserving guest VM state, large
in-memory databases, process memory, etc. across reboot. While DRAM-as-PMEM
or actual persistent memory could be used to accomplish these things, PKRAM
provides the latency of DRAM with the flexibility of dynamically determining
the amount of memory to preserve.

The proposed API:

 * Preserved memory is divided into nodes which can be saved or loaded
   independently of each other. The nodes are identified by unique name
   strings. A PKRAM node is created when save is initiated by calling
   pkram_prepare_save(). A PKRAM node is removed when load is initiated by
   calling pkram_prepare_load(). See below.

 * A node is further divided into objects. An object represents closely
   coupled data in the form of a grouping of folios and/or a stream of byte
   data. For example, the folios and attributes of a file. After initiating
   an operation on a PKRAM node, PKRAM objects are initialized for saving or
   loading by calling pkram_prepare_save_obj() or pkram_prepare_load_obj().

 * For saving/loading data from a PKRAM node/object, instances of the
   pkram_stream and pkram_access structs are used. pkram_stream tracks the
   node and object being operated on while pkram_access tracks the data type
   and position within an object.

   The pkram_stream struct is initialized by calling pkram_prepare_save() or
   pkram_prepare_load() and then pkram_prepare_save_obj() or
   pkram_prepare_load_obj().

   Once a pkram_stream is fully initialized, a pkram_access struct is
   initialized for each data type associated with the object. After save or
   load of a data type for the object is complete, pkram_finish_access() is
   called.

   After save or load is complete for the object, pkram_finish_save_obj() or
   pkram_finish_load_obj() must be called, followed by pkram_finish_save() or
   pkram_finish_load() when save or load is completed for the node. If an
   error occurred during save, the saved data and the PKRAM node may be freed
   by calling pkram_discard_save() instead of pkram_finish_save().

 * Both folio data and byte data can separately be streamed to a PKRAM
   object. pkram_save_folio() and pkram_load_folio() are used to stream folio
   data while pkram_write() and pkram_read() are used to stream byte data.

A sequence of operations for saving/loading data from PKRAM would look like:

  * For saving data to PKRAM:

    /* create a PKRAM node and do initial stream setup */
    pkram_prepare_save()

    /* create a PKRAM object associated with the PKRAM node and complete
       stream initialization */
    pkram_prepare_save_obj()

    /* save data to the node/object */
    PKRAM_ACCESS(pa_folios,...)
    PKRAM_ACCESS(pa_bytes,...)
    pkram_save_folio(pa_folios,...)[,...]  /* for file folios */
    pkram_write(pa_bytes,...)[,...]        /* for a byte stream */
    pkram_finish_access(pa_folios)
    pkram_finish_access(pa_bytes)

    pkram_finish_save_obj()

    /* commit the save or discard and delete the node */
    pkram_finish_save()          /* on success, or
    pkram_discard_save()          * ... in case of error */

  * For loading data from PKRAM:

    /* remove a PKRAM node from the list and do initial stream setup */
    pkram_prepare_load()

    /* Remove a PKRAM object from the node and complete stream
       initialization for loading data from it. */
    pkram_prepare_load_obj()

    /* load data from the node/object */
    PKRAM_ACCESS(pa_folios,...)
    PKRAM_ACCESS(pa_bytes,...)
    pkram_load_folio(pa_folios,...)[,...]  /* for file folios */
    pkram_read(pa_bytes,...)[,...]         /* for a byte stream */
    pkram_finish_access(pa_folios)
    pkram_finish_access(pa_bytes)

    /* free the object */
    pkram_finish_load_obj()

    /* free the node */
    pkram_finish_load()

Originally-by: Vladimir Davydov
Signed-off-by: Anthony Yznaga
---
 include/linux/pkram.h |  47 +++++++++++++
 mm/Kconfig            |   9 +++
 mm/Makefile           |   2 +
 mm/pkram.c            | 179 ++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 237 insertions(+)
 create mode 100644 include/linux/pkram.h
 create mode 100644 mm/pkram.c

diff --git a/include/linux/pkram.h b/include/linux/pkram.h
new file mode 100644
index 000000000000..57b8db4229a4
--- /dev/null
+++ b/include/linux/pkram.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_PKRAM_H
+#define _LINUX_PKRAM_H
+
+#include
+#include
+#include
+
+/**
+ * enum pkram_data_flags - definition of data types contained in a pkram obj
+ * @PKRAM_DATA_none: No data types configured
+ */
+enum pkram_data_flags {
+	PKRAM_DATA_none = 0x0,	/* No data types configured */
+};
+
+struct pkram_stream;
+struct pkram_access;
+
+#define PKRAM_NAME_MAX	256	/* including nul */
+
+int pkram_prepare_save(struct pkram_stream *ps, const char *name,
+		       gfp_t gfp_mask);
+int pkram_prepare_save_obj(struct pkram_stream *ps, enum pkram_data_flags flags);
+
+void pkram_finish_save(struct pkram_stream *ps);
+void pkram_finish_save_obj(struct pkram_stream *ps);
+void pkram_discard_save(struct pkram_stream *ps);
+
+int pkram_prepare_load(struct pkram_stream *ps, const char *name);
+int pkram_prepare_load_obj(struct pkram_stream *ps);
+
+void pkram_finish_load(struct pkram_stream *ps);
+void pkram_finish_load_obj(struct pkram_stream *ps);
+
+#define PKRAM_ACCESS(name, stream, type)			\
+	struct pkram_access name
+
+void pkram_finish_access(struct pkram_access *pa, bool status_ok);
+
+int pkram_save_folio(struct pkram_access *pa, struct folio *folio);
+struct folio *pkram_load_folio(struct pkram_access *pa, unsigned long *index);
+
+ssize_t pkram_write(struct pkram_access *pa, const void *buf, size_t count);
+size_t pkram_read(struct pkram_access *pa, void *buf, size_t count);
+
+#endif /* _LINUX_PKRAM_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 4751031f3f05..10f089f4a181 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1202,6 +1202,15 @@ config LRU_GEN_STATS
 	  This option has a per-memcg and per-node memory overhead.
 # }
 
+config PKRAM
+	bool "Preserved-over-kexec memory storage"
+	default n
+	help
+	  This option adds the kernel API that enables saving memory pages of
+	  the currently executing kernel and restoring them after a kexec in
+	  the newly booted one. This can be utilized for speeding up reboot by
+	  leaving process memory and/or FS caches in-place.
+
 source "mm/damon/Kconfig"
 
 endmenu
diff --git a/mm/Makefile b/mm/Makefile
index 8e105e5b3e29..7a8d5a286d48 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -138,3 +138,5 @@ obj-$(CONFIG_IO_MAPPING) += io-mapping.o
 obj-$(CONFIG_HAVE_BOOTMEM_INFO_NODE) += bootmem_info.o
 obj-$(CONFIG_GENERIC_IOREMAP) += ioremap.o
 obj-$(CONFIG_SHRINKER_DEBUG) += shrinker_debug.o
+obj-$(CONFIG_PKRAM) += pkram.o
+>>>>>>> mm: add PKRAM API stubs and Kconfig
diff --git a/mm/pkram.c b/mm/pkram.c
new file mode 100644
index 000000000000..421de8211e05
--- /dev/null
+++ b/mm/pkram.c
@@ -0,0 +1,179 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+#include
+#include
+
+/**
+ * Create a preserved memory node with name @name and initialize stream @ps
+ * for saving data to it.
+ *
+ * @gfp_mask specifies the memory allocation mask to be used when saving data.
+ *
+ * Returns 0 on success, -errno on failure.
+ *
+ * After the save has finished, pkram_finish_save() (or pkram_discard_save() in
+ * case of failure) is to be called.
+ */
+int pkram_prepare_save(struct pkram_stream *ps, const char *name, gfp_t gfp_mask)
+{
+	return -EINVAL;
+}
+
+/**
+ * Create a preserved memory object and initialize stream @ps for saving data
+ * to it.
+ *
+ * Returns 0 on success, -errno on failure.
+ *
+ * After the save has finished, pkram_finish_save_obj() (or pkram_discard_save()
+ * in case of failure) is to be called.
+ */
+int pkram_prepare_save_obj(struct pkram_stream *ps, enum pkram_data_flags flags)
+{
+	return -EINVAL;
+}
+
+/**
+ * Commit the object started with pkram_prepare_save_obj() to preserved memory.
+ */
+void pkram_finish_save_obj(struct pkram_stream *ps)
+{
+	WARN_ON_ONCE(1);
+}
+
+/**
+ * Commit the save to preserved memory started with pkram_prepare_save().
+ * After the call, the stream may not be used any more.
+ */
+void pkram_finish_save(struct pkram_stream *ps)
+{
+	WARN_ON_ONCE(1);
+}
+
+/**
+ * Cancel the save to preserved memory started with pkram_prepare_save() and
+ * destroy the corresponding preserved memory node freeing any data already
+ * saved to it.
+ */
+void pkram_discard_save(struct pkram_stream *ps)
+{
+	WARN_ON_ONCE(1);
+}
+
+/**
+ * Remove the preserved memory node with name @name and initialize stream @ps
+ * for loading data from it.
+ *
+ * Returns 0 on success, -errno on failure.
+ *
+ * After the load has finished, pkram_finish_load() is to be called.
+ */
+int pkram_prepare_load(struct pkram_stream *ps, const char *name)
+{
+	return -EINVAL;
+}
+
+/**
+ * Remove the next preserved memory object from the stream @ps and
+ * initialize stream @ps for loading data from it.
+ *
+ * Returns 0 on success, -errno on failure.
+ *
+ * After the load has finished, pkram_finish_load_obj() is to be called.
+ */
+int pkram_prepare_load_obj(struct pkram_stream *ps)
+{
+	return -EINVAL;
+}
+
+/**
+ * Finish the load of a preserved memory object started with
+ * pkram_prepare_load_obj() freeing the object and any data that has not
+ * been loaded from it.
+ */
+void pkram_finish_load_obj(struct pkram_stream *ps)
+{
+	WARN_ON_ONCE(1);
+}
+
+/**
+ * Finish the load from preserved memory started with pkram_prepare_load()
+ * freeing the corresponding preserved memory node and any data that has
+ * not been loaded from it.
+ */
+void pkram_finish_load(struct pkram_stream *ps)
+{
+	WARN_ON_ONCE(1);
+}
+
+/**
+ * Finish the data access to or from the preserved memory node and object
+ * associated with pkram stream access @pa. The access must have been
+ * initialized with PKRAM_ACCESS().
+ */
+void pkram_finish_access(struct pkram_access *pa, bool status_ok)
+{
+	WARN_ON_ONCE(1);
+}
+
+/**
+ * Save folio @folio to the preserved memory node and object associated
+ * with pkram stream access @pa. The stream must have been initialized with
+ * pkram_prepare_save() and pkram_prepare_save_obj() and access initialized
+ * with PKRAM_ACCESS().
+ *
+ * Returns 0 on success, -errno on failure.
+ */
+int pkram_save_folio(struct pkram_access *pa, struct folio *folio)
+{
+	return -EINVAL;
+}
+
+/**
+ * Load the next folio from the preserved memory node and object associated
+ * with pkram stream access @pa. The stream must have been initialized with
+ * pkram_prepare_load() and pkram_prepare_load_obj() and access initialized
+ * with PKRAM_ACCESS().
+ *
+ * If not NULL, @index is initialized with the preserved mapping offset of the
+ * folio loaded.
+ *
+ * Returns the folio loaded or NULL if the node is empty.
+ *
+ * The folio loaded has its refcount incremented.
+ */
+struct folio *pkram_load_folio(struct pkram_access *pa, unsigned long *index)
+{
+	return NULL;
+}
+
+/**
+ * Copy @count bytes from @buf to the preserved memory node and object
+ * associated with pkram stream access @pa. The stream must have been
+ * initialized with pkram_prepare_save() and pkram_prepare_save_obj()
+ * and access initialized with PKRAM_ACCESS().
+ *
+ * On success, returns the number of bytes written, which is always equal to
+ * @count. On failure, -errno is returned.
+ */
+ssize_t pkram_write(struct pkram_access *pa, const void *buf, size_t count)
+{
+	return -EINVAL;
+}
+
+/**
+ * Copy up to @count bytes from the preserved memory node and object
+ * associated with pkram stream access @pa to @buf. The stream must have been
+ * initialized with pkram_prepare_load() and pkram_prepare_load_obj() and
+ * access initialized with PKRAM_ACCESS().
+ *
+ * Returns the number of bytes read, which may be less than @count if the node
+ * has fewer bytes available.
+ */
+size_t pkram_read(struct pkram_access *pa, void *buf, size_t count)
+{
+	return 0;
+}
-- 
1.9.4

From nobody Thu Dec 18 08:59:17 2025
From: Anthony Yznaga
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org,
 hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org,
 peterz@infradead.org, rppt@kernel.org, akpm@linux-foundation.org,
 ebiederm@xmission.com, keescook@chromium.org, graf@amazon.com,
 jason.zeng@intel.com, lei.l.li@intel.com, steven.sistare@oracle.com,
 fam.zheng@bytedance.com, mgalaxy@akamai.com, kexec@lists.infradead.org
Subject: [RFC v3 02/21] mm: PKRAM: implement node load and save functions
Date: Wed, 26 Apr 2023 17:08:38 -0700
Message-Id: <1682554137-13938-3-git-send-email-anthony.yznaga@oracle.com>
X-Mailer: git-send-email 1.9.4
In-Reply-To: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com>
References: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Preserved memory is divided into nodes which can be saved and loaded
independently of each other. PKRAM nodes are kept on a list and identified
by unique names. Whenever a save operation is initiated by calling
pkram_prepare_save(), a new node is created and linked to the list. When the
save operation has been committed by calling pkram_finish_save(), the node
becomes loadable. A load operation can then be initiated by calling
pkram_prepare_load(), which deletes the node from the list and prepares the
corresponding stream for loading data from it. After the load has finished,
pkram_finish_load() must be called to free the node. Nodes are also deleted
when a save operation is discarded, i.e. pkram_discard_save() is called
instead of pkram_finish_save().
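The node lifecycle described above can be modelled in userspace. The
following is only a minimal sketch under stated assumptions, not the code in
this patch: all sketch_* names are hypothetical, calloc()/free() stand in
for page allocation, a plain singly linked list stands in for the list
linked through page::lru, and the mutex is omitted because the model is
single-threaded.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_NAME_MAX 256	/* including nul, mirrors PKRAM_NAME_MAX */

/* Access mode bits, mirroring PKRAM_SAVE / PKRAM_LOAD / PKRAM_ACCMODE_MASK. */
enum { SKETCH_SAVE = 1, SKETCH_LOAD = 2, SKETCH_ACCMODE_MASK = 3 };

/* Hypothetical stand-in for a node; the real one occupies a memory page. */
struct sketch_node {
	unsigned int flags;
	char name[SKETCH_NAME_MAX];
	struct sketch_node *next;
};

static struct sketch_node *sketch_nodes;	/* head of the node list */

static struct sketch_node *sketch_find(const char *name)
{
	struct sketch_node *n;

	for (n = sketch_nodes; n; n = n->next)
		if (strcmp(n->name, name) == 0)
			return n;
	return NULL;
}

static void sketch_unlink(struct sketch_node *node)
{
	struct sketch_node **pp;

	for (pp = &sketch_nodes; *pp; pp = &(*pp)->next) {
		if (*pp == node) {
			*pp = node->next;
			node->next = NULL;
			return;
		}
	}
}

/* Create a node in "save" mode and link it to the list. */
int sketch_prepare_save(struct sketch_node **out, const char *name)
{
	struct sketch_node *node;

	if (strlen(name) >= SKETCH_NAME_MAX)
		return -ENAMETOOLONG;
	if (sketch_find(name))
		return -EEXIST;
	node = calloc(1, sizeof(*node));
	if (!node)
		return -ENOMEM;
	node->flags = SKETCH_SAVE;
	strcpy(node->name, name);	/* length was checked above */
	node->next = sketch_nodes;
	sketch_nodes = node;
	*out = node;
	return 0;
}

/* Commit: clearing the mode bits is what makes the node loadable. */
void sketch_finish_save(struct sketch_node *node)
{
	node->flags &= ~SKETCH_ACCMODE_MASK;
}

/* Discard: unlink and free a node whose save was abandoned. */
void sketch_discard_save(struct sketch_node *node)
{
	sketch_unlink(node);
	free(node);
}

/* Unlink a committed node and hand it to the caller in "load" mode. */
int sketch_prepare_load(struct sketch_node **out, const char *name)
{
	struct sketch_node *node = sketch_find(name);

	if (!node)
		return -ENOENT;
	if (node->flags & SKETCH_ACCMODE_MASK)
		return -EBUSY;	/* save has not been committed yet */
	sketch_unlink(node);
	node->flags |= SKETCH_LOAD;
	*out = node;
	return 0;
}

/* Free the node once the load is done. */
void sketch_finish_load(struct sketch_node *node)
{
	free(node);
}
```

In this model a second sketch_prepare_save() with the same name fails with
-EEXIST, a load attempted before sketch_finish_save() fails with -EBUSY, and
a second load of the same name fails with -ENOENT because the first load
already removed the node from the list.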
Originally-by: Vladimir Davydov
Signed-off-by: Anthony Yznaga
---
 include/linux/pkram.h |   8 ++-
 mm/pkram.c            | 147 ++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 149 insertions(+), 6 deletions(-)

diff --git a/include/linux/pkram.h b/include/linux/pkram.h
index 57b8db4229a4..8def9017b16a 100644
--- a/include/linux/pkram.h
+++ b/include/linux/pkram.h
@@ -6,6 +6,8 @@
 #include
 #include
 
+struct pkram_node;
+
 /**
  * enum pkram_data_flags - definition of data types contained in a pkram obj
  * @PKRAM_DATA_none: No data types configured
@@ -14,7 +16,11 @@ enum pkram_data_flags {
 	PKRAM_DATA_none = 0x0,	/* No data types configured */
 };
 
-struct pkram_stream;
+struct pkram_stream {
+	gfp_t gfp_mask;
+	struct pkram_node *node;
+};
+
 struct pkram_access;
 
 #define PKRAM_NAME_MAX	256	/* including nul */
diff --git a/mm/pkram.c b/mm/pkram.c
index 421de8211e05..bbfd8df0874e 100644
--- a/mm/pkram.c
+++ b/mm/pkram.c
@@ -2,16 +2,85 @@
 #include
 #include
 #include
+#include
 #include
+#include
 #include
+#include
 #include
 
+/*
+ * Preserved memory is divided into nodes that can be saved or loaded
+ * independently of each other. The nodes are identified by unique name
+ * strings.
+ *
+ * The structure occupies a memory page.
+ */
+struct pkram_node {
+	__u32	flags;
+
+	__u8	name[PKRAM_NAME_MAX];
+};
+
+#define PKRAM_SAVE		1
+#define PKRAM_LOAD		2
+#define PKRAM_ACCMODE_MASK	3
+
+static LIST_HEAD(pkram_nodes);		/* linked through page::lru */
+static DEFINE_MUTEX(pkram_mutex);	/* serializes open/close */
+
+static inline struct page *pkram_alloc_page(gfp_t gfp_mask)
+{
+	return alloc_page(gfp_mask);
+}
+
+static inline void pkram_free_page(void *addr)
+{
+	free_page((unsigned long)addr);
+}
+
+static inline void pkram_insert_node(struct pkram_node *node)
+{
+	list_add(&virt_to_page(node)->lru, &pkram_nodes);
+}
+
+static inline void pkram_delete_node(struct pkram_node *node)
+{
+	list_del(&virt_to_page(node)->lru);
+}
+
+static struct pkram_node *pkram_find_node(const char *name)
+{
+	struct page *page;
+	struct pkram_node *node;
+
+	list_for_each_entry(page, &pkram_nodes, lru) {
+		node = page_address(page);
+		if (strcmp(node->name, name) == 0)
+			return node;
+	}
+	return NULL;
+}
+
+static void pkram_stream_init(struct pkram_stream *ps,
+			      struct pkram_node *node, gfp_t gfp_mask)
+{
+	memset(ps, 0, sizeof(*ps));
+	ps->gfp_mask = gfp_mask;
+	ps->node = node;
+}
+
 /**
  * Create a preserved memory node with name @name and initialize stream @ps
  * for saving data to it.
  *
  * @gfp_mask specifies the memory allocation mask to be used when saving data.
  *
+ * Error values:
+ *	%ENAMETOOLONG: name len >= PKRAM_NAME_MAX
+ *	%ENOMEM: insufficient memory available
+ *	%EEXIST: node with specified name already exists
+ *
  * Returns 0 on success, -errno on failure.
  *
  * After the save has finished, pkram_finish_save() (or pkram_discard_save() in
@@ -19,7 +88,34 @@
  */
 int pkram_prepare_save(struct pkram_stream *ps, const char *name, gfp_t gfp_mask)
 {
-	return -EINVAL;
+	struct page *page;
+	struct pkram_node *node;
+	int err = 0;
+
+	if (strlen(name) >= PKRAM_NAME_MAX)
+		return -ENAMETOOLONG;
+
+	page = pkram_alloc_page(gfp_mask | __GFP_ZERO);
+	if (!page)
+		return -ENOMEM;
+	node = page_address(page);
+
+	node->flags = PKRAM_SAVE;
+	strcpy(node->name, name);
+
+	mutex_lock(&pkram_mutex);
+	if (!pkram_find_node(name))
+		pkram_insert_node(node);
+	else
+		err = -EEXIST;
+	mutex_unlock(&pkram_mutex);
+	if (err) {
+		pkram_free_page(node);
+		return err;
+	}
+
+	pkram_stream_init(ps, node, gfp_mask);
+	return 0;
 }
 
 /**
@@ -50,7 +146,11 @@ void pkram_finish_save_obj(struct pkram_stream *ps)
  */
 void pkram_finish_save(struct pkram_stream *ps)
 {
-	WARN_ON_ONCE(1);
+	struct pkram_node *node = ps->node;
+
+	BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_SAVE);
+
+	node->flags &= ~PKRAM_ACCMODE_MASK;
 }
 
 /**
@@ -60,7 +160,15 @@ void pkram_finish_save(struct pkram_stream *ps)
  */
 void pkram_discard_save(struct pkram_stream *ps)
 {
-	WARN_ON_ONCE(1);
+	struct pkram_node *node = ps->node;
+
+	BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_SAVE);
+
+	mutex_lock(&pkram_mutex);
+	pkram_delete_node(node);
+	mutex_unlock(&pkram_mutex);
+
+	pkram_free_page(node);
 }
 
 /**
@@ -69,11 +177,36 @@ void pkram_discard_save(struct pkram_stream *ps)
  *
  * Returns 0 on success, -errno on failure.
  *
+ * Error values:
+ *	%ENOENT: node with specified name does not exist
+ *	%EBUSY: save to required node has not finished yet
+ *
  * After the load has finished, pkram_finish_load() is to be called.
  */
 int pkram_prepare_load(struct pkram_stream *ps, const char *name)
 {
-	return -EINVAL;
+	struct pkram_node *node;
+	int err = 0;
+
+	mutex_lock(&pkram_mutex);
+	node = pkram_find_node(name);
+	if (!node) {
+		err = -ENOENT;
+		goto out_unlock;
+	}
+	if (node->flags & PKRAM_ACCMODE_MASK) {
+		err = -EBUSY;
+		goto out_unlock;
+	}
+	pkram_delete_node(node);
+out_unlock:
+	mutex_unlock(&pkram_mutex);
+	if (err)
+		return err;
+
+	node->flags |= PKRAM_LOAD;
+	pkram_stream_init(ps, node, 0);
+	return 0;
 }
 
 /**
@@ -106,7 +239,11 @@ void pkram_finish_load_obj(struct pkram_stream *ps)
  */
 void pkram_finish_load(struct pkram_stream *ps)
 {
-	WARN_ON_ONCE(1);
+	struct pkram_node *node = ps->node;
+
+	BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_LOAD);
+
+	pkram_free_page(node);
 }
 
 /**
-- 
1.9.4

From nobody Thu Dec 18 08:59:17 2025
From: Anthony Yznaga
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org,
 hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org,
 peterz@infradead.org, rppt@kernel.org, akpm@linux-foundation.org,
 ebiederm@xmission.com, keescook@chromium.org, graf@amazon.com,
 jason.zeng@intel.com, lei.l.li@intel.com, steven.sistare@oracle.com,
 fam.zheng@bytedance.com, mgalaxy@akamai.com, kexec@lists.infradead.org
Subject: [RFC v3 03/21] mm: PKRAM: implement object load and save functions
Date: Wed, 26 Apr 2023 17:08:39 -0700
Message-Id: <1682554137-13938-4-git-send-email-anthony.yznaga@oracle.com>
X-Mailer: git-send-email 1.9.4
In-Reply-To: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com>
References: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

PKRAM nodes are further divided into a list of objects. After a save
operation has been initiated for a node, a save operation for an object
associated with the node is initiated by calling pkram_prepare_save_obj().
A new object is created and linked to the node. The save operation for the
object is committed by calling pkram_finish_save_obj(). After a load
operation has been initiated, pkram_prepare_load_obj() is called to delete
the next object from the node and prepare the corresponding stream for
loading data from it. After the load of the object has finished,
pkram_finish_load_obj() is called to free the object. Objects are also
deleted when a save operation is discarded.
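The per-node object list can be illustrated with a small userspace model.
This is a sketch under stated assumptions rather than the code in this
patch: the sketch_* names and the tag field are hypothetical, ordinary
pointers replace the obj_pfn page frame numbers, and calloc()/free() replace
page allocation. It pushes each new object at the head of the list and pops
from the head on load, the same way the diff in this patch links objects.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for an object; "next" plays the role of obj_pfn. */
struct sketch_obj {
	struct sketch_obj *next;
	int tag;		/* illustrative payload, not in the patch */
};

/* Hypothetical stand-in for a node; "first" plays the role of obj_pfn. */
struct sketch_objnode {
	struct sketch_obj *first;
};

/* Create an object and link it at the head of the node's list. */
int sketch_prepare_save_obj(struct sketch_objnode *node, int tag)
{
	struct sketch_obj *obj = calloc(1, sizeof(*obj));

	if (!obj)
		return -ENOMEM;
	obj->tag = tag;
	obj->next = node->first;
	node->first = obj;
	return 0;
}

/* Detach the next object from the node and hand it to the caller. */
int sketch_prepare_load_obj(struct sketch_objnode *node,
			    struct sketch_obj **out)
{
	struct sketch_obj *obj = node->first;

	if (!obj)
		return -ENODATA;	/* node has no objects left */
	node->first = obj->next;
	*out = obj;
	return 0;
}

/* Free an object after its data has been loaded. */
void sketch_finish_load_obj(struct sketch_obj *obj)
{
	free(obj);
}

/* Free every object still linked to the node; returns how many were freed. */
int sketch_truncate_node(struct sketch_objnode *node)
{
	struct sketch_obj *obj = node->first;
	int freed = 0;

	while (obj) {
		struct sketch_obj *next = obj->next;

		free(obj);
		obj = next;
		freed++;
	}
	node->first = NULL;
	return freed;
}
```

One consequence visible in the model is that head insertion combined with
head removal yields objects in the reverse of the order in which they were
saved. The truncate helper corresponds to the cleanup done when a save is
discarded or when a load finishes with objects still unread.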
Signed-off-by: Anthony Yznaga
---
 include/linux/pkram.h |  2 ++
 mm/pkram.c            | 72 ++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 70 insertions(+), 4 deletions(-)

diff --git a/include/linux/pkram.h b/include/linux/pkram.h
index 8def9017b16a..83718ad0e416 100644
--- a/include/linux/pkram.h
+++ b/include/linux/pkram.h
@@ -7,6 +7,7 @@
 #include
 
 struct pkram_node;
+struct pkram_obj;
 
 /**
  * enum pkram_data_flags - definition of data types contained in a pkram obj
@@ -19,6 +20,7 @@ enum pkram_data_flags {
 struct pkram_stream {
 	gfp_t gfp_mask;
 	struct pkram_node *node;
+	struct pkram_obj *obj;
 };
 
 struct pkram_access;
diff --git a/mm/pkram.c b/mm/pkram.c
index bbfd8df0874e..6e3895cb9872 100644
--- a/mm/pkram.c
+++ b/mm/pkram.c
@@ -6,9 +6,14 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
+struct pkram_obj {
+	__u64	obj_pfn;	/* points to the next object in the list */
+};
+
 /*
  * Preserved memory is divided into nodes that can be saved or loaded
  * independently of each other. The nodes are identified by unique name
@@ -18,6 +23,7 @@
  */
 struct pkram_node {
 	__u32	flags;
+	__u64	obj_pfn;	/* points to the first obj of the node */
 
 	__u8	name[PKRAM_NAME_MAX];
 };
@@ -62,6 +68,21 @@ static struct pkram_node *pkram_find_node(const char *name)
 	return NULL;
 }
 
+static void pkram_truncate_node(struct pkram_node *node)
+{
+	unsigned long obj_pfn;
+	struct pkram_obj *obj;
+
+	obj_pfn = node->obj_pfn;
+	while (obj_pfn) {
+		obj = pfn_to_kaddr(obj_pfn);
+		obj_pfn = obj->obj_pfn;
+		pkram_free_page(obj);
+		cond_resched();
+	}
+	node->obj_pfn = 0;
+}
+
 static void pkram_stream_init(struct pkram_stream *ps,
 			      struct pkram_node *node, gfp_t gfp_mask)
 {
@@ -124,12 +145,31 @@ int pkram_prepare_save(struct pkram_stream *ps, const char *name, gfp_t gfp_mask
  *
  * Returns 0 on success, -errno on failure.
  *
+ * Error values:
+ *	%ENOMEM: insufficient memory available
+ *
  * After the save has finished, pkram_finish_save_obj() (or pkram_discard_save()
  * in case of failure) is to be called.
  */
 int pkram_prepare_save_obj(struct pkram_stream *ps, enum pkram_data_flags flags)
 {
-	return -EINVAL;
+	struct pkram_node *node = ps->node;
+	struct pkram_obj *obj;
+	struct page *page;
+
+	BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_SAVE);
+
+	page = pkram_alloc_page(ps->gfp_mask | __GFP_ZERO);
+	if (!page)
+		return -ENOMEM;
+	obj = page_address(page);
+
+	if (node->obj_pfn)
+		obj->obj_pfn = node->obj_pfn;
+	node->obj_pfn = page_to_pfn(page);
+
+	ps->obj = obj;
+	return 0;
 }
 
 /**
@@ -137,7 +177,9 @@ int pkram_prepare_save_obj(struct pkram_stream *ps, enum pkram_data_flags flags)
  */
 void pkram_finish_save_obj(struct pkram_stream *ps)
 {
-	WARN_ON_ONCE(1);
+	struct pkram_node *node = ps->node;
+
+	BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_SAVE);
 }
 
 /**
@@ -168,6 +210,7 @@ void pkram_discard_save(struct pkram_stream *ps)
 	pkram_delete_node(node);
 	mutex_unlock(&pkram_mutex);
 
+	pkram_truncate_node(node);
 	pkram_free_page(node);
 }
 
@@ -215,11 +258,26 @@ int pkram_prepare_load(struct pkram_stream *ps, const char *name)
  *
  * Returns 0 on success, -errno on failure.
  *
+ * Error values:
+ *	%ENODATA: Stream @ps has no preserved memory objects
+ *
  * After the load has finished, pkram_finish_load_obj() is to be called.
  */
 int pkram_prepare_load_obj(struct pkram_stream *ps)
 {
-	return -EINVAL;
+	struct pkram_node *node = ps->node;
+	struct pkram_obj *obj;
+
+	BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_LOAD);
+
+	if (!node->obj_pfn)
+		return -ENODATA;
+
+	obj = pfn_to_kaddr(node->obj_pfn);
+	node->obj_pfn = obj->obj_pfn;
+
+	ps->obj = obj;
+	return 0;
 }
 
 /**
@@ -229,7 +287,12 @@ int pkram_prepare_load_obj(struct pkram_stream *ps)
  */
 void pkram_finish_load_obj(struct pkram_stream *ps)
 {
-	WARN_ON_ONCE(1);
+	struct pkram_node *node = ps->node;
+	struct pkram_obj *obj = ps->obj;
+
+	BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_LOAD);
+
+	pkram_free_page(obj);
 }
 
 /**
@@ -243,6 +306,7 @@ void pkram_finish_load(struct pkram_stream *ps)
 
 	BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_LOAD);
 
+	pkram_truncate_node(node);
 	pkram_free_page(node);
 }
 
-- 
1.9.4

From nobody Thu Dec 18 08:59:17 2025
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B32A5C77B7C for ; Thu, 27 Apr 2023 00:11:22 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242849AbjD0ALV (ORCPT ); Wed, 26 Apr 2023 20:11:21 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36354 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242818AbjD0AKa (ORCPT ); Wed, 26 Apr 2023 20:10:30 -0400
Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 36C674205 for ; Wed, 26 Apr 2023 17:10:12 -0700 (PDT)
Received: from pps.filterd (m0246627.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 33QGxD1d025310; Thu, 27 Apr 2023 00:09:12 GMT
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com;
h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2023-03-30; bh=M8m1jpfuKsBAiOuFoEkjKkL2mHa5vwe/lnUsSpAdeqY=; b=XioIQjRsKO55if7qU5pc5RmHJ7+xiiutqegxkqaBdyQfIyM/ZkyhCK464QKjJLnxrdmt qfFhb26IJfPS5NrhO41Fd7xLL/JUqLb2zwxT1JipnXsUch18NntSN6ysSHKmoqiuDvv8 UTBTXwdjJbiMdb65OtwieFuuVoojyf3WE8H67z6EwxDbSnkTVFT1lOERHjEV+vVNpX61 Ib3BI/N2BlYpkNNE/ID2hr5v1pz35CmC/13qXlClCcLN2nevgrzffP8hHOs7I0uxSz9Y ppfwpanX2o5Wl8U9+t5Gu1Rj0PFXIxFyaD66RUZewzyXug20mDDAkAGIr2R2GeXnwI6p RA== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3q46622txw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:11 +0000 Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 33QM05AR007142; Thu, 27 Apr 2023 00:09:11 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 3q4618mpc0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:10 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 33R0938U013888; Thu, 27 Apr 2023 00:09:10 GMT Received: from ca-qasparc-x86-2.us.oracle.com (ca-qasparc-x86-2.us.oracle.com [10.147.24.103]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 3q4618mp42-5; Thu, 27 Apr 2023 00:09:10 +0000 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@kernel.org, akpm@linux-foundation.org, ebiederm@xmission.com, 
keescook@chromium.org, graf@amazon.com, jason.zeng@intel.com, lei.l.li@intel.com, steven.sistare@oracle.com, fam.zheng@bytedance.com, mgalaxy@akamai.com, kexec@lists.infradead.org Subject: [RFC v3 04/21] mm: PKRAM: implement folio stream operations Date: Wed, 26 Apr 2023 17:08:40 -0700 Message-Id: <1682554137-13938-5-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.9.4 In-Reply-To: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> References: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-04-26_10,2023-04-26_03,2023-02-09_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 phishscore=0 bulkscore=0 mlxlogscore=999 malwarescore=0 mlxscore=0 spamscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303200000 definitions=main-2304270000 X-Proofpoint-ORIG-GUID: 9E3o04aE9ICIsw0HPe36rS9Y7E8_BkLp X-Proofpoint-GUID: 9E3o04aE9ICIsw0HPe36rS9Y7E8_BkLp Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Implement pkram_save_folio() to populate a PKRAM object with in-memory folios and pkram_load_folio() to load folios from a PKRAM object. Saving a folio to PKRAM is accomplished by recording its pfn, order, and mapping index and incrementing its refcount so that it will not be freed after the last user puts it. 
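How a saved folio is recorded can be illustrated with a small userspace sketch. It is a reading of the diff below rather than the patch's code: the three PKRAM_ENTRY_* constants are copied from the patch, while SKETCH_PAGE_SHIFT (4 KiB pages assumed) and the sketch_ helpers are hypothetical. Each entry holds the page's physical address with the folio order and a flags field packed into the low bits that page alignment leaves free; the mapping index is not part of the entry and appears to be kept once per link page.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12	/* assumption: 4 KiB pages */

/* Values as they appear in the patch. */
#define PKRAM_ENTRY_FLAGS_SHIFT	0x5
#define PKRAM_ENTRY_FLAGS_MASK	0x7f
#define PKRAM_ENTRY_ORDER_MASK	0x1f

typedef uint64_t pkram_entry_t;

/* Pack pfn, order and flags the way pkram_add_link_entry() appears to. */
static pkram_entry_t sketch_encode(uint64_t pfn, unsigned int order,
				   unsigned int flags)
{
	pkram_entry_t p = pfn << SKETCH_PAGE_SHIFT;	/* physical address */

	assert(order <= PKRAM_ENTRY_ORDER_MASK);
	p |= order;					/* bits 0..4 */
	p |= (pkram_entry_t)(flags & PKRAM_ENTRY_FLAGS_MASK)
		<< PKRAM_ENTRY_FLAGS_SHIFT;		/* bits 5..11 */
	return p;
}

/* Equivalent of PHYS_PFN() on the entry: drop the in-page bits. */
static uint64_t sketch_entry_pfn(pkram_entry_t p)
{
	return p >> SKETCH_PAGE_SHIFT;
}

static unsigned int sketch_entry_order(pkram_entry_t p)
{
	return (unsigned int)(p & PKRAM_ENTRY_ORDER_MASK);
}

static unsigned int sketch_entry_flags(pkram_entry_t p)
{
	return (unsigned int)((p >> PKRAM_ENTRY_FLAGS_SHIFT) &
			      PKRAM_ENTRY_FLAGS_MASK);
}
```

With 4 KiB pages the low 12 bits of a page-aligned address are zero, which is exactly enough room for a 5-bit order and a 7-bit flags field. A zero entry therefore never describes a valid saved page, which is consistent with the patch treating zero entries as empty slots.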
Originally-by: Vladimir Davydov Signed-off-by: Anthony Yznaga --- include/linux/pkram.h | 42 ++++++- mm/pkram.c | 311 ++++++++++++++++++++++++++++++++++++++++++++++= +++- 2 files changed, 346 insertions(+), 7 deletions(-) diff --git a/include/linux/pkram.h b/include/linux/pkram.h index 83718ad0e416..130ab5c2d94a 100644 --- a/include/linux/pkram.h +++ b/include/linux/pkram.h @@ -8,22 +8,47 @@ =20 struct pkram_node; struct pkram_obj; +struct pkram_link; =20 /** * enum pkram_data_flags - definition of data types contained in a pkram o= bj * @PKRAM_DATA_none: No data types configured + * @PKRAM_DATA_folios: obj contains folio data */ enum pkram_data_flags { - PKRAM_DATA_none =3D 0x0, /* No data types configured */ + PKRAM_DATA_none =3D 0x0, /* No data types configured */ + PKRAM_DATA_folios =3D 0x1, /* Contains folio data */ +}; + +struct pkram_data_stream { + /* List of link pages to add/remove from */ + __u64 *head_link_pfnp; + __u64 *tail_link_pfnp; + + struct pkram_link *link; /* current link */ + unsigned int entry_idx; /* next entry in link */ }; =20 struct pkram_stream { gfp_t gfp_mask; struct pkram_node *node; struct pkram_obj *obj; + + __u64 *folios_head_link_pfnp; + __u64 *folios_tail_link_pfnp; +}; + +struct pkram_folios_access { + unsigned long next_index; }; =20 -struct pkram_access; +struct pkram_access { + enum pkram_data_flags dtype; + struct pkram_stream *ps; + struct pkram_data_stream pds; + + struct pkram_folios_access folios; +}; =20 #define PKRAM_NAME_MAX 256 /* including nul */ =20 @@ -41,8 +66,19 @@ int pkram_prepare_save(struct pkram_stream *ps, const ch= ar *name, void pkram_finish_load(struct pkram_stream *ps); void pkram_finish_load_obj(struct pkram_stream *ps); =20 +#define PKRAM_PDS_INIT(name, stream, type) { \ + .head_link_pfnp =3D (stream)->type##_head_link_pfnp, \ + .tail_link_pfnp =3D (stream)->type##_tail_link_pfnp, \ + } + +#define PKRAM_ACCESS_INIT(name, stream, type) { \ + .dtype =3D PKRAM_DATA_##type, \ + .ps =3D (stream), \ + .pds 
=3D PKRAM_PDS_INIT(name, stream, type), \ + } + #define PKRAM_ACCESS(name, stream, type) \ - struct pkram_access name + struct pkram_access name =3D PKRAM_ACCESS_INIT(name, stream, type) =20 void pkram_finish_access(struct pkram_access *pa, bool status_ok); =20 diff --git a/mm/pkram.c b/mm/pkram.c index 6e3895cb9872..610ff7a88c98 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 #include #include +#include #include #include #include @@ -10,8 +11,40 @@ #include #include =20 +#include "internal.h" + + +/* + * Represents a reference to a data page saved to PKRAM. + */ +typedef __u64 pkram_entry_t; + +#define PKRAM_ENTRY_FLAGS_SHIFT 0x5 +#define PKRAM_ENTRY_FLAGS_MASK 0x7f +#define PKRAM_ENTRY_ORDER_MASK 0x1f + +/* + * Keeps references to folios saved to PKRAM. + * The structure occupies a memory page. + */ +struct pkram_link { + __u64 link_pfn; /* points to the next link of the object */ + __u64 index; /* mapping index of first pkram_entry_t */ + + /* + * the array occupies the rest of the link page; if the link is not + * full, the rest of the array must be filled with zeros + */ + pkram_entry_t entry[]; +}; + +#define PKRAM_LINK_ENTRIES_MAX \ + ((PAGE_SIZE-sizeof(struct pkram_link))/sizeof(pkram_entry_t)) + struct pkram_obj { - __u64 obj_pfn; /* points to the next object in the list */ + __u64 folios_head_link_pfn; /* the first folios link of the object */ + __u64 folios_tail_link_pfn; /* the last folios link of the object */ + __u64 obj_pfn; /* points to the next object in the list */ }; =20 /* @@ -19,6 +52,10 @@ struct pkram_obj { * independently of each other. The nodes are identified by unique name * strings. * + * References to folios saved to a preserved memory node are kept in a + * singly-linked list of PKRAM link structures (see above), the node has a + * pointer to the head of. + * * The structure occupies a memory page. 
*/ struct pkram_node { @@ -68,6 +105,41 @@ static struct pkram_node *pkram_find_node(const char *n= ame) return NULL; } =20 +static void pkram_truncate_link(struct pkram_link *link) +{ + struct page *page; + pkram_entry_t p; + int i; + + for (i =3D 0; i < PKRAM_LINK_ENTRIES_MAX; i++) { + p =3D link->entry[i]; + if (!p) + continue; + page =3D pfn_to_page(PHYS_PFN(p)); + put_page(page); + } +} + +static void pkram_truncate_links(unsigned long link_pfn) +{ + struct pkram_link *link; + + while (link_pfn) { + link =3D pfn_to_kaddr(link_pfn); + pkram_truncate_link(link); + link_pfn =3D link->link_pfn; + pkram_free_page(link); + cond_resched(); + } +} + +static void pkram_truncate_obj(struct pkram_obj *obj) +{ + pkram_truncate_links(obj->folios_head_link_pfn); + obj->folios_head_link_pfn =3D 0; + obj->folios_tail_link_pfn =3D 0; +} + static void pkram_truncate_node(struct pkram_node *node) { unsigned long obj_pfn; @@ -76,6 +148,7 @@ static void pkram_truncate_node(struct pkram_node *node) obj_pfn =3D node->obj_pfn; while (obj_pfn) { obj =3D pfn_to_kaddr(obj_pfn); + pkram_truncate_obj(obj); obj_pfn =3D obj->obj_pfn; pkram_free_page(obj); cond_resched(); @@ -83,6 +156,84 @@ static void pkram_truncate_node(struct pkram_node *node) node->obj_pfn =3D 0; } =20 +static void pkram_add_link(struct pkram_link *link, struct pkram_data_stre= am *pds) +{ + __u64 link_pfn =3D page_to_pfn(virt_to_page(link)); + + if (!*pds->head_link_pfnp) { + *pds->head_link_pfnp =3D link_pfn; + *pds->tail_link_pfnp =3D link_pfn; + } else { + struct pkram_link *tail =3D pfn_to_kaddr(*pds->tail_link_pfnp); + + tail->link_pfn =3D link_pfn; + *pds->tail_link_pfnp =3D link_pfn; + } +} + +static struct pkram_link *pkram_remove_link(struct pkram_data_stream *pds) +{ + struct pkram_link *link; + + if (!*pds->head_link_pfnp) + return NULL; + + link =3D pfn_to_kaddr(*pds->head_link_pfnp); + *pds->head_link_pfnp =3D link->link_pfn; + if (!*pds->head_link_pfnp) + *pds->tail_link_pfnp =3D 0; + else + 
link->link_pfn =3D 0; + + return link; +} + +static struct pkram_link *pkram_new_link(struct pkram_data_stream *pds, gf= p_t gfp_mask) +{ + struct pkram_link *link; + struct page *link_page; + + link_page =3D pkram_alloc_page((gfp_mask & GFP_RECLAIM_MASK) | + __GFP_ZERO); + if (!link_page) + return NULL; + + link =3D page_address(link_page); + pkram_add_link(link, pds); + pds->link =3D link; + pds->entry_idx =3D 0; + + return link; +} + +static void pkram_add_link_entry(struct pkram_data_stream *pds, struct pag= e *page) +{ + struct pkram_link *link =3D pds->link; + pkram_entry_t p; + short flags =3D 0; + + p =3D page_to_phys(page); + p |=3D compound_order(page); + p |=3D ((flags & PKRAM_ENTRY_FLAGS_MASK) << PKRAM_ENTRY_FLAGS_SHIFT); + link->entry[pds->entry_idx] =3D p; + pds->entry_idx++; +} + +static int pkram_next_link(struct pkram_data_stream *pds, struct pkram_lin= k **linkp) +{ + struct pkram_link *link; + + link =3D pkram_remove_link(pds); + if (!link) + return -ENODATA; + + pds->link =3D link; + pds->entry_idx =3D 0; + *linkp =3D link; + + return 0; +} + static void pkram_stream_init(struct pkram_stream *ps, struct pkram_node *node, gfp_t gfp_mask) { @@ -159,6 +310,9 @@ int pkram_prepare_save_obj(struct pkram_stream *ps, enu= m pkram_data_flags flags) =20 BUG_ON((node->flags & PKRAM_ACCMODE_MASK) !=3D PKRAM_SAVE); =20 + if (flags & ~PKRAM_DATA_folios) + return -EINVAL; + page =3D pkram_alloc_page(ps->gfp_mask | __GFP_ZERO); if (!page) return -ENOMEM; @@ -168,6 +322,10 @@ int pkram_prepare_save_obj(struct pkram_stream *ps, en= um pkram_data_flags flags) obj->obj_pfn =3D node->obj_pfn; node->obj_pfn =3D page_to_pfn(page); =20 + if (flags & PKRAM_DATA_folios) { + ps->folios_head_link_pfnp =3D &obj->folios_head_link_pfn; + ps->folios_tail_link_pfnp =3D &obj->folios_tail_link_pfn; + } ps->obj =3D obj; return 0; } @@ -274,8 +432,17 @@ int pkram_prepare_load_obj(struct pkram_stream *ps) return -ENODATA; =20 obj =3D pfn_to_kaddr(node->obj_pfn); + if 
(!obj->folios_head_link_pfn) { + WARN_ON(1); + return -EINVAL; + } + node->obj_pfn =3D obj->obj_pfn; =20 + if (obj->folios_head_link_pfn) { + ps->folios_head_link_pfnp =3D &obj->folios_head_link_pfn; + ps->folios_tail_link_pfnp =3D &obj->folios_tail_link_pfn; + } ps->obj =3D obj; return 0; } @@ -292,6 +459,7 @@ void pkram_finish_load_obj(struct pkram_stream *ps) =20 BUG_ON((node->flags & PKRAM_ACCMODE_MASK) !=3D PKRAM_LOAD); =20 + pkram_truncate_obj(obj); pkram_free_page(obj); } =20 @@ -317,7 +485,41 @@ void pkram_finish_load(struct pkram_stream *ps) */ void pkram_finish_access(struct pkram_access *pa, bool status_ok) { - WARN_ON_ONCE(1); + if (status_ok) + return; + + if (pa->ps->node->flags =3D=3D PKRAM_SAVE) + return; + + if (pa->pds.link) + pkram_truncate_link(pa->pds.link); +} + +/* + * Add a page to a PKRAM obj allocating a new PKRAM link if necessary. + */ +static int __pkram_save_page(struct pkram_access *pa, struct page *page, + unsigned long index) +{ + struct pkram_data_stream *pds =3D &pa->pds; + struct pkram_link *link =3D pds->link; + + if (!link || pds->entry_idx >=3D PKRAM_LINK_ENTRIES_MAX || + index !=3D pa->folios.next_index) { + link =3D pkram_new_link(pds, pa->ps->gfp_mask); + if (!link) + return -ENOMEM; + + pa->folios.next_index =3D link->index =3D index; + } + + get_page(page); + + pkram_add_link_entry(pds, page); + + pa->folios.next_index +=3D compound_nr(page); + + return 0; } =20 /** @@ -327,10 +529,102 @@ void pkram_finish_access(struct pkram_access *pa, bo= ol status_ok) * with PKRAM_ACCESS(). * * Returns 0 on success, -errno on failure. + * + * Error values: + * %ENOMEM: insufficient amount of memory available + * + * Saving a folio to preserved memory is simply incrementing its refcount = so + * that it will not get freed after the last user puts it. That means it is + * safe to use the folio as usual after it has been saved. 
*/ int pkram_save_folio(struct pkram_access *pa, struct folio *folio) { - return -EINVAL; + struct pkram_node *node =3D pa->ps->node; + struct page *page =3D folio_page(folio, 0); + + BUG_ON((node->flags & PKRAM_ACCMODE_MASK) !=3D PKRAM_SAVE); + + return __pkram_save_page(pa, page, page->index); +} + +static struct page *__pkram_prep_load_page(pkram_entry_t p) +{ + struct page *page; + int order; + short flags; + + flags =3D (p >> PKRAM_ENTRY_FLAGS_SHIFT) & PKRAM_ENTRY_FLAGS_MASK; + order =3D p & PKRAM_ENTRY_ORDER_MASK; + if (order >=3D MAX_ORDER) + goto out_error; + + page =3D pfn_to_page(PHYS_PFN(p)); + + if (!page_ref_freeze(page, 1)) { + pr_err("PKRAM preserved page has unexpected inflated ref count\n"); + goto out_error; + } + + if (order) { + prep_compound_page(page, order); + if (order > 1) + prep_transhuge_page(page); + } + + page_ref_unfreeze(page, 1); + + return page; + +out_error: + return ERR_PTR(-EINVAL); +} + +/* + * Extract the next page from preserved memory freeing a PKRAM link if it + * becomes empty.
+ */ +static struct page *__pkram_load_page(struct pkram_access *pa, unsigned lo= ng *index) +{ + struct pkram_data_stream *pds =3D &pa->pds; + struct pkram_link *link =3D pds->link; + struct page *page; + pkram_entry_t p; + int ret; + + if (!link) { + ret =3D pkram_next_link(pds, &link); + if (ret) + return NULL; + + if (index) + pa->folios.next_index =3D link->index; + } + + BUG_ON(pds->entry_idx >=3D PKRAM_LINK_ENTRIES_MAX); + + p =3D link->entry[pds->entry_idx]; + BUG_ON(!p); + + page =3D __pkram_prep_load_page(p); + if (IS_ERR(page)) + return page; + + if (index) { + *index =3D pa->folios.next_index; + pa->folios.next_index +=3D compound_nr(page); + } + + /* clear to avoid double free (see pkram_truncate_link()) */ + link->entry[pds->entry_idx] =3D 0; + + pds->entry_idx++; + if (pds->entry_idx >=3D PKRAM_LINK_ENTRIES_MAX || + !link->entry[pds->entry_idx]) { + pds->link =3D NULL; + pkram_free_page(link); + } + + return page; } =20 /** @@ -348,7 +642,16 @@ int pkram_save_folio(struct pkram_access *pa, struct f= olio *folio) */ struct folio *pkram_load_folio(struct pkram_access *pa, unsigned long *ind= ex) { - return NULL; + struct pkram_node *node =3D pa->ps->node; + struct page *page; + + BUG_ON((node->flags & PKRAM_ACCMODE_MASK) !=3D PKRAM_LOAD); + + page =3D __pkram_load_page(pa, index); + if (IS_ERR_OR_NULL(page)) + return (struct folio *)page; + else + return page_folio(page); } =20 /** --=20 1.9.4 From nobody Thu Dec 18 08:59:17 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 452F2C7618E for ; Thu, 27 Apr 2023 00:11:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242888AbjD0ALB (ORCPT ); Wed, 26 Apr 2023 20:11:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35420 "EHLO lindbergh.monkeyblade.net" 
rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242730AbjD0AKI (ORCPT ); Wed, 26 Apr 2023 20:10:08 -0400 Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 75B4D3C25 for ; Wed, 26 Apr 2023 17:10:06 -0700 (PDT) Received: from pps.filterd (m0246630.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 33QGx0jh015505; Thu, 27 Apr 2023 00:09:13 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2023-03-30; bh=MfX2UCA2WLfnR/ast08TW20ISh9rczjU8Ar+dRs48HA=; b=GCWhljQNjE5AjIClKhqzHeKVW3z5kr5UBJP7HEDWHVPjD+xiaPJ2/5t9JZYlCLAGl/i7 yvTMTaC3FGePZ3YrF4pTNY5ulz6nyTG1wx4O2xEl56aT+LOheMQ72F2eqB71yGtxnX33 o/hrmAtnO3q1uFdVKVmX1jK+/8S2wrSpWAvtkAkE9A4H6AT9NupoiSiA9BPoU69PpeL1 VoaJtJ2XB0hsW0yKsZDTdd1p+zgv2sLxe8Xm5EP2E3B7UpH96KL1y7cW/00QTYOxaYPR XXZ0vDADFP4dLfYAb/Ef7z3nTdE2ZfvilGGHVUfiEBjCMvZFLCCUdIorty/kdB5tRZZL RQ== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3q460dampk-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:13 +0000 Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 33QNBFC2007555; Thu, 27 Apr 2023 00:09:12 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 3q4618mpcy-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:12 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 
33R0938W013888; Thu, 27 Apr 2023 00:09:11 GMT Received: from ca-qasparc-x86-2.us.oracle.com (ca-qasparc-x86-2.us.oracle.com [10.147.24.103]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 3q4618mp42-6; Thu, 27 Apr 2023 00:09:11 +0000 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@kernel.org, akpm@linux-foundation.org, ebiederm@xmission.com, keescook@chromium.org, graf@amazon.com, jason.zeng@intel.com, lei.l.li@intel.com, steven.sistare@oracle.com, fam.zheng@bytedance.com, mgalaxy@akamai.com, kexec@lists.infradead.org Subject: [RFC v3 05/21] mm: PKRAM: implement byte stream operations Date: Wed, 26 Apr 2023 17:08:41 -0700 Message-Id: <1682554137-13938-6-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.9.4 In-Reply-To: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> References: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-04-26_10,2023-04-26_03,2023-02-09_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 phishscore=0 bulkscore=0 mlxlogscore=999 malwarescore=0 mlxscore=0 spamscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303200000 definitions=main-2304270000 X-Proofpoint-GUID: oHWiWle6IkzwQxWZ4r-eqosGLrXgVc0t X-Proofpoint-ORIG-GUID: oHWiWle6IkzwQxWZ4r-eqosGLrXgVc0t Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" This patch adds the ability to save arbitrary byte streams to a PKRAM object using pkram_write(), to be restored later using pkram_read().
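The chunking that such a byte stream implies can be sketched in userspace. This is an illustration under stated assumptions, not the patch's implementation: struct sketch_bytes, the 8-byte SKETCH_PAGE_SIZE, the fixed page array, and the sketch_ functions are all hypothetical stand-ins for the link pages and data pages the patch uses. The point it shows is that writes fill the current page and start a new one when it is full, while reads walk the pages in order and never return more than the recorded data length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE	8	/* hypothetical; tiny so page crossings are easy to see */
#define SKETCH_MAX_PAGES	8	/* hypothetical capacity of the sketch */

struct sketch_bytes {
	char pages[SKETCH_MAX_PAGES][SKETCH_PAGE_SIZE];
	size_t npages;		/* data pages appended so far */
	size_t data_len;	/* bytes written and not yet read */
	size_t woff;		/* write offset within the last page */
	size_t rpage, roff;	/* read position */
};

/* Append count bytes, starting a new page whenever the current one is full. */
static long sketch_write(struct sketch_bytes *s, const void *buf, size_t count)
{
	const char *src = buf;
	size_t written = 0;

	while (count > 0) {
		size_t n;

		if (s->npages == 0 || s->woff >= SKETCH_PAGE_SIZE) {
			if (s->npages >= SKETCH_MAX_PAGES)
				return -1;	/* stands in for -ENOMEM */
			s->npages++;
			s->woff = 0;
		}
		n = SKETCH_PAGE_SIZE - s->woff;
		if (n > count)
			n = count;
		memcpy(&s->pages[s->npages - 1][s->woff], src, n);
		src += n;
		s->woff += n;
		s->data_len += n;
		written += n;
		count -= n;
	}
	return (long)written;
}

/* Read up to count bytes; stops early once the stored data is exhausted. */
static size_t sketch_read(struct sketch_bytes *s, void *buf, size_t count)
{
	char *dst = buf;
	size_t nread = 0;

	while (count > 0 && s->data_len > 0) {
		size_t n = SKETCH_PAGE_SIZE - s->roff;

		if (n > count)
			n = count;
		if (n > s->data_len)
			n = s->data_len;
		memcpy(dst, &s->pages[s->rpage][s->roff], n);
		dst += n;
		s->roff += n;
		s->data_len -= n;
		nread += n;
		count -= n;
		if (s->roff >= SKETCH_PAGE_SIZE) {	/* move on to the next page */
			s->rpage++;
			s->roff = 0;
		}
	}
	return nread;
}
```

Writing 22 bytes into a zero-initialised struct sketch_bytes spreads them over three 8-byte pages. A later read returns at most the bytes that remain, and a read on a drained stream returns 0, matching the return conventions the kernel-doc comments give for pkram_write() and pkram_read().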
Originally-by: Vladimir Davydov Signed-off-by: Anthony Yznaga --- include/linux/pkram.h | 11 +++++ mm/pkram.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++= ++-- 2 files changed, 130 insertions(+), 4 deletions(-) diff --git a/include/linux/pkram.h b/include/linux/pkram.h index 130ab5c2d94a..b614e9059bba 100644 --- a/include/linux/pkram.h +++ b/include/linux/pkram.h @@ -14,10 +14,12 @@ * enum pkram_data_flags - definition of data types contained in a pkram o= bj * @PKRAM_DATA_none: No data types configured * @PKRAM_DATA_folios: obj contains folio data + * @PKRAM_DATA_bytes: obj contains byte data */ enum pkram_data_flags { PKRAM_DATA_none =3D 0x0, /* No data types configured */ PKRAM_DATA_folios =3D 0x1, /* Contains folio data */ + PKRAM_DATA_bytes =3D 0x2, /* Contains byte data */ }; =20 struct pkram_data_stream { @@ -36,18 +38,27 @@ struct pkram_stream { =20 __u64 *folios_head_link_pfnp; __u64 *folios_tail_link_pfnp; + + __u64 *bytes_head_link_pfnp; + __u64 *bytes_tail_link_pfnp; }; =20 struct pkram_folios_access { unsigned long next_index; }; =20 +struct pkram_bytes_access { + struct page *data_page; /* current page */ + unsigned int data_offset; /* offset into current page */ +}; + struct pkram_access { enum pkram_data_flags dtype; struct pkram_stream *ps; struct pkram_data_stream pds; =20 struct pkram_folios_access folios; + struct pkram_bytes_access bytes; }; =20 #define PKRAM_NAME_MAX 256 /* including nul */ diff --git a/mm/pkram.c b/mm/pkram.c index 610ff7a88c98..eac8cf6b0cdf 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 #include #include +#include #include #include #include @@ -44,6 +45,9 @@ struct pkram_link { struct pkram_obj { __u64 folios_head_link_pfn; /* the first folios link of the object */ __u64 folios_tail_link_pfn; /* the last folios link of the object */ + __u64 bytes_head_link_pfn; /* the first bytes link of the object */ + __u64 bytes_tail_link_pfn; /* the last bytes link of the object */ + 
__u64 data_len; /* byte data size */ __u64 obj_pfn; /* points to the next object in the list */ }; =20 @@ -138,6 +142,11 @@ static void pkram_truncate_obj(struct pkram_obj *obj) pkram_truncate_links(obj->folios_head_link_pfn); obj->folios_head_link_pfn =3D 0; obj->folios_tail_link_pfn =3D 0; + + pkram_truncate_links(obj->bytes_head_link_pfn); + obj->bytes_head_link_pfn =3D 0; + obj->bytes_tail_link_pfn =3D 0; + obj->data_len =3D 0; } =20 static void pkram_truncate_node(struct pkram_node *node) @@ -310,7 +319,7 @@ int pkram_prepare_save_obj(struct pkram_stream *ps, enu= m pkram_data_flags flags) =20 BUG_ON((node->flags & PKRAM_ACCMODE_MASK) !=3D PKRAM_SAVE); =20 - if (flags & ~PKRAM_DATA_folios) + if (flags & ~(PKRAM_DATA_folios | PKRAM_DATA_bytes)) return -EINVAL; =20 page =3D pkram_alloc_page(ps->gfp_mask | __GFP_ZERO); @@ -326,6 +335,10 @@ int pkram_prepare_save_obj(struct pkram_stream *ps, en= um pkram_data_flags flags) ps->folios_head_link_pfnp =3D &obj->folios_head_link_pfn; ps->folios_tail_link_pfnp =3D &obj->folios_tail_link_pfn; } + if (flags & PKRAM_DATA_bytes) { + ps->bytes_head_link_pfnp =3D &obj->bytes_head_link_pfn; + ps->bytes_tail_link_pfnp =3D &obj->bytes_tail_link_pfn; + } ps->obj =3D obj; return 0; } @@ -432,7 +445,7 @@ int pkram_prepare_load_obj(struct pkram_stream *ps) return -ENODATA; =20 obj =3D pfn_to_kaddr(node->obj_pfn); - if (!obj->folios_head_link_pfn) { + if (!obj->folios_head_link_pfn && !obj->bytes_head_link_pfn) { WARN_ON(1); return -EINVAL; } @@ -443,6 +456,10 @@ int pkram_prepare_load_obj(struct pkram_stream *ps) ps->folios_head_link_pfnp =3D &obj->folios_head_link_pfn; ps->folios_tail_link_pfnp =3D &obj->folios_tail_link_pfn; } + if (obj->bytes_head_link_pfn) { + ps->bytes_head_link_pfnp =3D &obj->bytes_head_link_pfn; + ps->bytes_tail_link_pfnp =3D &obj->bytes_tail_link_pfn; + } ps->obj =3D obj; return 0; } @@ -493,6 +510,9 @@ void pkram_finish_access(struct pkram_access *pa, bool = status_ok) =20 if (pa->pds.link) 
pkram_truncate_link(pa->pds.link); + + if ((pa->dtype =3D=3D PKRAM_DATA_bytes) && (pa->bytes.data_page)) + pkram_free_page(page_address(pa->bytes.data_page)); } =20 /* @@ -547,6 +567,22 @@ int pkram_save_folio(struct pkram_access *pa, struct f= olio *folio) return __pkram_save_page(pa, page, page->index); } =20 +static int __pkram_bytes_save_page(struct pkram_access *pa, struct page *p= age) +{ + struct pkram_data_stream *pds =3D &pa->pds; + struct pkram_link *link =3D pds->link; + + if (!link || pds->entry_idx >=3D PKRAM_LINK_ENTRIES_MAX) { + link =3D pkram_new_link(pds, pa->ps->gfp_mask); + if (!link) + return -ENOMEM; + } + + pkram_add_link_entry(pds, page); + + return 0; +} + static struct page *__pkram_prep_load_page(pkram_entry_t p) { struct page *page; @@ -662,10 +698,53 @@ struct folio *pkram_load_folio(struct pkram_access *p= a, unsigned long *index) * * On success, returns the number of bytes written, which is always equal = to * @count. On failure, -errno is returned. + * + * Error values: + * %ENOMEM: insufficient amount of memory available */ ssize_t pkram_write(struct pkram_access *pa, const void *buf, size_t count) { - return -EINVAL; + struct pkram_node *node =3D pa->ps->node; + struct pkram_obj *obj =3D pa->ps->obj; + size_t copy_count, write_count =3D 0; + void *addr; + + BUG_ON((node->flags & PKRAM_ACCMODE_MASK) !=3D PKRAM_SAVE); + + while (count > 0) { + if (!pa->bytes.data_page) { + gfp_t gfp_mask =3D pa->ps->gfp_mask; + struct page *page; + int err; + + page =3D pkram_alloc_page((gfp_mask & GFP_RECLAIM_MASK) | + __GFP_HIGHMEM | __GFP_ZERO); + if (!page) + return -ENOMEM; + err =3D __pkram_bytes_save_page(pa, page); + if (err) { + pkram_free_page(page_address(page)); + return err; + } + pa->bytes.data_page =3D page; + pa->bytes.data_offset =3D 0; + } + + copy_count =3D min_t(size_t, count, PAGE_SIZE - pa->bytes.data_offset); + addr =3D kmap_local_page(pa->bytes.data_page); + memcpy(addr + pa->bytes.data_offset, buf, copy_count); + 
kunmap_local(addr); + + buf +=3D copy_count; + obj->data_len +=3D copy_count; + pa->bytes.data_offset +=3D copy_count; + if (pa->bytes.data_offset >=3D PAGE_SIZE) + pa->bytes.data_page =3D NULL; + + write_count +=3D copy_count; + count -=3D copy_count; + } + return write_count; } =20 /** @@ -679,5 +758,41 @@ ssize_t pkram_write(struct pkram_access *pa, const voi= d *buf, size_t count) */ size_t pkram_read(struct pkram_access *pa, void *buf, size_t count) { - return 0; + struct pkram_node *node =3D pa->ps->node; + struct pkram_obj *obj =3D pa->ps->obj; + size_t copy_count, read_count =3D 0; + char *addr; + + BUG_ON((node->flags & PKRAM_ACCMODE_MASK) !=3D PKRAM_LOAD); + + while (count > 0 && obj->data_len > 0) { + if (!pa->bytes.data_page) { + struct page *page; + + page =3D __pkram_load_page(pa, NULL); + if (IS_ERR_OR_NULL(page)) + break; + pa->bytes.data_page =3D page; + pa->bytes.data_offset =3D 0; + } + + copy_count =3D min_t(size_t, count, PAGE_SIZE - pa->bytes.data_offset); + if (copy_count > obj->data_len) + copy_count =3D obj->data_len; + addr =3D kmap_local_page(pa->bytes.data_page); + memcpy(buf, addr + pa->bytes.data_offset, copy_count); + kunmap_local(addr); + + buf +=3D copy_count; + obj->data_len -=3D copy_count; + pa->bytes.data_offset +=3D copy_count; + if (pa->bytes.data_offset >=3D PAGE_SIZE || !obj->data_len) { + put_page(pa->bytes.data_page); + pa->bytes.data_page =3D NULL; + } + + read_count +=3D copy_count; + count -=3D copy_count; + } + return read_count; } --=20 1.9.4 From nobody Thu Dec 18 08:59:17 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E783FC77B7F for ; Thu, 27 Apr 2023 00:10:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242733AbjD0AKJ (ORCPT ); Wed, 26 Apr 2023 20:10:09 -0400 Received: from 
lindbergh.monkeyblade.net ([23.128.96.19]:35268 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242697AbjD0AKC (ORCPT ); Wed, 26 Apr 2023 20:10:02 -0400 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 261FF3AB6 for ; Wed, 26 Apr 2023 17:10:01 -0700 (PDT) Received: from pps.filterd (m0333521.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 33QGx2Ov004937; Thu, 27 Apr 2023 00:09:14 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2023-03-30; bh=Ubk61ivo6vmHzGraxUFDO1r49XLMJqa4VVETk4Cbi0M=; b=FbqfouMwai47X2F7jNvsJULqyrfxpV15OToo/loakJHJQEPmxoEdpqyIHRa9xadym2iu Zfj3nxbIdNpCk2h9qaTsdZeZJOhA/A8QwbQ9vqot2Pmw4tcVa5qExJaA5QclNyOSEjxX Nw0W6/FfI79xQncISher39+JSNcaco3uaQ8BhrKSNNqcACD58W1dtnCPmxTEo2M7bsaB yklcT/WnUpoYfsWS9mEFJt9lu0jnwkwk/wMTDtBP/V5JOuMNzVKJVYsIpHKD8QXsqlgI 2VLOSJBeOHOwEuIzNy8aRR5TyDsY/oq4xcl5TIXA81o+4TrlYj9fxI45DAynbDiVR2SL kg== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3q46gbtshn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:14 +0000 Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 33QNdMUZ007383; Thu, 27 Apr 2023 00:09:13 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 3q4618mpdh-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:13 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com 
(phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 33R0938Y013888; Thu, 27 Apr 2023 00:09:13 GMT Received: from ca-qasparc-x86-2.us.oracle.com (ca-qasparc-x86-2.us.oracle.com [10.147.24.103]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 3q4618mp42-7; Thu, 27 Apr 2023 00:09:12 +0000 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@kernel.org, akpm@linux-foundation.org, ebiederm@xmission.com, keescook@chromium.org, graf@amazon.com, jason.zeng@intel.com, lei.l.li@intel.com, steven.sistare@oracle.com, fam.zheng@bytedance.com, mgalaxy@akamai.com, kexec@lists.infradead.org Subject: [RFC v3 06/21] mm: PKRAM: link nodes by pfn before reboot Date: Wed, 26 Apr 2023 17:08:42 -0700 Message-Id: <1682554137-13938-7-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.9.4 In-Reply-To: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> References: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-04-26_10,2023-04-26_03,2023-02-09_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 phishscore=0 bulkscore=0 mlxlogscore=999 malwarescore=0 mlxscore=0 spamscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303200000 definitions=main-2304270000 X-Proofpoint-GUID: 4dfsl940M32SqdTr3kz-hSfm9AlSY8O0 X-Proofpoint-ORIG-GUID: 4dfsl940M32SqdTr3kz-hSfm9AlSY8O0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Since page structs are used for linking PKRAM nodes and 
cleared on boot, organize all PKRAM nodes into a list singly-linked by pfns before reboot to facilitate restoring the node list in the new kernel. Originally-by: Vladimir Davydov Signed-off-by: Anthony Yznaga --- mm/pkram.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+) diff --git a/mm/pkram.c b/mm/pkram.c index eac8cf6b0cdf..da166cb6afb7 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -2,12 +2,16 @@ #include #include #include +#include #include #include #include #include +#include #include +#include #include +#include #include #include #include @@ -60,11 +64,15 @@ struct pkram_obj { * singly-linked list of PKRAM link structures (see above), the node has a * pointer to the head of. * + * To facilitate data restore in the new kernel, before reboot all PKRAM n= odes + * are organized into a list singly-linked by pfn's (see pkram_reboot()). + * * The structure occupies a memory page. */ struct pkram_node { __u32 flags; __u64 obj_pfn; /* points to the first obj of the node */ + __u64 node_pfn; /* points to the next node in the node list */ =20 __u8 name[PKRAM_NAME_MAX]; }; @@ -73,6 +81,10 @@ struct pkram_node { #define PKRAM_LOAD 2 #define PKRAM_ACCMODE_MASK 3 =20 +/* + * For convenience sake PKRAM nodes are kept in an auxiliary doubly-linked= list + * connected through the lru field of the page struct. + */ static LIST_HEAD(pkram_nodes); /* linked through page::lru */ static DEFINE_MUTEX(pkram_mutex); /* serializes open/close */ =20 @@ -796,3 +808,41 @@ size_t pkram_read(struct pkram_access *pa, void *buf, = size_t count) } return read_count; } + +/* + * Build the list of PKRAM nodes. 
+ */ +static void __pkram_reboot(void) +{ + struct page *page; + struct pkram_node *node; + unsigned long node_pfn =3D 0; + + list_for_each_entry_reverse(page, &pkram_nodes, lru) { + node =3D page_address(page); + if (WARN_ON(node->flags & PKRAM_ACCMODE_MASK)) + continue; + node->node_pfn =3D node_pfn; + node_pfn =3D page_to_pfn(page); + } +} + +static int pkram_reboot(struct notifier_block *notifier, + unsigned long val, void *v) +{ + if (val !=3D SYS_RESTART) + return NOTIFY_DONE; + __pkram_reboot(); + return NOTIFY_OK; +} + +static struct notifier_block pkram_reboot_notifier =3D { + .notifier_call =3D pkram_reboot, +}; + +static int __init pkram_init(void) +{ + register_reboot_notifier(&pkram_reboot_notifier); + return 0; +} +module_init(pkram_init); --=20 1.9.4 From nobody Thu Dec 18 08:59:17 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AAE9BC7618E for ; Thu, 27 Apr 2023 00:10:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242811AbjD0AK0 (ORCPT ); Wed, 26 Apr 2023 20:10:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35298 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242710AbjD0AKE (ORCPT ); Wed, 26 Apr 2023 20:10:04 -0400 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 068A63AA4 for ; Wed, 26 Apr 2023 17:10:03 -0700 (PDT) Received: from pps.filterd (m0246627.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 33QGxDTf025309; Thu, 27 Apr 2023 00:09:16 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2023-03-30; 
bh=kCxexgjquhm43uIqnCzaCKme8/lW/JHqc5BFLC3zBqE=; b=vTo9B+vvAjWpZl5sA4oPL1lKBhlH5A1X8UCeXo14KQCHh4gDG9JGTe8jsA8SvhWEA6lX rCwXFo16cZYMNfr4aZeivRf7u3DFx56P7YgMJx3riVsOHDgVAViRH21ldwmNdxF2DBeb JqNRvOwjMdpuYY+yWWgDEzFFodnBKgAjYlQhjn5Sy84JLsimXJOqFOWvEWcUM7WH2Ae3 yxJTV377SDejfMR/5XEMdNkOD2spr/zEOAXfZVFI3I7cKVkYN4JLeqnXL9DqpOG7fOVy E/grQ+ERNydeVl9TX0aKf+NA2mkgf95PlZu+WMDNVMhFHDqzhiq9/daDDPxs4kFFdds/ 8A== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3q46622ty0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:15 +0000 Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 33QMwjOd007159; Thu, 27 Apr 2023 00:09:15 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 3q4618mpep-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:15 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 33R0938a013888; Thu, 27 Apr 2023 00:09:14 GMT Received: from ca-qasparc-x86-2.us.oracle.com (ca-qasparc-x86-2.us.oracle.com [10.147.24.103]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 3q4618mp42-8; Thu, 27 Apr 2023 00:09:14 +0000 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@kernel.org, akpm@linux-foundation.org, ebiederm@xmission.com, keescook@chromium.org, graf@amazon.com, jason.zeng@intel.com, lei.l.li@intel.com, 
steven.sistare@oracle.com, fam.zheng@bytedance.com, mgalaxy@akamai.com, kexec@lists.infradead.org Subject: [RFC v3 07/21] mm: PKRAM: introduce super block Date: Wed, 26 Apr 2023 17:08:43 -0700 Message-Id: <1682554137-13938-8-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.9.4 In-Reply-To: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> References: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-04-26_10,2023-04-26_03,2023-02-09_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 phishscore=0 bulkscore=0 mlxlogscore=999 malwarescore=0 mlxscore=0 spamscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303200000 definitions=main-2304270000 X-Proofpoint-ORIG-GUID: JPUChGVr1Ox4-7b8jQIrZtvAKfFxOq23 X-Proofpoint-GUID: JPUChGVr1Ox4-7b8jQIrZtvAKfFxOq23 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" The PKRAM super block is the starting point for restoring preserved memory. By providing the super block to the new kernel at boot time, preserved memory can be reserved and made available to be restored. To point the kernel to the location of the super block, one passes its pfn via the 'pkram' boot param. For that purpose, the pkram super block pfn is exported via /sys/kernel/pkram. If none is passed, any preserved memory will not be kept, and a new super block will be allocated. 
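A rough userspace sketch of the hand-off described above (illustration only, not part of this patch). It assumes what the diff below shows: show_pkram_sb_pfn() prints the pfn as bare hex followed by a newline (0 if there is no super block), and parse_pkram_sb_pfn() parses the 'pkram' boot param as base 16. The helper name pkram_build_param() is hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: turn the text read from /sys/kernel/pkram
 * (a hex pfn followed by a newline) into a "pkram=<hex>" boot param
 * for the next kernel's command line.
 *
 * Returns 0 on success, -1 if the text does not parse as hex, the pfn
 * is 0, or the output buffer is too small.
 */
static int pkram_build_param(const char *sysfs_text, char *out, size_t outlen)
{
	unsigned long pfn;
	char *end;
	int n;

	errno = 0;
	pfn = strtoul(sysfs_text, &end, 16);
	if (errno || end == sysfs_text)
		return -1;
	if (*end != '\n' && *end != '\0')
		return -1;
	if (pfn == 0)		/* 0 is printed when there is no super block */
		return -1;

	/* The boot param is parsed as base 16, so emit bare hex. */
	n = snprintf(out, outlen, "pkram=%lx", pfn);
	if (n < 0 || (size_t)n >= outlen)
		return -1;
	return 0;
}
```

A caller would read one line from /sys/kernel/pkram with fgets(), pass it to the helper, and append the result to the command line given to the kexec'd kernel. How the command line is assembled is outside the scope of this patch.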
Originally-by: Vladimir Davydov Signed-off-by: Anthony Yznaga --- mm/pkram.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++= ++-- 1 file changed, 100 insertions(+), 2 deletions(-) diff --git a/mm/pkram.c b/mm/pkram.c index da166cb6afb7..c66b2ae4d520 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -5,15 +5,18 @@ #include #include #include +#include #include #include #include #include #include +#include #include #include #include #include +#include #include =20 #include "internal.h" @@ -82,12 +85,38 @@ struct pkram_node { #define PKRAM_ACCMODE_MASK 3 =20 /* + * The PKRAM super block contains data needed to restore the preserved mem= ory + * structure on boot. The pointer to it (pfn) should be passed via the 'pk= ram' + * boot param if one wants to restore preserved data saved by the previous= ly + * executing kernel. For that purpose the kernel exports the pfn via + * /sys/kernel/pkram. If none is passed, preserved memory if any will not = be + * preserved and a new clean page will be allocated for the super block. + * + * The structure occupies a memory page. + */ +struct pkram_super_block { + __u64 node_pfn; /* first element of the node list */ +}; + +static unsigned long pkram_sb_pfn __initdata; +static struct pkram_super_block *pkram_sb; + +/* * For convenience sake PKRAM nodes are kept in an auxiliary doubly-linked= list * connected through the lru field of the page struct. */ static LIST_HEAD(pkram_nodes); /* linked through page::lru */ static DEFINE_MUTEX(pkram_mutex); /* serializes open/close */ =20 +/* + * The PKRAM super block pfn, see above. + */ +static int __init parse_pkram_sb_pfn(char *arg) +{ + return kstrtoul(arg, 16, &pkram_sb_pfn); +} +early_param("pkram", parse_pkram_sb_pfn); + static inline struct page *pkram_alloc_page(gfp_t gfp_mask) { return alloc_page(gfp_mask); @@ -270,6 +299,7 @@ static void pkram_stream_init(struct pkram_stream *ps, * @gfp_mask specifies the memory allocation mask to be used when saving d= ata. 
* * Error values: + * %ENODEV: PKRAM not available * %ENAMETOOLONG: name len >=3D PKRAM_NAME_MAX * %ENOMEM: insufficient memory available * %EEXIST: node with specified name already exists @@ -285,6 +315,9 @@ int pkram_prepare_save(struct pkram_stream *ps, const c= har *name, gfp_t gfp_mask struct pkram_node *node; int err =3D 0; =20 + if (!pkram_sb) + return -ENODEV; + if (strlen(name) >=3D PKRAM_NAME_MAX) return -ENAMETOOLONG; =20 @@ -404,6 +437,7 @@ void pkram_discard_save(struct pkram_stream *ps) * Returns 0 on success, -errno on failure. * * Error values: + * %ENODEV: PKRAM not available * %ENOENT: node with specified name does not exist * %EBUSY: save to required node has not finished yet * @@ -414,6 +448,9 @@ int pkram_prepare_load(struct pkram_stream *ps, const c= har *name) struct pkram_node *node; int err =3D 0; =20 + if (!pkram_sb) + return -ENODEV; + mutex_lock(&pkram_mutex); node =3D pkram_find_node(name); if (!node) { @@ -825,6 +862,13 @@ static void __pkram_reboot(void) node->node_pfn =3D node_pfn; node_pfn =3D page_to_pfn(page); } + + /* + * Zero out pkram_sb completely since it may have been passed from + * the previous boot. + */ + memset(pkram_sb, 0, PAGE_SIZE); + pkram_sb->node_pfn =3D node_pfn; } =20 static int pkram_reboot(struct notifier_block *notifier, @@ -832,7 +876,8 @@ static int pkram_reboot(struct notifier_block *notifier, { if (val !=3D SYS_RESTART) return NOTIFY_DONE; - __pkram_reboot(); + if (pkram_sb) + __pkram_reboot(); return NOTIFY_OK; } =20 @@ -840,9 +885,62 @@ static int pkram_reboot(struct notifier_block *notifie= r, .notifier_call =3D pkram_reboot, }; =20 +static ssize_t show_pkram_sb_pfn(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + unsigned long pfn =3D pkram_sb ? 
PFN_DOWN(__pa(pkram_sb)) : 0; + + return sprintf(buf, "%lx\n", pfn); +} + +static struct kobj_attribute pkram_sb_pfn_attr =3D + __ATTR(pkram, 0444, show_pkram_sb_pfn, NULL); + +static struct attribute *pkram_attrs[] =3D { + &pkram_sb_pfn_attr.attr, + NULL, +}; + +static struct attribute_group pkram_attr_group =3D { + .attrs =3D pkram_attrs, +}; + +/* returns non-zero on success */ +static int __init pkram_init_sb(void) +{ + unsigned long pfn; + struct pkram_node *node; + + if (!pkram_sb) { + struct page *page; + + page =3D pkram_alloc_page(GFP_KERNEL | __GFP_ZERO); + if (!page) { + pr_err("PKRAM: Failed to allocate super block\n"); + return 0; + } + pkram_sb =3D page_address(page); + } + + /* + * Build auxiliary doubly-linked list of nodes connected through + * page::lru for convenience sake. + */ + pfn =3D pkram_sb->node_pfn; + while (pfn) { + node =3D pfn_to_kaddr(pfn); + pkram_insert_node(node); + pfn =3D node->node_pfn; + } + return 1; +} + static int __init pkram_init(void) { - register_reboot_notifier(&pkram_reboot_notifier); + if (pkram_init_sb()) { + register_reboot_notifier(&pkram_reboot_notifier); + sysfs_update_group(kernel_kobj, &pkram_attr_group); + } return 0; } module_init(pkram_init); --=20 1.9.4 From nobody Thu Dec 18 08:59:17 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D47DFC7618E for ; Thu, 27 Apr 2023 00:11:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242918AbjD0ALS (ORCPT ); Wed, 26 Apr 2023 20:11:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35576 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242795AbjD0AKQ (ORCPT ); Wed, 26 Apr 2023 20:10:16 -0400 Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) by 
lindbergh.monkeyblade.net (Postfix) with ESMTPS id DF49E40EF for ; Wed, 26 Apr 2023 17:10:11 -0700 (PDT) Received: from pps.filterd (m0246630.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 33QGx0ji015505; Thu, 27 Apr 2023 00:09:17 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2023-03-30; bh=gsaWUneWyL6rRzD/yIHYfGhUn1k+T9X+WkpsWzv1iV8=; b=mDqUgTbTSU6cnPw3zIM1XyCiMw+fSQGgT3QxlLZPvuf6xhN7kjfpA5S65xHVuzL7FT1d MGjlWEjvHmFFK3pStMXSW4nUHTx3bOBt7b2EJakJCY7L7meqs0m0QV2Vtng255bRdgm8 /rWYGb8HvY3c+vhBXe6XD5MEw4a0TCADR/xoqYuuf47wVDeLuXAs4nhk4CCj/AQuq3dL MKTx8r86M10M50OkBFdr6kXy6b6ORkvf2QOT7QqahA3CX8uDHoX4t/UG7HZ1itPxWs8D GTo2afqCR/a4R6vypgdRkWzRNdCnjRCdrFsN6EfXrUHHNrXiRPPYj+Hya6fk5sCqJDgZ OA== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3q460dampm-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:17 +0000 Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 33QMDCSI007353; Thu, 27 Apr 2023 00:09:16 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 3q4618mpfu-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:16 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 33R0938c013888; Thu, 27 Apr 2023 00:09:15 GMT Received: from ca-qasparc-x86-2.us.oracle.com (ca-qasparc-x86-2.us.oracle.com [10.147.24.103]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 
3q4618mp42-9; Thu, 27 Apr 2023 00:09:15 +0000 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@kernel.org, akpm@linux-foundation.org, ebiederm@xmission.com, keescook@chromium.org, graf@amazon.com, jason.zeng@intel.com, lei.l.li@intel.com, steven.sistare@oracle.com, fam.zheng@bytedance.com, mgalaxy@akamai.com, kexec@lists.infradead.org Subject: [RFC v3 08/21] PKRAM: track preserved pages in a physical mapping pagetable Date: Wed, 26 Apr 2023 17:08:44 -0700 Message-Id: <1682554137-13938-9-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.9.4 In-Reply-To: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> References: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-04-26_10,2023-04-26_03,2023-02-09_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 phishscore=0 bulkscore=0 mlxlogscore=999 malwarescore=0 mlxscore=0 spamscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303200000 definitions=main-2304270000 X-Proofpoint-GUID: 6JJSP6KEDa_4tfH-GxMGt6XutfQgNfDT X-Proofpoint-ORIG-GUID: 6JJSP6KEDa_4tfH-GxMGt6XutfQgNfDT Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Later patches in this series will need a way to efficiently identify physically contiguous ranges of preserved pages independent of their virtual addresses. To facilitate this all pages to be preserved across kexec are added to a pseudo identity mapping pagetable. 
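To make the goal concrete, here is a simplified userspace sketch (illustration only, not the code in this patch) of reporting physically contiguous ranges from a per-page presence bitmap. It assumes a single flat bitmap with one bit per page and a fixed 4 KiB page size; the patch itself keeps such bitmaps only at the PTE level of a multi-level pagetable. All sketch_* names are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size, illustration only */

typedef int (*sketch_range_cb)(unsigned long base, unsigned long size,
			       void *priv);

/* Example callback: collects up to 8 reported ranges. */
struct sketch_ranges {
	unsigned long base[8];
	unsigned long size[8];
	int count;
};

static int sketch_collect(unsigned long base, unsigned long size, void *priv)
{
	struct sketch_ranges *r = priv;

	if (r->count >= 8)
		return -1;
	r->base[r->count] = base;
	r->size[r->count] = size;
	r->count++;
	return 0;
}

/*
 * Scan a presence bitmap where bit i stands for the page at physical
 * address base + i * SKETCH_PAGE_SIZE, and report every maximal run of
 * set bits as one (base, size) range. Stops early and returns the
 * callback's value if the callback returns non-zero.
 */
static int sketch_find_ranges(const unsigned long *bitmap, size_t nbits,
			      unsigned long base, sketch_range_cb cb,
			      void *priv)
{
	const size_t bpl = sizeof(unsigned long) * CHAR_BIT;
	size_t i, run_start = 0;
	int tracking = 0, ret;

	/* Run one step past the end so a trailing run is flushed. */
	for (i = 0; i <= nbits; i++) {
		int present = i < nbits &&
			      ((bitmap[i / bpl] >> (i % bpl)) & 1UL);

		if (present && !tracking) {
			run_start = i;
			tracking = 1;
		} else if (!present && tracking) {
			tracking = 0;
			ret = cb(base + run_start * SKETCH_PAGE_SIZE,
				 (i - run_start) * SKETCH_PAGE_SIZE, priv);
			if (ret)
				return ret;
		}
	}
	return 0;
}
```

The start-tracking / flush-on-gap structure loosely mirrors what note_page() does in the diff below, minus the min/max address clamping.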
The pagetable makes use of the existing architecture definitions for building a memory mapping pagetable except that a bitmap is used to represent the presence or absence of preserved pages at the PTE level. Signed-off-by: Anthony Yznaga --- mm/Makefile | 4 +- mm/pkram.c | 30 ++++- mm/pkram_pagetable.c | 375 +++++++++++++++++++++++++++++++++++++++++++++++= ++++ 3 files changed, 404 insertions(+), 5 deletions(-) create mode 100644 mm/pkram_pagetable.c diff --git a/mm/Makefile b/mm/Makefile index 7a8d5a286d48..7a1a33b67de6 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -138,5 +138,5 @@ obj-$(CONFIG_IO_MAPPING) +=3D io-mapping.o obj-$(CONFIG_HAVE_BOOTMEM_INFO_NODE) +=3D bootmem_info.o obj-$(CONFIG_GENERIC_IOREMAP) +=3D ioremap.o obj-$(CONFIG_SHRINKER_DEBUG) +=3D shrinker_debug.o -obj-$(CONFIG_PKRAM) +=3D pkram.o ->>>>>>> mm: add PKRAM API stubs and Kconfig +obj-$(CONFIG_PKRAM) +=3D pkram.o pkram_pagetable.o +>>>>>>> PKRAM: track preserved pages in a physical mapping pagetable diff --git a/mm/pkram.c b/mm/pkram.c index c66b2ae4d520..e6c0f3c52465 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -101,6 +101,9 @@ struct pkram_super_block { static unsigned long pkram_sb_pfn __initdata; static struct pkram_super_block *pkram_sb; =20 +extern int pkram_add_identity_map(struct page *page); +extern void pkram_remove_identity_map(struct page *page); + /* * For convenience sake PKRAM nodes are kept in an auxiliary doubly-linked= list * connected through the lru field of the page struct. 
@@ -119,11 +122,24 @@ static int __init parse_pkram_sb_pfn(char *arg) =20 static inline struct page *pkram_alloc_page(gfp_t gfp_mask) { - return alloc_page(gfp_mask); + struct page *page; + int err; + + page =3D alloc_page(gfp_mask); + if (page) { + err =3D pkram_add_identity_map(page); + if (err) { + __free_page(page); + page =3D NULL; + } + } + + return page; } =20 static inline void pkram_free_page(void *addr) { + pkram_remove_identity_map(virt_to_page(addr)); free_page((unsigned long)addr); } =20 @@ -161,6 +177,7 @@ static void pkram_truncate_link(struct pkram_link *link) if (!p) continue; page =3D pfn_to_page(PHYS_PFN(p)); + pkram_remove_identity_map(page); put_page(page); } } @@ -610,10 +627,15 @@ int pkram_save_folio(struct pkram_access *pa, struct = folio *folio) { struct pkram_node *node =3D pa->ps->node; struct page *page =3D folio_page(folio, 0); + int err; =20 BUG_ON((node->flags & PKRAM_ACCMODE_MASK) !=3D PKRAM_SAVE); =20 - return __pkram_save_page(pa, page, page->index); + err =3D __pkram_save_page(pa, page, page->index); + if (!err) + err =3D pkram_add_identity_map(page); + + return err; } =20 static int __pkram_bytes_save_page(struct pkram_access *pa, struct page *p= age) @@ -658,6 +680,8 @@ static struct page *__pkram_prep_load_page(pkram_entry_= t p) =20 page_ref_unfreeze(page, 1); =20 + pkram_remove_identity_map(page); + return page; =20 out_error: @@ -914,7 +938,7 @@ static int __init pkram_init_sb(void) if (!pkram_sb) { struct page *page; =20 - page =3D pkram_alloc_page(GFP_KERNEL | __GFP_ZERO); + page =3D alloc_page(GFP_KERNEL | __GFP_ZERO); if (!page) { pr_err("PKRAM: Failed to allocate super block\n"); return 0; diff --git a/mm/pkram_pagetable.c b/mm/pkram_pagetable.c new file mode 100644 index 000000000000..85e34301ef1e --- /dev/null +++ b/mm/pkram_pagetable.c @@ -0,0 +1,375 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include + +static pgd_t *pkram_pgd; +static DEFINE_SPINLOCK(pkram_pgd_lock); + +#define set_p4d(p4dp, p4d) 
WRITE_ONCE(*(p4dp), (p4d)) + +#define PKRAM_PTE_BM_BYTES (PTRS_PER_PTE / BITS_PER_BYTE) +#define PKRAM_PTE_BM_MASK (PAGE_SIZE / PKRAM_PTE_BM_BYTES - 1) + +static pmd_t make_bitmap_pmd(unsigned long *bitmap) +{ + unsigned long val; + + val =3D __pa(ALIGN_DOWN((unsigned long)bitmap, PAGE_SIZE)); + val |=3D (((unsigned long)bitmap & ~PAGE_MASK) / PKRAM_PTE_BM_BYTES); + + return __pmd(val); +} + +static unsigned long *get_bitmap_addr(pmd_t pmd) +{ + unsigned long val, off; + + val =3D pmd_val(pmd); + off =3D (val & PKRAM_PTE_BM_MASK) * PKRAM_PTE_BM_BYTES; + + val =3D (val & PAGE_MASK) + off; + + return __va(val); +} + +int pkram_add_identity_map(struct page *page) +{ + unsigned long paddr; + unsigned long *bitmap; + unsigned int index; + struct page *pg; + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + + if (!pkram_pgd) { + spin_lock(&pkram_pgd_lock); + if (!pkram_pgd) { + pg =3D alloc_page(GFP_ATOMIC|__GFP_ZERO); + if (!pg) + goto nomem; + pkram_pgd =3D page_address(pg); + } + spin_unlock(&pkram_pgd_lock); + } + + paddr =3D __pa(page_address(page)); + pgd =3D pkram_pgd; + pgd +=3D pgd_index(paddr); + if (pgd_none(*pgd)) { + spin_lock(&pkram_pgd_lock); + if (pgd_none(*pgd)) { + pg =3D alloc_page(GFP_ATOMIC|__GFP_ZERO); + if (!pg) + goto nomem; + p4d =3D page_address(pg); + set_pgd(pgd, __pgd(__pa(p4d))); + } + spin_unlock(&pkram_pgd_lock); + } + p4d =3D p4d_offset(pgd, paddr); + if (p4d_none(*p4d)) { + spin_lock(&pkram_pgd_lock); + if (p4d_none(*p4d)) { + pg =3D alloc_page(GFP_ATOMIC|__GFP_ZERO); + if (!pg) + goto nomem; + pud =3D page_address(pg); + set_p4d(p4d, __p4d(__pa(pud))); + } + spin_unlock(&pkram_pgd_lock); + } + pud =3D pud_offset(p4d, paddr); + if (pud_none(*pud)) { + spin_lock(&pkram_pgd_lock); + if (pud_none(*pud)) { + pg =3D alloc_page(GFP_ATOMIC|__GFP_ZERO); + if (!pg) + goto nomem; + pmd =3D page_address(pg); + set_pud(pud, __pud(__pa(pmd))); + } + spin_unlock(&pkram_pgd_lock); + } + pmd =3D pmd_offset(pud, paddr); + if (pmd_none(*pmd)) { + 
spin_lock(&pkram_pgd_lock); + if (pmd_none(*pmd)) { + if (PageTransHuge(page)) { + set_pmd(pmd, pmd_mkhuge(*pmd)); + spin_unlock(&pkram_pgd_lock); + goto done; + } + bitmap =3D bitmap_zalloc(PTRS_PER_PTE, GFP_ATOMIC); + if (!bitmap) + goto nomem; + set_pmd(pmd, make_bitmap_pmd(bitmap)); + } else { + BUG_ON(pmd_leaf(*pmd)); + bitmap =3D get_bitmap_addr(*pmd); + } + spin_unlock(&pkram_pgd_lock); + } else { + BUG_ON(pmd_leaf(*pmd)); + bitmap =3D get_bitmap_addr(*pmd); + } + + index =3D pte_index(paddr); + BUG_ON(test_bit(index, bitmap)); + set_bit(index, bitmap); + smp_mb__after_atomic(); + if (bitmap_full(bitmap, PTRS_PER_PTE)) + set_pmd(pmd, pmd_mkhuge(*pmd)); +done: + return 0; +nomem: + return -ENOMEM; +} + +void pkram_remove_identity_map(struct page *page) +{ + unsigned long *bitmap; + unsigned long paddr; + unsigned int index; + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + + /* + * pkram_pgd will be null when freeing metadata pages after a reboot + */ + if (!pkram_pgd) + return; + + paddr =3D __pa(page_address(page)); + pgd =3D pkram_pgd; + pgd +=3D pgd_index(paddr); + if (pgd_none(*pgd)) { + WARN_ONCE(1, "PKRAM: %s: no pgd for 0x%lx\n", __func__, paddr); + return; + } + p4d =3D p4d_offset(pgd, paddr); + if (p4d_none(*p4d)) { + WARN_ONCE(1, "PKRAM: %s: no p4d for 0x%lx\n", __func__, paddr); + return; + } + pud =3D pud_offset(p4d, paddr); + if (pud_none(*pud)) { + WARN_ONCE(1, "PKRAM: %s: no pud for 0x%lx\n", __func__, paddr); + return; + } + pmd =3D pmd_offset(pud, paddr); + if (pmd_none(*pmd)) { + WARN_ONCE(1, "PKRAM: %s: no pmd for 0x%lx\n", __func__, paddr); + return; + } + if (PageTransHuge(page)) { + BUG_ON(!pmd_leaf(*pmd)); + pmd_clear(pmd); + return; + } + + if (pmd_leaf(*pmd)) { + spin_lock(&pkram_pgd_lock); + if (pmd_leaf(*pmd)) + set_pmd(pmd, __pmd(pte_val(pte_clrhuge(*(pte_t *)pmd)))); + spin_unlock(&pkram_pgd_lock); + } + + bitmap =3D get_bitmap_addr(*pmd); + index =3D pte_index(paddr); + clear_bit(index, bitmap); + smp_mb__after_atomic(); 
+ + spin_lock(&pkram_pgd_lock); + if (!pmd_none(*pmd) && bitmap_empty(bitmap, PTRS_PER_PTE)) { + pmd_clear(pmd); + spin_unlock(&pkram_pgd_lock); + bitmap_free(bitmap); + } else { + spin_unlock(&pkram_pgd_lock); + } +} + +struct pkram_pg_state { + int (*range_cb)(unsigned long base, unsigned long size, void *private); + unsigned long start_addr; + unsigned long curr_addr; + unsigned long min_addr; + unsigned long max_addr; + void *private; + bool tracking; +}; + +#define pgd_none(a) (pgtable_l5_enabled() ? pgd_none(a) : p4d_none(__p4d(= pgd_val(a)))) + +static int note_page(struct pkram_pg_state *st, unsigned long addr, bool p= resent) +{ + if (!st->tracking && present) { + if (addr >=3D st->max_addr) + return 1; + /* + * addr can be < min_addr if the page straddles the + * boundary + */ + st->start_addr =3D max(addr, st->min_addr); + st->tracking =3D true; + } else if (st->tracking) { + unsigned long base, size; + int ret; + + /* Continue tracking if upper bound has not been reached */ + if (present && addr < st->max_addr) + return 0; + + addr =3D min(addr, st->max_addr); + + base =3D st->start_addr; + size =3D addr - st->start_addr; + st->tracking =3D false; + + ret =3D st->range_cb(base, size, st->private); + + if (addr =3D=3D st->max_addr) + return 1; + else + return ret; + } + + return 0; +} + +static int walk_pte_level(struct pkram_pg_state *st, pmd_t addr, unsigned = long P) +{ + unsigned long *bitmap; + int present; + int i, ret; + + bitmap =3D get_bitmap_addr(addr); + for (i =3D 0; i < PTRS_PER_PTE; i++) { + unsigned long curr_addr =3D P + i * PAGE_SIZE; + + if (curr_addr < st->min_addr) + continue; + present =3D test_bit(i, bitmap); + ret =3D note_page(st, curr_addr, present); + if (ret) + break; + } + + return ret; +} + +static int walk_pmd_level(struct pkram_pg_state *st, pud_t addr, unsigned = long P) +{ + pmd_t *start; + int i, ret; + + start =3D pud_pgtable(addr); + for (i =3D 0; i < PTRS_PER_PMD; i++, start++) { + unsigned long curr_addr =3D P + i * 
PMD_SIZE; + + if (curr_addr + PMD_SIZE <=3D st->min_addr) + continue; + if (!pmd_none(*start)) { + if (pmd_leaf(*start)) + ret =3D note_page(st, curr_addr, true); + else + ret =3D walk_pte_level(st, *start, curr_addr); + } else + ret =3D note_page(st, curr_addr, false); + if (ret) + break; + } + + return ret; +} + +static int walk_pud_level(struct pkram_pg_state *st, p4d_t addr, unsigned = long P) +{ + pud_t *start; + int i, ret; + + start =3D p4d_pgtable(addr); + for (i =3D 0; i < PTRS_PER_PUD; i++, start++) { + unsigned long curr_addr =3D P + i * PUD_SIZE; + + if (curr_addr + PUD_SIZE <=3D st->min_addr) + continue; + if (!pud_none(*start)) { + if (pud_leaf(*start)) + ret =3D note_page(st, curr_addr, true); + else + ret =3D walk_pmd_level(st, *start, curr_addr); + } else + ret =3D note_page(st, curr_addr, false); + if (ret) + break; + } + + return ret; +} + +static int walk_p4d_level(struct pkram_pg_state *st, pgd_t addr, unsigned = long P) +{ + p4d_t *start; + int i, ret; + + if (PTRS_PER_P4D =3D=3D 1) + return walk_pud_level(st, __p4d(pgd_val(addr)), P); + + start =3D (p4d_t *)pgd_page_vaddr(addr); + for (i =3D 0; i < PTRS_PER_P4D; i++, start++) { + unsigned long curr_addr =3D P + i * P4D_SIZE; + + if (curr_addr + P4D_SIZE <=3D st->min_addr) + continue; + if (!p4d_none(*start)) { + if (p4d_leaf(*start)) + ret =3D note_page(st, curr_addr, true); + else + ret =3D walk_pud_level(st, *start, curr_addr); + } else + ret =3D note_page(st, curr_addr, false); + if (ret) + break; + } + + return ret; +} + +void pkram_walk_pgt(struct pkram_pg_state *st, pgd_t *pgd) +{ + pgd_t *start =3D pgd; + int i, ret =3D 0; + + for (i =3D 0; i < PTRS_PER_PGD; i++, start++) { + unsigned long curr_addr =3D i * PGDIR_SIZE; + + if (curr_addr + PGDIR_SIZE <=3D st->min_addr) + continue; + if (!pgd_none(*start)) + ret =3D walk_p4d_level(st, *start, curr_addr); + else + ret =3D note_page(st, curr_addr, false); + if (ret) + break; + } +} + +void pkram_find_preserved(unsigned long start, unsigned 
long end, void *pr= ivate, int (*callback)(unsigned long base, unsigned long size, void *privat= e)) +{ + struct pkram_pg_state st =3D { + .range_cb =3D callback, + .min_addr =3D start, + .max_addr =3D end, + .private =3D private, + }; + + if (!pkram_pgd) + return; + + pkram_walk_pgt(&st, pkram_pgd); +} --=20 1.9.4 From nobody Thu Dec 18 08:59:17 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D77EDC7618E for ; Thu, 27 Apr 2023 00:11:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242913AbjD0ALN (ORCPT ); Wed, 26 Apr 2023 20:11:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35526 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242787AbjD0AKQ (ORCPT ); Wed, 26 Apr 2023 20:10:16 -0400 Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 57FA93AB0 for ; Wed, 26 Apr 2023 17:10:11 -0700 (PDT) Received: from pps.filterd (m0246632.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 33QGwraY017018; Thu, 27 Apr 2023 00:09:19 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2023-03-30; bh=TFcTuHsIv/plzZmdlp1y8kigeEeyXo4KMEAxl9EnHkE=; b=Zggh1E8czE4x4ifpjvsDspPPPFrTZtGY0vjQy73xEQSQJCKBnE4cJTj4k5BJ5EXzz1Su KsiwjlUDFJNM0uU2OJB/zM5Bm9iqCJeuDWrpQ/MKoHxEXp0zMW1mIqKa3m5N4CQormOS z7FEU3qe7EeyswcxhsWF8wXz9MgbSVFiSArjA8dRyMlNb/JnPJC7N6VrgImA1wYTkyxp ctCaI/TG+W5xh1cBKNImbW8wT7qmLf4zR/nm8RB1KqjFdQRqFLyQoSuKmPAtJLhdvsAQ FQE42ee6cVT5V1dU/kk3KMCHIgvH2DN3bPq0HhRluiybIMW2fDFh36w+3T5LamXb1skG hA== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com 
(phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3q476u2ng3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:18 +0000 Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 33QLgiYb007654; Thu, 27 Apr 2023 00:09:18 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 3q4618mpgv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:17 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 33R0938e013888; Thu, 27 Apr 2023 00:09:17 GMT Received: from ca-qasparc-x86-2.us.oracle.com (ca-qasparc-x86-2.us.oracle.com [10.147.24.103]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 3q4618mp42-10; Thu, 27 Apr 2023 00:09:17 +0000 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@kernel.org, akpm@linux-foundation.org, ebiederm@xmission.com, keescook@chromium.org, graf@amazon.com, jason.zeng@intel.com, lei.l.li@intel.com, steven.sistare@oracle.com, fam.zheng@bytedance.com, mgalaxy@akamai.com, kexec@lists.infradead.org Subject: [RFC v3 09/21] PKRAM: pass a list of preserved ranges to the next kernel Date: Wed, 26 Apr 2023 17:08:45 -0700 Message-Id: <1682554137-13938-10-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.9.4 In-Reply-To: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> References: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> 
X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-04-26_10,2023-04-26_03,2023-02-09_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 phishscore=0 bulkscore=0 mlxlogscore=999 malwarescore=0 mlxscore=0 spamscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303200000 definitions=main-2304270000 X-Proofpoint-GUID: xTHpC9CzAHAd5dZprMjOHVznQrKD_GfJ X-Proofpoint-ORIG-GUID: xTHpC9CzAHAd5dZprMjOHVznQrKD_GfJ Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" In order to build a new memblock reserved list during boot that includes ranges preserved by the previous kernel, a list of preserved ranges is passed to the next kernel via the pkram superblock. The ranges are stored in ascending order in a linked list of pages. A more complete memblock list is not prepared to avoid possible conflicts with changes in a newer kernel and to avoid having to allocate a contiguous range larger than a page. Signed-off-by: Anthony Yznaga --- mm/pkram.c | 184 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++= +--- 1 file changed, 177 insertions(+), 7 deletions(-) diff --git a/mm/pkram.c b/mm/pkram.c index e6c0f3c52465..3790e5180feb 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -84,6 +84,20 @@ struct pkram_node { #define PKRAM_LOAD 2 #define PKRAM_ACCMODE_MASK 3 =20 +struct pkram_region { + phys_addr_t base; + phys_addr_t size; +}; + +struct pkram_region_list { + __u64 prev_pfn; + __u64 next_pfn; + + struct pkram_region regions[0]; +}; + +#define PKRAM_REGIONS_LIST_MAX \ + ((PAGE_SIZE-sizeof(struct pkram_region_list))/sizeof(struct pkram_region)) /* * The PKRAM super block contains data needed to restore the preserved mem= ory * structure on boot. 
The pointer to it (pfn) should be passed via the 'pk= ram' @@ -96,13 +110,21 @@ struct pkram_node { */ struct pkram_super_block { __u64 node_pfn; /* first element of the node list */ + __u64 region_list_pfn; + __u64 nr_regions; }; =20 +static struct pkram_region_list *pkram_regions_list; +static int pkram_init_regions_list(void); +static unsigned long pkram_populate_regions_list(void); + static unsigned long pkram_sb_pfn __initdata; static struct pkram_super_block *pkram_sb; =20 extern int pkram_add_identity_map(struct page *page); extern void pkram_remove_identity_map(struct page *page); +extern void pkram_find_preserved(unsigned long start, unsigned long end, v= oid *private, + int (*callback)(unsigned long base, unsigned long size, void *private)); =20 /* * For convenience sake PKRAM nodes are kept in an auxiliary doubly-linked= list @@ -878,21 +900,48 @@ static void __pkram_reboot(void) struct page *page; struct pkram_node *node; unsigned long node_pfn =3D 0; + unsigned long rl_pfn =3D 0; + unsigned long nr_regions =3D 0; + int err =3D 0; =20 - list_for_each_entry_reverse(page, &pkram_nodes, lru) { - node =3D page_address(page); - if (WARN_ON(node->flags & PKRAM_ACCMODE_MASK)) - continue; - node->node_pfn =3D node_pfn; - node_pfn =3D page_to_pfn(page); + if (!list_empty(&pkram_nodes)) { + err =3D pkram_add_identity_map(virt_to_page(pkram_sb)); + if (err) { + pr_err("PKRAM: failed to add super block to pagetable\n"); + goto done; + } + list_for_each_entry_reverse(page, &pkram_nodes, lru) { + node =3D page_address(page); + if (WARN_ON(node->flags & PKRAM_ACCMODE_MASK)) + continue; + node->node_pfn =3D node_pfn; + node_pfn =3D page_to_pfn(page); + } + err =3D pkram_init_regions_list(); + if (err) { + pr_err("PKRAM: failed to init regions list\n"); + goto done; + } + nr_regions =3D pkram_populate_regions_list(); + if (IS_ERR_VALUE(nr_regions)) { + err =3D nr_regions; + pr_err("PKRAM: failed to populate regions list\n"); + goto done; + } + rl_pfn =3D 
page_to_pfn(virt_to_page(pkram_regions_list)); } =20 +done: /* * Zero out pkram_sb completely since it may have been passed from * the previous boot. */ memset(pkram_sb, 0, PAGE_SIZE); - pkram_sb->node_pfn =3D node_pfn; + if (!err && node_pfn) { + pkram_sb->node_pfn =3D node_pfn; + pkram_sb->region_list_pfn =3D rl_pfn; + pkram_sb->nr_regions =3D nr_regions; + } } =20 static int pkram_reboot(struct notifier_block *notifier, @@ -968,3 +1017,124 @@ static int __init pkram_init(void) return 0; } module_init(pkram_init); + +static int count_region_cb(unsigned long base, unsigned long size, void *p= rivate) +{ + unsigned long *nr_regions =3D (unsigned long *)private; + + (*nr_regions)++; + return 0; +} + +static unsigned long pkram_count_regions(void) +{ + unsigned long nr_regions =3D 0; + + pkram_find_preserved(0, PHYS_ADDR_MAX, &nr_regions, count_region_cb); + + return nr_regions; +} + +/* + * To facilitate rapidly building a new memblock reserved list during boot + * with the addition of preserved memory ranges, a regions list is built + * before reboot. + * The regions list is a linked list of pages with each page containing an + * array of preserved memory ranges. The ranges are stored in each page + * and across the list in address order. A linked list is used rather than + * a single contiguous range to guard against the possibility that a + * larger, contiguous allocation may fail due to fragmentation. + * + * Since the pages of the regions list must be preserved and the pkram + * pagetable is used to determine what ranges are preserved, the list pages + * must be allocated and represented in the pkram pagetable before they can + * be populated. Rather than recounting the number of regions after + * allocating pages and repeating until a precise number of pages is + * allocated, the number of pages needed is estimated.
+ */ +static int pkram_init_regions_list(void) +{ + struct pkram_region_list *rl; + unsigned long nr_regions; + unsigned long nr_lpages; + struct page *page; + + nr_regions =3D pkram_count_regions(); + + nr_lpages =3D DIV_ROUND_UP(nr_regions, PKRAM_REGIONS_LIST_MAX); + nr_regions +=3D nr_lpages; + nr_lpages =3D DIV_ROUND_UP(nr_regions, PKRAM_REGIONS_LIST_MAX); + + for (; nr_lpages; nr_lpages--) { + page =3D pkram_alloc_page(GFP_KERNEL | __GFP_ZERO); + if (!page) + return -ENOMEM; + rl =3D page_address(page); + if (pkram_regions_list) { + rl->next_pfn =3D page_to_pfn(virt_to_page(pkram_regions_list)); + pkram_regions_list->prev_pfn =3D page_to_pfn(page); + } + pkram_regions_list =3D rl; + } + + return 0; +} + +struct pkram_regions_priv { + struct pkram_region_list *curr; + struct pkram_region_list *last; + unsigned long nr_regions; + int idx; +}; + +static int add_region_cb(unsigned long base, unsigned long size, void *pri= vate) +{ + struct pkram_regions_priv *priv; + struct pkram_region_list *rl; + int i; + + priv =3D (struct pkram_regions_priv *)private; + rl =3D priv->curr; + i =3D priv->idx; + + if (!rl) { + WARN_ON(1); + return 1; + } + + if (!i) + priv->last =3D priv->curr; + + rl->regions[i].base =3D base; + rl->regions[i].size =3D size; + + priv->nr_regions++; + i++; + if (i =3D=3D PKRAM_REGIONS_LIST_MAX) { + u64 next_pfn =3D rl->next_pfn; + + if (next_pfn) + priv->curr =3D pfn_to_kaddr(next_pfn); + else + priv->curr =3D NULL; + + i =3D 0; + } + priv->idx =3D i; + + return 0; +} + +static unsigned long pkram_populate_regions_list(void) +{ + struct pkram_regions_priv priv =3D { .curr =3D pkram_regions_list }; + + pkram_find_preserved(0, PHYS_ADDR_MAX, &priv, add_region_cb); + + /* + * Link the first node to the last populated one. 
+ */ + pkram_regions_list->prev_pfn =3D page_to_pfn(virt_to_page(priv.last)); + + return priv.nr_regions; +} --=20 1.9.4 From nobody Thu Dec 18 08:59:17 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B5767C77B60 for ; Thu, 27 Apr 2023 00:10:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242780AbjD0AKb (ORCPT ); Wed, 26 Apr 2023 20:10:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35288 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242718AbjD0AKF (ORCPT ); Wed, 26 Apr 2023 20:10:05 -0400 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6EA5B3AB6 for ; Wed, 26 Apr 2023 17:10:03 -0700 (PDT) Received: from pps.filterd (m0246627.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 33QGxEQx025323; Thu, 27 Apr 2023 00:09:20 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2023-03-30; bh=HqaUuCd1TrikYSZqcmSr/0DHMDDKEbvkxeVBZQXF+4s=; b=ymokuz3wRntwI30/UPMDy8uxtvahfMueGv8Da1N3qrPtLBUqqTRtFtKbvokd96e5oZPc pGY+TCBPIuSXwvnuN1QwEW7r2+WpUN6T68bsCFTIQXhsaByghzHkSw63dHZAIArr8Zok vRy10JtQfeUPrT0F4r+cKjI4JYbseEZWXcPUr7KtYBgw5zozoWZePDLxldsjq0Z8Fbe5 SJR+63Zf6mzrcGiBcfd8IT3o7crf+MDRZs6cwBzPEY7dvharmXfVNp0dVkumlGxD/XU6 S108cMR7bmO/7+BFit4IaIqe96T4ktAHBuex20hQ1G56cNGPf/8/HB8tDXQFc3myQwQr Bw== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3q46622ty4-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:20 
+0000 Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 33QNV55B007557; Thu, 27 Apr 2023 00:09:19 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 3q4618mpjm-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:19 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 33R0938g013888; Thu, 27 Apr 2023 00:09:19 GMT Received: from ca-qasparc-x86-2.us.oracle.com (ca-qasparc-x86-2.us.oracle.com [10.147.24.103]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 3q4618mp42-11; Thu, 27 Apr 2023 00:09:18 +0000 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@kernel.org, akpm@linux-foundation.org, ebiederm@xmission.com, keescook@chromium.org, graf@amazon.com, jason.zeng@intel.com, lei.l.li@intel.com, steven.sistare@oracle.com, fam.zheng@bytedance.com, mgalaxy@akamai.com, kexec@lists.infradead.org Subject: [RFC v3 10/21] PKRAM: prepare for adding preserved ranges to memblock reserved Date: Wed, 26 Apr 2023 17:08:46 -0700 Message-Id: <1682554137-13938-11-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.9.4 In-Reply-To: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> References: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-04-26_10,2023-04-26_03,2023-02-09_01 X-Proofpoint-Spam-Details: rule=notspam 
policy=default score=0 suspectscore=0 phishscore=0 bulkscore=0 mlxlogscore=999 malwarescore=0 mlxscore=0 spamscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303200000 definitions=main-2304270000 X-Proofpoint-ORIG-GUID: Pb7Yjfl5p58r9mry-DL31w9dKlO5igMJ X-Proofpoint-GUID: Pb7Yjfl5p58r9mry-DL31w9dKlO5igMJ Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Calling memblock_reserve() repeatedly to add preserved ranges is inefficient and risks clobbering preserved memory if the memblock reserved regions array must be resized. Instead, calculate the size needed to accommodate the preserved ranges, find a suitable range for a new reserved regions array that does not overlap any preserved range, and populate it with a new, merged regions array. Signed-off-by: Anthony Yznaga --- mm/pkram.c | 244 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++= ++++ 1 file changed, 244 insertions(+) diff --git a/mm/pkram.c b/mm/pkram.c index 3790e5180feb..c649504fa1fa 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -7,6 +7,7 @@ #include #include #include +#include #include #include #include @@ -1138,3 +1139,246 @@ static unsigned long pkram_populate_regions_list(vo= id) =20 return priv.nr_regions; } + +struct pkram_region *pkram_first_region(struct pkram_super_block *sb, + struct pkram_region_list **rlp, int *idx) +{ + WARN_ON(!sb); + WARN_ON(!sb->region_list_pfn); + + if (!sb || !sb->region_list_pfn) + return NULL; + + *rlp =3D pfn_to_kaddr(sb->region_list_pfn); + *idx =3D 0; + + return &(*rlp)->regions[0]; +} + +struct pkram_region *pkram_next_region(struct pkram_region_list **rlp, int= *idx) +{ + struct pkram_region_list *rl =3D *rlp; + int i =3D *idx; + + i++; + if (i >=3D PKRAM_REGIONS_LIST_MAX) { + if (!rl->next_pfn) { + pr_err("PKRAM: %s: no more pkram_region_list pages\n", __func__); + return NULL; + } + rl =3D 
pfn_to_kaddr(rl->next_pfn); + *rlp =3D rl; + i =3D 0; + } + *idx =3D i; + + if (rl->regions[i].size =3D=3D 0) + return NULL; + + return &rl->regions[i]; +} + +struct pkram_region *pkram_first_region_topdown(struct pkram_super_block *= sb, + struct pkram_region_list **rlp, int *idx) +{ + struct pkram_region_list *rl; + + WARN_ON(!sb); + WARN_ON(!sb->region_list_pfn); + + if (!sb || !sb->region_list_pfn) + return NULL; + + rl =3D pfn_to_kaddr(sb->region_list_pfn); + if (!rl->prev_pfn) { + WARN_ON(1); + return NULL; + } + rl =3D pfn_to_kaddr(rl->prev_pfn); + + *rlp =3D rl; + + *idx =3D (sb->nr_regions - 1) % PKRAM_REGIONS_LIST_MAX; + + return &rl->regions[*idx]; +} + +struct pkram_region *pkram_next_region_topdown(struct pkram_region_list **= rlp, int *idx) +{ + struct pkram_region_list *rl =3D *rlp; + int i =3D *idx; + + if (i =3D=3D 0) { + if (!rl->prev_pfn) + return NULL; + rl =3D pfn_to_kaddr(rl->prev_pfn); + *rlp =3D rl; + i =3D PKRAM_REGIONS_LIST_MAX - 1; + } else + i--; + + *idx =3D i; + + return &rl->regions[i]; +} + +/* + * Use the pkram regions list to allocate a block of memory that does + * not overlap with preserved pages. 
+ */ +phys_addr_t __init alloc_topdown(phys_addr_t size) +{ + phys_addr_t hole_start, hole_end, hole_size; + struct pkram_region_list *rl; + struct pkram_region *r; + phys_addr_t addr =3D 0; + int idx; + + hole_end =3D memblock.current_limit; + r =3D pkram_first_region_topdown(pkram_sb, &rl, &idx); + + while (r) { + hole_start =3D r->base + r->size; + hole_size =3D hole_end - hole_start; + + if (hole_size >=3D size) { + addr =3D memblock_phys_alloc_range(size, PAGE_SIZE, + hole_start, hole_end); + if (addr) + break; + } + + hole_end =3D r->base; + r =3D pkram_next_region_topdown(&rl, &idx); + } + + if (!addr) + addr =3D memblock_phys_alloc_range(size, PAGE_SIZE, 0, hole_end); + + return addr; +} + +int __init pkram_create_merged_reserved(struct memblock_type *new) +{ + unsigned long cnt_a; + unsigned long cnt_b; + long i, j, k; + struct memblock_region *r; + struct memblock_region *rgn; + struct pkram_region *pkr; + struct pkram_region_list *rl; + int idx; + unsigned long total_size =3D 0; + unsigned long nr_preserved =3D 0; + + cnt_a =3D memblock.reserved.cnt; + cnt_b =3D pkram_sb->nr_regions; + + i =3D 0; + j =3D 0; + k =3D 0; + + pkr =3D pkram_first_region(pkram_sb, &rl, &idx); + if (!pkr) + return -EINVAL; + while (i < cnt_a && j < cnt_b && pkr) { + r =3D &memblock.reserved.regions[i]; + rgn =3D &new->regions[k]; + + if (r->base + r->size <=3D pkr->base) { + *rgn =3D *r; + i++; + } else if (pkr->base + pkr->size <=3D r->base) { + rgn->base =3D pkr->base; + rgn->size =3D pkr->size; + memblock_set_region_node(rgn, MAX_NUMNODES); + + nr_preserved +=3D (rgn->size >> PAGE_SHIFT); + pkr =3D pkram_next_region(&rl, &idx); + j++; + } else { + pr_err("PKRAM: unexpected overlap:\n"); + pr_err("PKRAM: reserved: base=3D%pa,size=3D%pa,flags=3D0x%x\n", &r->bas= e, + &r->size, (int)r->flags); + pr_err("PKRAM: pkram: base=3D%pa,size=3D%pa\n", &pkr->base, &pkr->size); + return -EBUSY; + } + total_size +=3D rgn->size; + k++; + } + + while (i < cnt_a) { + r =3D 
&memblock.reserved.regions[i]; + rgn =3D &new->regions[k]; + + *rgn =3D *r; + + total_size +=3D rgn->size; + i++; + k++; + } + while (j < cnt_b && pkr) { + rgn =3D &new->regions[k]; + rgn->base =3D pkr->base; + rgn->size =3D pkr->size; + memblock_set_region_node(rgn, MAX_NUMNODES); + + nr_preserved +=3D (rgn->size >> PAGE_SHIFT); + total_size +=3D rgn->size; + pkr =3D pkram_next_region(&rl, &idx); + j++; + k++; + } + + WARN_ON(cnt_a + cnt_b !=3D k); + new->cnt =3D cnt_a + cnt_b; + new->total_size =3D total_size; + + return 0; +} + +/* + * Reserve pages that belong to preserved memory. This is accomplished by + * merging the existing reserved ranges with the preserved ranges into + * a new, sufficiently sized memblock reserved array. + * + * This function should be called at boot time as early as possible to pre= vent + * preserved memory from being recycled. + */ +int __init pkram_merge_with_reserved(void) +{ + struct memblock_type new; + unsigned long new_max; + phys_addr_t new_size; + phys_addr_t addr; + int err; + + /* + * Need space to insert one more range into memblock.reserved + * without memblock_double_array() being called. 
+ */ + if (memblock.reserved.cnt =3D=3D memblock.reserved.max) { + WARN_ONCE(1, "PKRAM: no space for new memblock list\n"); + return -ENOMEM; + } + + new_max =3D memblock.reserved.max + pkram_sb->nr_regions; + new_size =3D PAGE_ALIGN(sizeof(struct memblock_region) * new_max); + + addr =3D alloc_topdown(new_size); + if (!addr) + return -ENOMEM; + + new.regions =3D __va(addr); + new.max =3D new_max; + err =3D pkram_create_merged_reserved(&new); + if (err) + return err; + + memblock.reserved.cnt =3D new.cnt; + memblock.reserved.max =3D new.max; + memblock.reserved.total_size =3D new.total_size; + memblock.reserved.regions =3D new.regions; + + return 0; +} --=20 1.9.4 From nobody Thu Dec 18 08:59:17 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 685B8C7EE25 for ; Thu, 27 Apr 2023 00:10:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242700AbjD0AKR (ORCPT ); Wed, 26 Apr 2023 20:10:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35288 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242704AbjD0AKD (ORCPT ); Wed, 26 Apr 2023 20:10:03 -0400 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D9EEA3C0C for ; Wed, 26 Apr 2023 17:10:01 -0700 (PDT) Received: from pps.filterd (m0246627.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 33QGxES2025338; Thu, 27 Apr 2023 00:09:22 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2023-03-30; bh=qCn/KjOTilZ1uLUD9nUQPCZhyulY7CLY6Ex38w3zacI=; b=Bm/jAfF05hZ41Po/Sldj/OVZEgh4vornvAX3ywU3g0/yLbdoYbJMFOMTp7C5akCX3fcZ 
+pInqqlhBeWA1j5TPNcZ+6iEF6gS/F5sq1lf/buPv1WXX5wsj5gKfhyz8y8F710gFQvy q3dmGfVJQxHlYBNf/drz3+bPaSYVMM92CKhQlUQu4PKBx+edhfUxVnE93LXz0l2c3jod c0Yhtsq4U5CR9dYfi1acKahr/PyUR0HznCt6VPM22Bm0p28plaj/4tvVXewIDniGtxfB U6VxuygBOvSiOhVsBYRZQ4FOX7K5ngh3ZyBqGktoa1h2pAyzX/SDB/RijrDDl5gqc2tg Fw== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3q46622ty5-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:21 +0000 Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 33QNUjaf007380; Thu, 27 Apr 2023 00:09:21 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 3q4618mpkr-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:21 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 33R0938i013888; Thu, 27 Apr 2023 00:09:20 GMT Received: from ca-qasparc-x86-2.us.oracle.com (ca-qasparc-x86-2.us.oracle.com [10.147.24.103]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 3q4618mp42-12; Thu, 27 Apr 2023 00:09:20 +0000 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@kernel.org, akpm@linux-foundation.org, ebiederm@xmission.com, keescook@chromium.org, graf@amazon.com, jason.zeng@intel.com, lei.l.li@intel.com, steven.sistare@oracle.com, fam.zheng@bytedance.com, mgalaxy@akamai.com, kexec@lists.infradead.org Subject: [RFC v3 11/21] mm: PKRAM: 
reserve preserved memory at boot Date: Wed, 26 Apr 2023 17:08:47 -0700 Message-Id: <1682554137-13938-12-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.9.4 In-Reply-To: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> References: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-04-26_10,2023-04-26_03,2023-02-09_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 phishscore=0 bulkscore=0 mlxlogscore=999 malwarescore=0 mlxscore=0 spamscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303200000 definitions=main-2304270000 X-Proofpoint-ORIG-GUID: ePrmjbJOEXsn1MF-WbF_e2bo2djCeESw X-Proofpoint-GUID: ePrmjbJOEXsn1MF-WbF_e2bo2djCeESw Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Keep preserved pages from being recycled during boot by adding them to the memblock reserved list during early boot. If memory reservation fails (e.g. a region has already been reserved), all preserved pages are dropped. 
Signed-off-by: Anthony Yznaga --- arch/x86/kernel/setup.c | 3 ++ arch/x86/mm/init_64.c | 2 ++ include/linux/pkram.h | 8 +++++ mm/pkram.c | 84 +++++++++++++++++++++++++++++++++++++++++++++= +--- 4 files changed, 92 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 16babff771bd..2806b21236d0 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include @@ -1221,6 +1222,8 @@ void __init setup_arch(char **cmdline_p) initmem_init(); dma_contiguous_reserve(max_pfn_mapped << PAGE_SHIFT); =20 + pkram_reserve(); + if (boot_cpu_has(X86_FEATURE_GBPAGES)) hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT); =20 diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index a190aae8ceaf..a46ffb434f39 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -34,6 +34,7 @@ #include #include #include +#include =20 #include #include @@ -1339,6 +1340,7 @@ void __init mem_init(void) after_bootmem =3D 1; x86_init.hyper.init_after_bootmem(); =20 + totalram_pages_add(pkram_reserved_pages); /* * Must be done after boot memory is put on freelist, because here we * might set fields in deferred struct pages that have not yet been diff --git a/include/linux/pkram.h b/include/linux/pkram.h index b614e9059bba..53d5a1ec42ff 100644 --- a/include/linux/pkram.h +++ b/include/linux/pkram.h @@ -99,4 +99,12 @@ int pkram_prepare_save(struct pkram_stream *ps, const ch= ar *name, ssize_t pkram_write(struct pkram_access *pa, const void *buf, size_t count= ); size_t pkram_read(struct pkram_access *pa, void *buf, size_t count); =20 +#ifdef CONFIG_PKRAM +extern unsigned long pkram_reserved_pages; +void pkram_reserve(void); +#else +#define pkram_reserved_pages 0UL +static inline void pkram_reserve(void) { } +#endif + #endif /* _LINUX_PKRAM_H */ diff --git a/mm/pkram.c b/mm/pkram.c index c649504fa1fa..b711f94dbef4 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -134,6 +134,8 
@@ extern void pkram_find_preserved(unsigned long start, u= nsigned long end, void *p static LIST_HEAD(pkram_nodes); /* linked through page::lru */ static DEFINE_MUTEX(pkram_mutex); /* serializes open/close */ =20 +unsigned long __initdata pkram_reserved_pages; + /* * The PKRAM super block pfn, see above. */ @@ -143,6 +145,59 @@ static int __init parse_pkram_sb_pfn(char *arg) } early_param("pkram", parse_pkram_sb_pfn); =20 +static void * __init pkram_map_meta(unsigned long pfn) +{ + if (pfn >=3D max_low_pfn) + return ERR_PTR(-EINVAL); + return pfn_to_kaddr(pfn); +} + +int pkram_merge_with_reserved(void); +/* + * Reserve pages that belong to preserved memory. + * + * This function should be called at boot time as early as possible to pre= vent + * preserved memory from being recycled. + */ +void __init pkram_reserve(void) +{ + int err =3D 0; + + if (!pkram_sb_pfn) + return; + + pr_info("PKRAM: Examining preserved memory...\n"); + + /* Verify that nothing else has reserved the pkram_sb page */ + if (memblock_is_region_reserved(PFN_PHYS(pkram_sb_pfn), PAGE_SIZE)) { + err =3D -EBUSY; + goto out; + } + + pkram_sb =3D pkram_map_meta(pkram_sb_pfn); + if (IS_ERR(pkram_sb)) { + err =3D PTR_ERR(pkram_sb); + goto out; + } + /* An empty pkram_sb is not an error */ + if (!pkram_sb->node_pfn) { + pkram_sb =3D NULL; + goto done; + } + + err =3D pkram_merge_with_reserved(); +out: + if (err) { + pr_err("PKRAM: Reservation failed: %d\n", err); + WARN_ON(pkram_reserved_pages > 0); + pkram_sb =3D NULL; + return; + } + +done: + pr_info("PKRAM: %lu pages reserved\n", pkram_reserved_pages); +} + static inline struct page *pkram_alloc_page(gfp_t gfp_mask) { struct page *page; @@ -162,6 +217,7 @@ static inline struct page *pkram_alloc_page(gfp_t gfp_m= ask) =20 static inline void pkram_free_page(void *addr) { + __ClearPageReserved(virt_to_page(addr)); pkram_remove_identity_map(virt_to_page(addr)); free_page((unsigned long)addr); } @@ -193,13 +249,23 @@ static void 
pkram_truncate_link(struct pkram_link *li= nk) { struct page *page; pkram_entry_t p; - int i; + int i, j, order; =20 for (i =3D 0; i < PKRAM_LINK_ENTRIES_MAX; i++) { p =3D link->entry[i]; if (!p) continue; + order =3D p & PKRAM_ENTRY_ORDER_MASK; + if (order >=3D MAX_ORDER) { + pr_err("PKRAM: attempted truncate of invalid page\n"); + return; + } page =3D pfn_to_page(PHYS_PFN(p)); + for (j =3D 0; j < (1 << order); j++) { + struct page *pg =3D page + j; + + __ClearPageReserved(pg); + } pkram_remove_identity_map(page); put_page(page); } @@ -680,7 +746,7 @@ static int __pkram_bytes_save_page(struct pkram_access = *pa, struct page *page) static struct page *__pkram_prep_load_page(pkram_entry_t p) { struct page *page; - int order; + int i, order; short flags; =20 flags =3D (p >> PKRAM_ENTRY_FLAGS_SHIFT) & PKRAM_ENTRY_FLAGS_MASK; @@ -690,9 +756,16 @@ static struct page *__pkram_prep_load_page(pkram_entry= _t p) =20 page =3D pfn_to_page(PHYS_PFN(p)); =20 - if (!page_ref_freeze(pg, 1)) { - pr_err("PKRAM preserved page has unexpected inflated ref count\n"); - goto out_error; + for (i =3D 0; i < (1 << order); i++) { + struct page *pg =3D page + i; + int was_rsvd; + + was_rsvd =3D PageReserved(pg); + __ClearPageReserved(pg); + if ((was_rsvd || i =3D=3D 0) && !page_ref_freeze(pg, 1)) { + pr_err("PKRAM preserved page has unexpected inflated ref count\n"); + goto out_error; + } } =20 if (order) { @@ -1331,6 +1404,7 @@ int __init pkram_create_merged_reserved(struct memblo= ck_type *new) } =20 WARN_ON(cnt_a + cnt_b !=3D k); + pkram_reserved_pages =3D nr_preserved; new->cnt =3D cnt_a + cnt_b; new->total_size =3D total_size; =20 --=20 1.9.4 From nobody Thu Dec 18 08:59:17 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 19B96C7618E for ; Thu, 27 Apr 2023 00:10:44 +0000 (UTC) Received: 
(majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242789AbjD0AKm (ORCPT ); Wed, 26 Apr 2023 20:10:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35318 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242714AbjD0AKG (ORCPT ); Wed, 26 Apr 2023 20:10:06 -0400 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E985E3C1D for ; Wed, 26 Apr 2023 17:10:04 -0700 (PDT) Received: from pps.filterd (m0333521.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 33QGx3sS004984; Thu, 27 Apr 2023 00:09:23 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2023-03-30; bh=G4RP12t8eWRPBRfP51I1pAfPsGdNJr7M6XikXc8e+iY=; b=IZlqAoUK9fHXAG95V5pJiDT0eeJVAQiL96nlqaKKbncGPOq4+NjDLrwT+LKmBt73JFkD 3mcPIMqgzlzA47ShALrJPkkxu7pusb4qi26KHoPAGbSpX/M1JkOMZO4JOtVmpSLR0jX9 v6EeFv0n1eY20Q2PLxjvtu1oxZk6jQhbzO6B+YZAzZEf1Dt7fxFNdTu0cCovcI4krNkC YCzNxBI2NikR170m1E32WLPrnbIeT83z/r8rH2LuL9SE8ib9UzhQQAVzwkNu+NT3zLlP AuUK3EEshPKJCjhjeWrWKljfngifihSrxos/AK80RytyAn6rR1izL4yHZkPJbhQxVjDF RQ== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3q46gbtshv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:23 +0000 Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 33R00niN007445; Thu, 27 Apr 2023 00:09:22 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 3q4618mpmt-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 
27 Apr 2023 00:09:22 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 33R0938k013888; Thu, 27 Apr 2023 00:09:21 GMT Received: from ca-qasparc-x86-2.us.oracle.com (ca-qasparc-x86-2.us.oracle.com [10.147.24.103]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 3q4618mp42-13; Thu, 27 Apr 2023 00:09:21 +0000 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@kernel.org, akpm@linux-foundation.org, ebiederm@xmission.com, keescook@chromium.org, graf@amazon.com, jason.zeng@intel.com, lei.l.li@intel.com, steven.sistare@oracle.com, fam.zheng@bytedance.com, mgalaxy@akamai.com, kexec@lists.infradead.org Subject: [RFC v3 12/21] PKRAM: free the preserved ranges list Date: Wed, 26 Apr 2023 17:08:48 -0700 Message-Id: <1682554137-13938-13-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.9.4 In-Reply-To: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> References: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-04-26_10,2023-04-26_03,2023-02-09_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 phishscore=0 bulkscore=0 mlxlogscore=999 malwarescore=0 mlxscore=0 spamscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303200000 definitions=main-2304270000 X-Proofpoint-GUID: PNCavCd6ue3p59Kx26J25bbv7bWbtwP3 X-Proofpoint-ORIG-GUID: PNCavCd6ue3p59Kx26J25bbv7bWbtwP3 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 
Content-Type: text/plain; charset="utf-8" Free the pages used to pass the preserved ranges to the new boot. Signed-off-by: Anthony Yznaga --- arch/x86/mm/init_64.c | 1 + include/linux/pkram.h | 2 ++ mm/pkram.c | 20 ++++++++++++++++++++ 3 files changed, 23 insertions(+) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index a46ffb434f39..9e68f07367fa 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -1340,6 +1340,7 @@ void __init mem_init(void) after_bootmem =3D 1; x86_init.hyper.init_after_bootmem(); =20 + pkram_cleanup(); totalram_pages_add(pkram_reserved_pages); /* * Must be done after boot memory is put on freelist, because here we diff --git a/include/linux/pkram.h b/include/linux/pkram.h index 53d5a1ec42ff..c909aa299fc4 100644 --- a/include/linux/pkram.h +++ b/include/linux/pkram.h @@ -102,9 +102,11 @@ int pkram_prepare_save(struct pkram_stream *ps, const = char *name, #ifdef CONFIG_PKRAM extern unsigned long pkram_reserved_pages; void pkram_reserve(void); +void pkram_cleanup(void); #else #define pkram_reserved_pages 0UL static inline void pkram_reserve(void) { } +static inline void pkram_cleanup(void) { } #endif =20 #endif /* _LINUX_PKRAM_H */ diff --git a/mm/pkram.c b/mm/pkram.c index b711f94dbef4..c63b27bb711b 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -1456,3 +1456,23 @@ int __init pkram_merge_with_reserved(void) =20 return 0; } + +void __init pkram_cleanup(void) +{ + struct pkram_region_list *rl; + unsigned long next_pfn; + + if (!pkram_sb || !pkram_reserved_pages) + return; + + next_pfn =3D pkram_sb->region_list_pfn; + + while (next_pfn) { + struct page *page =3D pfn_to_page(next_pfn); + + rl =3D pfn_to_kaddr(next_pfn); + next_pfn =3D rl->next_pfn; + __free_pages_core(page, 0); + pkram_reserved_pages--; + } +} --=20 1.9.4 From nobody Thu Dec 18 08:59:17 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org 
[23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6DF84C7618E for ; Thu, 27 Apr 2023 00:10:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242827AbjD0AKh (ORCPT ); Wed, 26 Apr 2023 20:10:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35312 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242719AbjD0AKF (ORCPT ); Wed, 26 Apr 2023 20:10:05 -0400 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3FC7C3ABD for ; Wed, 26 Apr 2023 17:10:04 -0700 (PDT) Received: from pps.filterd (m0246629.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 33QGx8Pf014802; Thu, 27 Apr 2023 00:09:24 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2023-03-30; bh=/klwynvryfeEh0hvbf4HAyTY1q/nlnl3HuBNmoKxWqE=; b=0j0n+Kwn3Bvz2+mjvASxMuPf8yoNwQ+WflTC1dDKD5SawtcQeSwCAX5Kl9PhTbjLM7XR 0SZX9uzBB7nHTzqDuYT3ExaruHA9SVIUtnOMZT42jsPxL3vRnz7XBauO5nY7dbWI8FGa k+a1xB0bz8FRktPPGJfMWdBAzSN67DoN4vn/wLfmsPxiBZXp0NViGyfMYOIiBfLxFYT4 YXmoG776eXjaC1bOyRIBJqyj52I0AsEr/DhPR3yHpLY7noBzb6XnjLZj8fFYuYmVKzKv Q/FhZ+/q1DvdVY6ADsoIdHyzTV0xawqFouIVJzNEoW/6uIlQ72P899/I2IKT1PoYoTBo 0Q== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3q47fatms1-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:24 +0000 Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 33QNIv5l007334; Thu, 27 Apr 2023 00:09:23 GMT Received: from pps.reinject (localhost [127.0.0.1]) by 
phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 3q4618mpnj-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:23 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 33R0938m013888; Thu, 27 Apr 2023 00:09:23 GMT Received: from ca-qasparc-x86-2.us.oracle.com (ca-qasparc-x86-2.us.oracle.com [10.147.24.103]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 3q4618mp42-14; Thu, 27 Apr 2023 00:09:23 +0000 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@kernel.org, akpm@linux-foundation.org, ebiederm@xmission.com, keescook@chromium.org, graf@amazon.com, jason.zeng@intel.com, lei.l.li@intel.com, steven.sistare@oracle.com, fam.zheng@bytedance.com, mgalaxy@akamai.com, kexec@lists.infradead.org Subject: [RFC v3 13/21] PKRAM: prevent inadvertent use of a stale superblock Date: Wed, 26 Apr 2023 17:08:49 -0700 Message-Id: <1682554137-13938-14-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.9.4 In-Reply-To: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> References: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-04-26_10,2023-04-26_03,2023-02-09_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 phishscore=0 bulkscore=0 mlxlogscore=999 malwarescore=0 mlxscore=0 spamscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303200000 definitions=main-2304270000 X-Proofpoint-ORIG-GUID: dA85op-dAowHiRV38HyLZKKt_wYwuMGr X-Proofpoint-GUID: 
dA85op-dAowHiRV38HyLZKKt_wYwuMGr Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" When pages have been saved to be preserved by the current boot, set a magic number in the super block so that the next kernel can validate it. Signed-off-by: Anthony Yznaga --- mm/pkram.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/mm/pkram.c b/mm/pkram.c index c63b27bb711b..befdffc76940 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -22,6 +22,7 @@ =20 #include "internal.h" =20 +#define PKRAM_MAGIC 0x706B726D =20 /* * Represents a reference to a data page saved to PKRAM. @@ -110,6 +111,8 @@ struct pkram_region_list { * The structure occupies a memory page. */ struct pkram_super_block { + __u32 magic; + __u64 node_pfn; /* first element of the node list */ __u64 region_list_pfn; __u64 nr_regions; @@ -179,6 +182,11 @@ void __init pkram_reserve(void) err =3D PTR_ERR(pkram_sb); goto out; } + if (pkram_sb->magic !=3D PKRAM_MAGIC) { + pr_err("PKRAM: invalid super block\n"); + err =3D -EINVAL; + goto out; + } /* An empty pkram_sb is not an error */ if (!pkram_sb->node_pfn) { pkram_sb =3D NULL; @@ -1012,6 +1020,7 @@ static void __pkram_reboot(void) */ memset(pkram_sb, 0, PAGE_SIZE); if (!err && node_pfn) { + pkram_sb->magic =3D PKRAM_MAGIC; pkram_sb->node_pfn =3D node_pfn; pkram_sb->region_list_pfn =3D rl_pfn; pkram_sb->nr_regions =3D nr_regions; --=20 1.9.4 From nobody Thu Dec 18 08:59:17 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 761B8C77B60 for ; Thu, 27 Apr 2023 00:10:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242879AbjD0AK4 (ORCPT ); Wed, 26 Apr 2023 20:10:56 -0400 Received: from lindbergh.monkeyblade.net
([23.128.96.19]:35422 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242731AbjD0AKI (ORCPT ); Wed, 26 Apr 2023 20:10:08 -0400 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 03E8E40C8 for ; Wed, 26 Apr 2023 17:10:07 -0700 (PDT) Received: from pps.filterd (m0246627.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 33QGxDTg025309; Thu, 27 Apr 2023 00:09:26 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2023-03-30; bh=ltRUZX+kav0KgoOWwJByY5TJz/DW4UwuIA/fz1dRFQU=; b=xpVH9EsAYlNzHPNKr28M6uYk8hxosg9wy9QOXmzA9yN+HMuUoZfcrajvTUUr1lkOvLaX RM5c5MqPJv+zMSxVMYAjYSzVMxH6Rel4z1jaQnAe6791QBOa/kQboYsFtWjcxY5DF1+a bZzZXeUrWtDQzeHd6c+9eLsg7Xag/Vq9S/PA337P3zPlOG8CX9SArYqtBICe5MEdhCkn 2o/qGCjzmDk6N7GnF5wgATLTEi4/nVQSw6G+sUQlDmkGujq7HBYq5x1/ZTnpjCqDlMh3 qdorPMujYiR67P6HuxfFSjflchL9Svsbzw9GxTe6B8GKNUwz6HdzD7R0TQBFPcB+c7NM qQ== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3q46622ty7-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:25 +0000 Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 33QNIv5m007334; Thu, 27 Apr 2023 00:09:25 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 3q4618mppp-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:25 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by 
pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 33R0938o013888; Thu, 27 Apr 2023 00:09:24 GMT Received: from ca-qasparc-x86-2.us.oracle.com (ca-qasparc-x86-2.us.oracle.com [10.147.24.103]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 3q4618mp42-15; Thu, 27 Apr 2023 00:09:24 +0000 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@kernel.org, akpm@linux-foundation.org, ebiederm@xmission.com, keescook@chromium.org, graf@amazon.com, jason.zeng@intel.com, lei.l.li@intel.com, steven.sistare@oracle.com, fam.zheng@bytedance.com, mgalaxy@akamai.com, kexec@lists.infradead.org Subject: [RFC v3 14/21] PKRAM: provide a way to ban pages from use by PKRAM Date: Wed, 26 Apr 2023 17:08:50 -0700 Message-Id: <1682554137-13938-15-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.9.4 In-Reply-To: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> References: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-04-26_10,2023-04-26_03,2023-02-09_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 phishscore=0 bulkscore=0 mlxlogscore=999 malwarescore=0 mlxscore=0 spamscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303200000 definitions=main-2304270000 X-Proofpoint-ORIG-GUID: BQQ0fjLL1Sgk_Nth9r2mdIqR3T5y8aHc X-Proofpoint-GUID: BQQ0fjLL1Sgk_Nth9r2mdIqR3T5y8aHc Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Not all memory ranges can be used for saving preserved over-kexec data. 
For example, a kexec kernel may be loaded before pages are preserved. The memory regions that the kexec segments will be copied to on kexec must not contain preserved pages, or else those pages will be clobbered. Originally-by: Vladimir Davydov Signed-off-by: Anthony Yznaga --- include/linux/pkram.h | 2 + mm/pkram.c | 205 ++++++++++++++++++++++++++++++++++++++++++++++= ++++ 2 files changed, 207 insertions(+) diff --git a/include/linux/pkram.h b/include/linux/pkram.h index c909aa299fc4..29109e875604 100644 --- a/include/linux/pkram.h +++ b/include/linux/pkram.h @@ -103,10 +103,12 @@ int pkram_prepare_save(struct pkram_stream *ps, const= char *name, extern unsigned long pkram_reserved_pages; void pkram_reserve(void); void pkram_cleanup(void); +void pkram_ban_region(unsigned long start, unsigned long end); #else #define pkram_reserved_pages 0UL static inline void pkram_reserve(void) { } static inline void pkram_cleanup(void) { } +static inline void pkram_ban_region(unsigned long start, unsigned long end= ) { } #endif =20 #endif /* _LINUX_PKRAM_H */ diff --git a/mm/pkram.c b/mm/pkram.c index befdffc76940..cef75bd8ba99 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -140,6 +140,28 @@ extern void pkram_find_preserved(unsigned long start, = unsigned long end, void *p unsigned long __initdata pkram_reserved_pages; =20 /* + * For tracking a region of memory that PKRAM is not allowed to use. + */ +struct banned_region { + unsigned long start, end; /* pfn, inclusive */ +}; + +#define MAX_NR_BANNED (32 + MAX_NUMNODES * 2) + +static unsigned int nr_banned; /* number of banned regions */ + +/* banned regions; arranged in ascending order, do not overlap */ +static struct banned_region banned[MAX_NR_BANNED]; +/* + * If a page allocated for PKRAM turns out to belong to a banned region, + * it is placed on the banned_pages list so subsequent allocation attempts + * do not encounter it again. The list is shrunk when system memory is low.
+ */ +static LIST_HEAD(banned_pages); /* linked through page::lru */ +static DEFINE_SPINLOCK(banned_pages_lock); +static unsigned long nr_banned_pages; + +/* * The PKRAM super block pfn, see above. */ static int __init parse_pkram_sb_pfn(char *arg) @@ -206,12 +228,116 @@ void __init pkram_reserve(void) pr_info("PKRAM: %lu pages reserved\n", pkram_reserved_pages); } =20 +/* + * Ban pfn range [start..end] (inclusive) from use in PKRAM. + */ +void pkram_ban_region(unsigned long start, unsigned long end) +{ + int i, merged =3D -1; + + /* first try to merge the region with an existing one */ + for (i =3D nr_banned - 1; i >=3D 0 && start <=3D banned[i].end + 1; i--) { + if (end + 1 >=3D banned[i].start) { + start =3D min(banned[i].start, start); + end =3D max(banned[i].end, end); + if (merged < 0) + merged =3D i; + } else + /* + * Regions are arranged in ascending order and do not + * intersect so the merged region cannot jump over its + * predecessors. + */ + BUG_ON(merged >=3D 0); + } + + i++; + + if (merged >=3D 0) { + banned[i].start =3D start; + banned[i].end =3D end; + /* shift if merged with more than one region */ + memmove(banned + i + 1, banned + merged + 1, + sizeof(*banned) * (nr_banned - merged - 1)); + nr_banned -=3D merged - i; + return; + } + + /* + * The region does not intersect with an existing one; + * try to create a new one. 
+ */ + if (nr_banned =3D=3D MAX_NR_BANNED) { + pr_err("PKRAM: Failed to ban %lu-%lu: Too many banned regions\n", + start, end); + return; + } + + memmove(banned + i + 1, banned + i, + sizeof(*banned) * (nr_banned - i)); + banned[i].start =3D start; + banned[i].end =3D end; + nr_banned++; +} + +static void pkram_show_banned(void) +{ + int i; + unsigned long n, total =3D 0; + + pr_info("PKRAM: banned regions:\n"); + for (i =3D 0; i < nr_banned; i++) { + n =3D banned[i].end - banned[i].start + 1; + pr_info("%4d: [%08lx - %08lx] %ld pages\n", + i, banned[i].start, banned[i].end, n); + total +=3D n; + } + pr_info("Total banned: %ld pages in %d regions\n", + total, nr_banned); +} + +/* + * Returns true if the page may not be used for storing preserved data. + */ +static bool pkram_page_banned(struct page *page) +{ + unsigned long epfn, pfn =3D page_to_pfn(page); + int l =3D 0, r =3D nr_banned - 1, m; + + epfn =3D pfn + compound_nr(page) - 1; + + /* do binary search */ + while (l <=3D r) { + m =3D (l + r) / 2; + if (epfn < banned[m].start) + r =3D m - 1; + else if (pfn > banned[m].end) + l =3D m + 1; + else + return true; + } + return false; +} + static inline struct page *pkram_alloc_page(gfp_t gfp_mask) { struct page *page; + LIST_HEAD(list); + unsigned long len =3D 0; int err; =20 page =3D alloc_page(gfp_mask); + while (page && pkram_page_banned(page)) { + len++; + list_add(&page->lru, &list); + page =3D alloc_page(gfp_mask); + } + if (len > 0) { + spin_lock(&banned_pages_lock); + nr_banned_pages +=3D len; + list_splice(&list, &banned_pages); + spin_unlock(&banned_pages_lock); + } if (page) { err =3D pkram_add_identity_map(page); if (err) { @@ -230,6 +356,53 @@ static inline void pkram_free_page(void *addr) free_page((unsigned long)addr); } =20 +static void __banned_pages_shrink(unsigned long nr_to_scan) +{ + struct page *page; + + if (nr_to_scan <=3D 0) + return; + + while (nr_banned_pages > 0) { + BUG_ON(list_empty(&banned_pages)); + page =3D 
list_first_entry(&banned_pages, struct page, lru); + list_del(&page->lru); + __free_page(page); + nr_banned_pages--; + nr_to_scan--; + if (!nr_to_scan) + break; + } +} + +static unsigned long +banned_pages_count(struct shrinker *shrink, struct shrink_control *sc) +{ + return nr_banned_pages; +} + +static unsigned long +banned_pages_scan(struct shrinker *shrink, struct shrink_control *sc) +{ + int nr_left =3D nr_banned_pages; + + if (!sc->nr_to_scan || !nr_left) + return nr_left; + + spin_lock(&banned_pages_lock); + __banned_pages_shrink(sc->nr_to_scan); + nr_left =3D nr_banned_pages; + spin_unlock(&banned_pages_lock); + + return nr_left; +} + +static struct shrinker banned_pages_shrinker =3D { + .count_objects =3D banned_pages_count, + .scan_objects =3D banned_pages_scan, + .seeks =3D DEFAULT_SEEKS, +}; + static inline void pkram_insert_node(struct pkram_node *node) { list_add(&virt_to_page(node)->lru, &pkram_nodes); @@ -705,6 +878,31 @@ static int __pkram_save_page(struct pkram_access *pa, = struct page *page, return 0; } =20 +static int __pkram_save_page_copy(struct pkram_access *pa, struct page *pa= ge) +{ + int nr_pages =3D compound_nr(page); + pgoff_t index =3D page->index; + int i, err; + + for (i =3D 0; i < nr_pages; i++, index++) { + struct page *p =3D page + i; + struct page *new; + + new =3D pkram_alloc_page(pa->ps->gfp_mask); + if (!new) + return -ENOMEM; + + copy_highpage(new, p); + err =3D __pkram_save_page(pa, new, index); + if (err) { + pkram_free_page(page_address(new)); + return err; + } + } + + return 0; +} + /** * Save folio @folio to the preserved memory node and object associated * with pkram stream access @pa. 
The stream must have been initialized with @@ -728,6 +926,10 @@ int pkram_save_folio(struct pkram_access *pa, struct f= olio *folio) =20 BUG_ON((node->flags & PKRAM_ACCMODE_MASK) !=3D PKRAM_SAVE); =20 + /* if page is banned, relocate it */ + if (pkram_page_banned(page)) + return __pkram_save_page_copy(pa, page); + err =3D __pkram_save_page(pa, page, page->index); if (!err) err =3D pkram_add_identity_map(page); @@ -987,6 +1189,7 @@ static void __pkram_reboot(void) int err =3D 0; =20 if (!list_empty(&pkram_nodes)) { + pkram_show_banned(); err =3D pkram_add_identity_map(virt_to_page(pkram_sb)); if (err) { pr_err("PKRAM: failed to add super block to pagetable\n"); @@ -1073,6 +1276,7 @@ static int __init pkram_init_sb(void) page =3D alloc_page(GFP_KERNEL | __GFP_ZERO); if (!page) { pr_err("PKRAM: Failed to allocate super block\n"); + __banned_pages_shrink(ULONG_MAX); return 0; } pkram_sb =3D page_address(page); @@ -1095,6 +1299,7 @@ static int __init pkram_init(void) { if (pkram_init_sb()) { register_reboot_notifier(&pkram_reboot_notifier); + register_shrinker(&banned_pages_shrinker, "pkram"); sysfs_update_group(kernel_kobj, &pkram_attr_group); } return 0; --=20 1.9.4 From nobody Thu Dec 18 08:59:17 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D241AC7618E for ; Thu, 27 Apr 2023 00:10:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242830AbjD0AKu (ORCPT ); Wed, 26 Apr 2023 20:10:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35358 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242725AbjD0AKH (ORCPT ); Wed, 26 Apr 2023 20:10:07 -0400 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 
76A2E3C30 for ; Wed, 26 Apr 2023 17:10:06 -0700 (PDT) Received: from pps.filterd (m0246617.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 33QGxIC6009633; Thu, 27 Apr 2023 00:09:28 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2023-03-30; bh=VjHtF6foxcpPkuDRLURypMKoHlv1VKb82EFT44hMcWo=; b=Yy0rDZNcm0K3+32Kv5Kt0bnPhZ+cjF+9HlcT8JjGanst6ujGlEFRgGR7Ty79CgHZnvhb 7RTmGajtoDcdzYpnzsTA+k/ABNEC1jZn22VcWZc0OANw7MpS0eOUswTIk1JBI0ub4N37 IxTv2QvPe0bmDiAlAOcHmJTDoHO8Zod3IT83urwFmtn2gzLy+JH6nmLGe+e863aIch4A WuzSrCLKVsUPlRJ9BwvFUGWu9PKDq5sFRAeU5XrJbS8BjGunh37SO3dCp9KefjeNn9CB s3FOGBZB1g9Z9GtLUzadHIQb3UVXkgecu+FEa8rdMXsWTpQtMXS/5i1nqDt2NMBR19YD 5w== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3q484utq26-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:27 +0000 Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 33QNUjaj007380; Thu, 27 Apr 2023 00:09:27 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 3q4618mpqn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:26 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 33R0938q013888; Thu, 27 Apr 2023 00:09:26 GMT Received: from ca-qasparc-x86-2.us.oracle.com (ca-qasparc-x86-2.us.oracle.com [10.147.24.103]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 3q4618mp42-16; Thu, 27 Apr 2023 00:09:26 +0000 
From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@kernel.org, akpm@linux-foundation.org, ebiederm@xmission.com, keescook@chromium.org, graf@amazon.com, jason.zeng@intel.com, lei.l.li@intel.com, steven.sistare@oracle.com, fam.zheng@bytedance.com, mgalaxy@akamai.com, kexec@lists.infradead.org Subject: [RFC v3 15/21] kexec: PKRAM: prevent kexec clobbering preserved pages in some cases Date: Wed, 26 Apr 2023 17:08:51 -0700 Message-Id: <1682554137-13938-16-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.9.4 In-Reply-To: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> References: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-04-26_10,2023-04-26_03,2023-02-09_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 phishscore=0 bulkscore=0 mlxlogscore=999 malwarescore=0 mlxscore=0 spamscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303200000 definitions=main-2304270000 X-Proofpoint-GUID: -pxnJnsePK2q5pM0GAqWo7YuzPwuh3pz X-Proofpoint-ORIG-GUID: -pxnJnsePK2q5pM0GAqWo7YuzPwuh3pz Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" When loading a kernel for kexec, dynamically update the list of physical ranges that are not to be used for storing preserved pages with the ranges where kexec segments will be copied to on reboot. This ensures no pages preserved after the new kernel has been loaded will reside in these ranges on reboot. Not yet handled is the case where pages have been preserved before a kexec kernel is loaded. 
This will be covered by a later patch. Signed-off-by: Anthony Yznaga --- kernel/kexec.c | 9 +++++++++ kernel/kexec_file.c | 10 ++++++++++ 2 files changed, 19 insertions(+) diff --git a/kernel/kexec.c b/kernel/kexec.c index 92d301f98776..cd871fc07c65 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -16,6 +16,7 @@ #include #include #include +#include =20 #include "kexec_internal.h" =20 @@ -153,6 +154,14 @@ static int do_kexec_load(unsigned long entry, unsigned= long nr_segments, if (ret) goto out; =20 + for (i =3D 0; i < nr_segments; i++) { + unsigned long mem =3D image->segment[i].mem; + size_t memsz =3D image->segment[i].memsz; + + if (memsz) + pkram_ban_region(PFN_DOWN(mem), PFN_UP(mem + memsz) - 1); + } + /* Install the new kernel and uninstall the old */ image =3D xchg(dest_image, image); =20 diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index f1a0e4e3fb5c..ca2aa2d61955 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -27,6 +27,8 @@ #include #include #include +#include + #include "kexec_internal.h" =20 #ifdef CONFIG_KEXEC_SIG @@ -403,6 +405,14 @@ static int kexec_image_verify_sig(struct kimage *image= , void *buf, if (ret) goto out; =20 + for (i =3D 0; i < image->nr_segments; i++) { + unsigned long mem =3D image->segment[i].mem; + size_t memsz =3D image->segment[i].memsz; + + if (memsz) + pkram_ban_region(PFN_DOWN(mem), PFN_UP(mem + memsz) - 1); + } + /* * Free up any temporary buffers allocated which are not needed * after image has been loaded --=20 1.9.4 From nobody Thu Dec 18 08:59:17 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7DFA1C7618E for ; Thu, 27 Apr 2023 00:10:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242846AbjD0AKq (ORCPT ); Wed, 26 Apr 2023 20:10:46 -0400 Received: from 
lindbergh.monkeyblade.net ([23.128.96.19]:35380 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242728AbjD0AKI (ORCPT ); Wed, 26 Apr 2023 20:10:08 -0400 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E3F2A40C6 for ; Wed, 26 Apr 2023 17:10:06 -0700 (PDT) Received: from pps.filterd (m0246627.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 33QGxER0025323; Thu, 27 Apr 2023 00:09:29 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2023-03-30; bh=iVHkxnhPmN3ZiIllDQA4XN9d0CPfj30tNxwbJaWlZYE=; b=uln3/Y5F+Sv5bX0Aapb9pqhswccVRvqb/9UqSXysiUWJfvKaAtGgueeexgYxPSyeDVU1 JY61SJ0QAxIz9xFLQu1a+bfAQ7BoRsRMO/zykqYENeQfQX2Tsp31Pwuci7L745E6EStq al6VWV501TMo24pZir/gWgRySI/5HjAKGGrMSanISa8maMcnnLFivFO29ZEDyZLAdXSZ RWl2os1ejhXGKljBNkHolMvKGrZDQUtuY+vOxt6u02F5SFOr5ARd7jtA13Qssv/4Q+Nd ZEr7Z+4EcuFibdjMmqSjL+Co/bgWhtdnklJ3asiCwPFLUA3oIv16J3DXTRhyeQz5K8J3 vA== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3q46622tyd-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:28 +0000 Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 33QNRYCp007326; Thu, 27 Apr 2023 00:09:28 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 3q4618mprm-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:28 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com 
(phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 33R0938s013888; Thu, 27 Apr 2023 00:09:27 GMT Received: from ca-qasparc-x86-2.us.oracle.com (ca-qasparc-x86-2.us.oracle.com [10.147.24.103]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 3q4618mp42-17; Thu, 27 Apr 2023 00:09:27 +0000 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@kernel.org, akpm@linux-foundation.org, ebiederm@xmission.com, keescook@chromium.org, graf@amazon.com, jason.zeng@intel.com, lei.l.li@intel.com, steven.sistare@oracle.com, fam.zheng@bytedance.com, mgalaxy@akamai.com, kexec@lists.infradead.org Subject: [RFC v3 16/21] PKRAM: provide a way to check if a memory range has preserved pages Date: Wed, 26 Apr 2023 17:08:52 -0700 Message-Id: <1682554137-13938-17-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.9.4 In-Reply-To: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> References: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-04-26_10,2023-04-26_03,2023-02-09_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 phishscore=0 bulkscore=0 mlxlogscore=999 malwarescore=0 mlxscore=0 spamscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303200000 definitions=main-2304270000 X-Proofpoint-ORIG-GUID: 2bK9bjHYj08MJyIhhXZ9v97WqmZ7xyPq X-Proofpoint-GUID: 2bK9bjHYj08MJyIhhXZ9v97WqmZ7xyPq Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" When a kernel is loaded for kexec 
the address ranges where the kexec segments will be copied to may conflict with pages already set to be preserved. Provide a way to determine if preserved pages exist in a specified range. Signed-off-by: Anthony Yznaga --- include/linux/pkram.h | 2 ++ mm/pkram.c | 20 ++++++++++++++++++++ 2 files changed, 22 insertions(+) diff --git a/include/linux/pkram.h b/include/linux/pkram.h index 29109e875604..bec9ae75e802 100644 --- a/include/linux/pkram.h +++ b/include/linux/pkram.h @@ -104,11 +104,13 @@ int pkram_prepare_save(struct pkram_stream *ps, const= char *name, void pkram_reserve(void); void pkram_cleanup(void); void pkram_ban_region(unsigned long start, unsigned long end); +int pkram_has_preserved_pages(unsigned long start, unsigned long end); #else #define pkram_reserved_pages 0UL static inline void pkram_reserve(void) { } static inline void pkram_cleanup(void) { } static inline void pkram_ban_region(unsigned long start, unsigned long end= ) { } +static inline int pkram_has_preserved_pages(unsigned long start, unsigned = long end) { return 0; } #endif =20 #endif /* _LINUX_PKRAM_H */ diff --git a/mm/pkram.c b/mm/pkram.c index cef75bd8ba99..474fb6fc8355 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -1690,3 +1690,23 @@ void __init pkram_cleanup(void) pkram_reserved_pages--; } } + +static int has_preserved_pages_cb(unsigned long base, unsigned long size, = void *private) +{ + int *has_preserved =3D (int *)private; + + *has_preserved =3D 1; + return 1; +} + +/* + * Check whether the memory range [start, end) contains preserved pages. 
+ */ +int pkram_has_preserved_pages(unsigned long start, unsigned long end) +{ + int has_preserved =3D 0; + + pkram_find_preserved(start, end, &has_preserved, has_preserved_pages_cb); + + return has_preserved; +} --=20 1.9.4 From nobody Thu Dec 18 08:59:17 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 632B2C7EE26 for ; Thu, 27 Apr 2023 00:10:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242776AbjD0AKO (ORCPT ); Wed, 26 Apr 2023 20:10:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35280 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242701AbjD0AKD (ORCPT ); Wed, 26 Apr 2023 20:10:03 -0400 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1DD193C13 for ; Wed, 26 Apr 2023 17:10:02 -0700 (PDT) Received: from pps.filterd (m0246617.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 33QGxJYE009644; Thu, 27 Apr 2023 00:09:31 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2023-03-30; bh=GKXPxESVfvIM+pNIB5lGbZkr/9+GXI6AMF73iO2olBA=; b=Fr50ImkcFewD5fhKQE2bdokl3ohNxwbCvzToQ3b3qQt8nmwfevLzs25EG7cuy39H0eWI n1LOh78LxZm9im3r9t7TbZe9BB3hn1nNdRiHeT9nPO+1uIo0+IM7jW8Nhi9WY9+llkpC C/hv+kIvPkuBGcQeWZeAHkGuhJKvx2Q2mu4mdrCOq9O3mqjJWFFzeVRC6bkX+TU9BPew B3hlYsgLWoUYC/EAzBBk2Kj2SXpbhsLn6xM59mCHDWHbJINIhPBxQtipbfWF6rvidti+ y2+4OGsrI+sJd8YTcewFfgjMScpiVf/CVq/hrFB+6a2eCVKnpqU302dIFuD/UE0e+l+f Ew== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 
3q484utq2b-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:30 +0000 Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 33QN01rh007153; Thu, 27 Apr 2023 00:09:30 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 3q4618mpsr-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:30 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 33R0938u013888; Thu, 27 Apr 2023 00:09:29 GMT Received: from ca-qasparc-x86-2.us.oracle.com (ca-qasparc-x86-2.us.oracle.com [10.147.24.103]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 3q4618mp42-18; Thu, 27 Apr 2023 00:09:29 +0000 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@kernel.org, akpm@linux-foundation.org, ebiederm@xmission.com, keescook@chromium.org, graf@amazon.com, jason.zeng@intel.com, lei.l.li@intel.com, steven.sistare@oracle.com, fam.zheng@bytedance.com, mgalaxy@akamai.com, kexec@lists.infradead.org Subject: [RFC v3 17/21] kexec: PKRAM: avoid clobbering already preserved pages Date: Wed, 26 Apr 2023 17:08:53 -0700 Message-Id: <1682554137-13938-18-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.9.4 In-Reply-To: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> References: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=baseguard 
engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-04-26_10,2023-04-26_03,2023-02-09_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 phishscore=0 bulkscore=0 mlxlogscore=999 malwarescore=0 mlxscore=0 spamscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303200000 definitions=main-2304270000 X-Proofpoint-GUID: AvvXex8oIgFKryG_FPd3JZhh5CVfptob X-Proofpoint-ORIG-GUID: AvvXex8oIgFKryG_FPd3JZhh5CVfptob Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Ensure destination ranges of the kexec segments do not overlap with any kernel pages marked to be preserved across kexec. For kexec_load, return EADDRNOTAVAIL if an overlap is detected. For kexec_file_load, skip ranges containing preserved pages when searching for available ranges to use. Signed-off-by: Anthony Yznaga --- kernel/kexec_core.c | 3 +++ kernel/kexec_file.c | 5 +++++ 2 files changed, 8 insertions(+) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 3d578c6fefee..e0d52f70cb48 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -40,6 +40,7 @@ #include #include #include +#include =20 #include #include @@ -178,6 +179,8 @@ int sanity_check_segment_list(struct kimage *image) return -EADDRNOTAVAIL; if (mend >=3D KEXEC_DESTINATION_MEMORY_LIMIT) return -EADDRNOTAVAIL; + if (pkram_has_preserved_pages(mstart, mend)) + return -EADDRNOTAVAIL; } =20 /* Verify our destination addresses do not overlap. 
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index ca2aa2d61955..8bca01060d32 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -490,6 +490,11 @@ static int locate_mem_hole_bottom_up(unsigned long sta= rt, unsigned long end, continue; } =20 + if (pkram_has_preserved_pages(temp_start, temp_end + 1)) { + temp_start =3D temp_start - PAGE_SIZE; + continue; + } + /* We found a suitable memory range */ break; } while (1); --=20 1.9.4 From nobody Thu Dec 18 08:59:17 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 083A8C77B60 for ; Thu, 27 Apr 2023 00:11:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242908AbjD0ALK (ORCPT ); Wed, 26 Apr 2023 20:11:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35526 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242744AbjD0AKL (ORCPT ); Wed, 26 Apr 2023 20:10:11 -0400 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E740D40E0 for ; Wed, 26 Apr 2023 17:10:09 -0700 (PDT) Received: from pps.filterd (m0246629.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 33QGx63C014744; Thu, 27 Apr 2023 00:09:32 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2023-03-30; bh=43eGY+TDyN1Z/nEUhZp/pkRJ381CTkYO9EwFJXyHijs=; b=lFddECHUFaWe4SvxIWe74XuldZD3XhgVHok2PgKl2CS5V9ILX3LvwHV30JNs3cypkMZR 8uiAfjJTFIhtWlrJRlFDLbAorKgQgvIuVOLrLSEI6IxPSo58U7I0ugig8LUpDtroQYf0 qEQ1xL25lZRaIfxmaXWpRfGNpN5jYkV1m5iS+Qye7WwBzoSapoardIE7ZBZBnESy+/4m gSbsTmhPLiFOZ+wkOz++c0c0sgsqgkb4FrVZUnIxbqFMpcl7ubrUGjWlxLi5zPvEIXaS 
CguZa3wdyDWhJpmUoSLf4r4OI1kFIpshJwuSTa3CKlRl9yR0fibdnDQCo6NYgE9+jhUZ TA== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3q47fatmse-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:31 +0000 Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 33QMn2Cq007147; Thu, 27 Apr 2023 00:09:31 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 3q4618mpu2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:31 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 33R0938w013888; Thu, 27 Apr 2023 00:09:30 GMT Received: from ca-qasparc-x86-2.us.oracle.com (ca-qasparc-x86-2.us.oracle.com [10.147.24.103]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 3q4618mp42-19; Thu, 27 Apr 2023 00:09:30 +0000 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@kernel.org, akpm@linux-foundation.org, ebiederm@xmission.com, keescook@chromium.org, graf@amazon.com, jason.zeng@intel.com, lei.l.li@intel.com, steven.sistare@oracle.com, fam.zheng@bytedance.com, mgalaxy@akamai.com, kexec@lists.infradead.org Subject: [RFC v3 18/21] mm: PKRAM: allow preserved memory to be freed from userspace Date: Wed, 26 Apr 2023 17:08:54 -0700 Message-Id: <1682554137-13938-19-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.9.4 In-Reply-To: 
<1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> References: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-04-26_10,2023-04-26_03,2023-02-09_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 phishscore=0 bulkscore=0 mlxlogscore=999 malwarescore=0 mlxscore=0 spamscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303200000 definitions=main-2304270000 X-Proofpoint-ORIG-GUID: -W4-EPoCWhe2SX1_zknW2SnhXj4oiT4p X-Proofpoint-GUID: -W4-EPoCWhe2SX1_zknW2SnhXj4oiT4p Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" To free all space utilized for preserved memory, one can write 0 to /sys/kernel/pkram. This will destroy all PKRAM nodes that are not currently being read or written. Originally-by: Vladimir Davydov Signed-off-by: Anthony Yznaga --- mm/pkram.c | 39 ++++++++++++++++++++++++++++++++++++++- 1 file changed, 38 insertions(+), 1 deletion(-) diff --git a/mm/pkram.c b/mm/pkram.c index 474fb6fc8355..d404e415f3cb 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -493,6 +493,32 @@ static void pkram_truncate_node(struct pkram_node *nod= e) node->obj_pfn =3D 0; } =20 +/* + * Free all nodes that are not under operation. 
+ */ +static void pkram_truncate(void) +{ + struct page *page, *tmp; + struct pkram_node *node; + LIST_HEAD(dispose); + + mutex_lock(&pkram_mutex); + list_for_each_entry_safe(page, tmp, &pkram_nodes, lru) { + node =3D page_address(page); + if (!(node->flags & PKRAM_ACCMODE_MASK)) + list_move(&page->lru, &dispose); + } + mutex_unlock(&pkram_mutex); + + while (!list_empty(&dispose)) { + page =3D list_first_entry(&dispose, struct page, lru); + list_del(&page->lru); + node =3D page_address(page); + pkram_truncate_node(node); + pkram_free_page(node); + } +} + static void pkram_add_link(struct pkram_link *link, struct pkram_data_stre= am *pds) { __u64 link_pfn =3D page_to_pfn(virt_to_page(link)); @@ -1252,8 +1278,19 @@ static ssize_t show_pkram_sb_pfn(struct kobject *kob= j, return sprintf(buf, "%lx\n", pfn); } =20 +static ssize_t store_pkram_sb_pfn(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t count) +{ + int val; + + if (kstrtoint(buf, 0, &val) || val) + return -EINVAL; + pkram_truncate(); + return count; +} + static struct kobj_attribute pkram_sb_pfn_attr =3D - __ATTR(pkram, 0444, show_pkram_sb_pfn, NULL); + __ATTR(pkram, 0644, show_pkram_sb_pfn, store_pkram_sb_pfn); =20 static struct attribute *pkram_attrs[] =3D { &pkram_sb_pfn_attr.attr, --=20 1.9.4 From nobody Thu Dec 18 08:59:17 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0E11AC7EE29 for ; Thu, 27 Apr 2023 00:10:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242717AbjD0AKF (ORCPT ); Wed, 26 Apr 2023 20:10:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35246 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240955AbjD0AKA (ORCPT ); Wed, 26 Apr 2023 20:10:00 -0400 Received: from 
mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E55BA3AAA for ; Wed, 26 Apr 2023 17:09:59 -0700 (PDT) Received: from pps.filterd (m0246627.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 33QGxFQg025361; Thu, 27 Apr 2023 00:09:33 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2023-03-30; bh=VQwaugMMgK/MS0E/urOkz6I4ukGs9iD4CWV4YSx/1xc=; b=J9766iWR8G77hZ7IvzmoZgr4vzzvhIrhZGyputyh0qhGelOn0BZzz+TYbnctnLfB2tXh 00taSjRcT8LFpZEslBFSSAM+7RpAZLoDdPLhJX6aat3Xus05BlbcMz45pXDselmAKTQL 4j9iCnvsVigM9pq/Hn6cRChdsGBc3RcS+agLBtC0Ryp+vhjYvDipYwOzgG+fZ0YtUBKh ywDBGNDhlrFvWYIx2fSMBZzPveVn06sAanl2zckyI7jdur2dQ1+8SRVOAnEI4OnwJAyu qzNFfHr6Mb0eJ2sQWEFZDyiF5ZYwCvt6ZBGbB2qNbzLkGSSfha9z8223RFFTKeqKIbV2 AQ== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3q46622tym-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:33 +0000 Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 33QNUZGN007329; Thu, 27 Apr 2023 00:09:32 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 3q4618mpux-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:32 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 33R09390013888; Thu, 27 Apr 2023 00:09:32 GMT Received: from ca-qasparc-x86-2.us.oracle.com (ca-qasparc-x86-2.us.oracle.com [10.147.24.103]) 
by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 3q4618mp42-20; Thu, 27 Apr 2023 00:09:31 +0000 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@kernel.org, akpm@linux-foundation.org, ebiederm@xmission.com, keescook@chromium.org, graf@amazon.com, jason.zeng@intel.com, lei.l.li@intel.com, steven.sistare@oracle.com, fam.zheng@bytedance.com, mgalaxy@akamai.com, kexec@lists.infradead.org Subject: [RFC v3 19/21] PKRAM: disable feature when running the kdump kernel Date: Wed, 26 Apr 2023 17:08:55 -0700 Message-Id: <1682554137-13938-20-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.9.4 In-Reply-To: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> References: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-04-26_10,2023-04-26_03,2023-02-09_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 phishscore=0 bulkscore=0 mlxlogscore=999 malwarescore=0 mlxscore=0 spamscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303200000 definitions=main-2304270000 X-Proofpoint-ORIG-GUID: jPzpZYiuahk2b6x6CL1T4-5GRvhXf1gU X-Proofpoint-GUID: jPzpZYiuahk2b6x6CL1T4-5GRvhXf1gU Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" The kdump kernel should not preserve or restore pages. 
Signed-off-by: Anthony Yznaga --- mm/pkram.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/mm/pkram.c b/mm/pkram.c index d404e415f3cb..f38236e5d836 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -1,4 +1,5 @@ // SPDX-License-Identifier: GPL-2.0 +#include #include #include #include @@ -188,7 +189,7 @@ void __init pkram_reserve(void) { int err =3D 0; =20 - if (!pkram_sb_pfn) + if (!pkram_sb_pfn || is_kdump_kernel()) return; =20 pr_info("PKRAM: Examining preserved memory...\n"); @@ -285,6 +286,9 @@ static void pkram_show_banned(void) int i; unsigned long n, total =3D 0; =20 + if (is_kdump_kernel()) + return; + pr_info("PKRAM: banned regions:\n"); for (i =3D 0; i < nr_banned; i++) { n =3D banned[i].end - banned[i].start + 1; @@ -1334,7 +1338,7 @@ static int __init pkram_init_sb(void) =20 static int __init pkram_init(void) { - if (pkram_init_sb()) { + if (!is_kdump_kernel() && pkram_init_sb()) { register_reboot_notifier(&pkram_reboot_notifier); register_shrinker(&banned_pages_shrinker, "pkram"); sysfs_update_group(kernel_kobj, &pkram_attr_group); --=20 1.9.4 From nobody Thu Dec 18 08:59:17 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 28DA7C7618E for ; Thu, 27 Apr 2023 00:11:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242903AbjD0ALH (ORCPT ); Wed, 26 Apr 2023 20:11:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35496 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242736AbjD0AKK (ORCPT ); Wed, 26 Apr 2023 20:10:10 -0400 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B9FD93C30 for ; Wed, 26 Apr 2023 17:10:09 -0700 (PDT) Received: from pps.filterd 
(m0333521.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 33QGx3sV004984; Thu, 27 Apr 2023 00:09:35 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2023-03-30; bh=yIENXNNzi/QzzmshoP+q+Y+PreZ+zZ5LAbsbewQ0r6M=; b=D93iwhD4tLk5IUOUbCMp75EVVn8pz3JVlt8axqiDirtNgTH2ZnBgIb0wXe6ekEbLRo3e eEPG+sS5dvN3KroM8CtCay1vQzVY8aDzCvlx8qf7BrUi/uc6Ri73ggcjNDyyOTXKHrZb p7qvzaHBremK6EWvwWWGnKslZqz39B4flJiXKpv5Wk0duzRK9YcIjqc0pnue7uMjxDo5 62ZZ4RO8P1pw10cOAOEE5i57vPlb6IBrJc3759GRFGLvTLDeHDdKMQi/RPcUVjlGAY+o u/wecOchLHcLvkKmIRLTn5Iepx8SHgbkzjsv0cZDb8NF5yb/8y2kp/+ObTQ4ssFP3mU6 6g== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3q46gbtsj9-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:34 +0000 Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 33QM3usC007418; Thu, 27 Apr 2023 00:09:34 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 3q4618mpvt-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:34 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 33R09392013888; Thu, 27 Apr 2023 00:09:33 GMT Received: from ca-qasparc-x86-2.us.oracle.com (ca-qasparc-x86-2.us.oracle.com [10.147.24.103]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 3q4618mp42-21; Thu, 27 Apr 2023 00:09:33 +0000 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: 
tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@kernel.org, akpm@linux-foundation.org, ebiederm@xmission.com, keescook@chromium.org, graf@amazon.com, jason.zeng@intel.com, lei.l.li@intel.com, steven.sistare@oracle.com, fam.zheng@bytedance.com, mgalaxy@akamai.com, kexec@lists.infradead.org Subject: [RFC v3 20/21] x86/KASLR: PKRAM: support physical kaslr Date: Wed, 26 Apr 2023 17:08:56 -0700 Message-Id: <1682554137-13938-21-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.9.4 In-Reply-To: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> References: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-04-26_10,2023-04-26_03,2023-02-09_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 phishscore=0 bulkscore=0 mlxlogscore=999 malwarescore=0 mlxscore=0 spamscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303200000 definitions=main-2304270000 X-Proofpoint-GUID: 1EnT7XCXpRCY0gBKXVSZzw8iuiNCBjPV X-Proofpoint-ORIG-GUID: 1EnT7XCXpRCY0gBKXVSZzw8iuiNCBjPV Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Avoid regions of memory that contain preserved pages when computing slots used to select where to put the decompressed kernel. 
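For illustration only (not part of the patch): the avoidance described above comes down to a half-open interval overlap test against the list of preserved regions. A minimal userspace sketch follows, assuming a flat array in place of the linked region-list pages and assuming the regions are sorted by base and non-overlapping. `struct ex_region`, `struct ex_vec` and `ex_first_overlap()` are hypothetical names chosen for this example, not the patch's API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a preserved region and a candidate image slot. */
struct ex_region { uint64_t base, size; };
struct ex_vec    { uint64_t start, size; };

/*
 * Return 1 and fill *overlap with the first region that intersects
 * [img->start, img->start + img->size); return 0 if none does.
 * Assumes regions[] is sorted by base and non-overlapping.
 */
static int ex_first_overlap(const struct ex_region *regions, size_t n,
			    const struct ex_vec *img, struct ex_vec *overlap)
{
	uint64_t img_end = img->start + img->size;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct ex_region *r = &regions[i];

		if (r->base + r->size <= img->start)
			continue;	/* region lies entirely below the image */
		if (r->base >= img_end)
			return 0;	/* sorted input: nothing later can overlap */

		overlap->start = r->base;
		overlap->size = r->size;
		return 1;
	}
	return 0;
}
```

A caller would typically retry with a slot that starts past the reported overlap; that retry policy is outside the scope of this sketch.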
Signed-off-by: Anthony Yznaga --- arch/x86/boot/compressed/Makefile | 3 ++ arch/x86/boot/compressed/kaslr.c | 10 +++- arch/x86/boot/compressed/misc.h | 10 ++++ arch/x86/boot/compressed/pkram.c | 110 ++++++++++++++++++++++++++++++++++= ++++ mm/pkram.c | 2 +- 5 files changed, 132 insertions(+), 3 deletions(-) create mode 100644 arch/x86/boot/compressed/pkram.c diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/M= akefile index 6b6cfe607bdb..d9a5af94a797 100644 --- a/arch/x86/boot/compressed/Makefile +++ b/arch/x86/boot/compressed/Makefile @@ -103,6 +103,9 @@ ifdef CONFIG_X86_64 vmlinux-objs-$(CONFIG_AMD_MEM_ENCRYPT) +=3D $(obj)/mem_encrypt.o vmlinux-objs-y +=3D $(obj)/pgtable_64.o vmlinux-objs-$(CONFIG_AMD_MEM_ENCRYPT) +=3D $(obj)/sev.o +ifdef CONFIG_RANDOMIZE_BASE + vmlinux-objs-$(CONFIG_PKRAM) +=3D $(obj)/pkram.o +endif endif =20 vmlinux-objs-$(CONFIG_ACPI) +=3D $(obj)/acpi.o diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/ka= slr.c index 454757fbdfe5..047b8b9a0799 100644 --- a/arch/x86/boot/compressed/kaslr.c +++ b/arch/x86/boot/compressed/kaslr.c @@ -436,6 +436,7 @@ static bool mem_avoid_overlap(struct mem_vector *img, struct setup_data *ptr; u64 earliest =3D img->start + img->size; bool is_overlapping =3D false; + struct mem_vector avoid; =20 for (i =3D 0; i < MEM_AVOID_MAX; i++) { if (mem_overlaps(img, &mem_avoid[i]) && @@ -449,8 +450,6 @@ static bool mem_avoid_overlap(struct mem_vector *img, /* Avoid all entries in the setup_data linked list. 
*/ ptr =3D (struct setup_data *)(unsigned long)boot_params->hdr.setup_data; while (ptr) { - struct mem_vector avoid; - avoid.start =3D (unsigned long)ptr; avoid.size =3D sizeof(*ptr) + ptr->len; =20 @@ -475,6 +474,12 @@ static bool mem_avoid_overlap(struct mem_vector *img, ptr =3D (struct setup_data *)(unsigned long)ptr->next; } =20 + if (pkram_has_overlap(img, &avoid) && (avoid.start < earliest)) { + *overlap =3D avoid; + earliest =3D overlap->start; + is_overlapping =3D true; + } + return is_overlapping; } =20 @@ -836,6 +841,7 @@ void choose_random_location(unsigned long input, return; } =20 + pkram_init(); boot_params->hdr.loadflags |=3D KASLR_FLAG; =20 if (IS_ENABLED(CONFIG_X86_32)) diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/mis= c.h index 20118fb7c53b..01ff5e507064 100644 --- a/arch/x86/boot/compressed/misc.h +++ b/arch/x86/boot/compressed/misc.h @@ -124,6 +124,16 @@ static inline void console_init(void) { } #endif =20 +#ifdef CONFIG_PKRAM +void pkram_init(void); +int pkram_has_overlap(struct mem_vector *entry, struct mem_vector *overlap= ); +#else +static inline void pkram_init(void) { } +static inline int pkram_has_overlap(struct mem_vector *entry, + struct mem_vector *overlap) +{ return 0; } +#endif + #ifdef CONFIG_AMD_MEM_ENCRYPT void sev_enable(struct boot_params *bp); void snp_check_features(void); diff --git a/arch/x86/boot/compressed/pkram.c b/arch/x86/boot/compressed/pk= ram.c new file mode 100644 index 000000000000..19267ca2ce8e --- /dev/null +++ b/arch/x86/boot/compressed/pkram.c @@ -0,0 +1,110 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include "misc.h" + +#define PKRAM_MAGIC 0x706B726D + +struct pkram_super_block { + __u32 magic; + + __u64 node_pfn; + __u64 region_list_pfn; + __u64 nr_regions; +}; + +struct pkram_region { + phys_addr_t base; + phys_addr_t size; +}; + +struct pkram_region_list { + __u64 prev_pfn; + __u64 next_pfn; + + struct pkram_region regions[0]; +}; + +#define PKRAM_REGIONS_LIST_MAX \ + 
((PAGE_SIZE-sizeof(struct pkram_region_list))/sizeof(struct pkram_region)) + +static u64 pkram_sb_pfn; +static struct pkram_super_block *pkram_sb; + +void pkram_init(void) +{ + struct pkram_super_block *sb; + char arg[32]; + + if (cmdline_find_option("pkram", arg, sizeof(arg)) > 0) { + if (kstrtoull(arg, 16, &pkram_sb_pfn) !=3D 0) + return; + } else + return; + + sb =3D (struct pkram_super_block *)(pkram_sb_pfn << PAGE_SHIFT); + if (sb->magic !=3D PKRAM_MAGIC) { + debug_putstr("PKRAM: invalid super block\n"); + return; + } + + pkram_sb =3D sb; +} + +static struct pkram_region *pkram_first_region(struct pkram_super_block *s= b, + struct pkram_region_list **rlp, int *idx) +{ + if (!sb || !sb->region_list_pfn) + return NULL; + + *rlp =3D (struct pkram_region_list *)(sb->region_list_pfn << PAGE_SHIFT); + *idx =3D 0; + + return &(*rlp)->regions[0]; +} + +static struct pkram_region *pkram_next_region(struct pkram_region_list **r= lp, int *idx) +{ + struct pkram_region_list *rl =3D *rlp; + int i =3D *idx; + + i++; + if (i >=3D PKRAM_REGIONS_LIST_MAX) { + if (!rl->next_pfn) { + debug_putstr("PKRAM: no more pkram_region_list pages\n"); + return NULL; + } + rl =3D (struct pkram_region_list *)(rl->next_pfn << PAGE_SHIFT); + *rlp =3D rl; + i =3D 0; + } + *idx =3D i; + + if (rl->regions[i].size =3D=3D 0) + return NULL; + + return &rl->regions[i]; +} + +int pkram_has_overlap(struct mem_vector *entry, struct mem_vector *overlap) +{ + struct pkram_region_list *rl; + struct pkram_region *r; + int idx; + + r =3D pkram_first_region(pkram_sb, &rl, &idx); + + while (r) { + if (r->base + r->size <=3D entry->start) { + r =3D pkram_next_region(&rl, &idx); + continue; + } + if (r->base >=3D entry->start + entry->size) + return 0; + + overlap->start =3D r->base; + overlap->size =3D r->size; + return 1; + } + + return 0; +} diff --git a/mm/pkram.c b/mm/pkram.c index f38236e5d836..a3e045b8dfe4 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -96,7 +96,7 @@ struct pkram_region_list { __u64 
prev_pfn; __u64 next_pfn; =20 - struct pkram_region regions[0]; + struct pkram_region regions[]; }; =20 #define PKRAM_REGIONS_LIST_MAX \ --=20 1.9.4 From nobody Thu Dec 18 08:59:17 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B4227C7618E for ; Thu, 27 Apr 2023 00:10:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242837AbjD0AKx (ORCPT ); Wed, 26 Apr 2023 20:10:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35426 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242732AbjD0AKJ (ORCPT ); Wed, 26 Apr 2023 20:10:09 -0400 Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EB2C040E0 for ; Wed, 26 Apr 2023 17:10:07 -0700 (PDT) Received: from pps.filterd (m0246631.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 33QGwqlV017095; Thu, 27 Apr 2023 00:09:37 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2023-03-30; bh=75Jn2zmafPDZhmtzI/Kq6AU5iLzwZOe7CoJGlihTNLk=; b=MOXjt11P2QQBa/6vgt8DMX+0+ZgqCUaivtCph8kNnTLFgBTbUFLihY8agUGoQkhqCOXL 0r3HEqV+4zssxzMNDVvjT0WWpU/tglDW6uKbJjRCvwYi199DE6+ohLNKqSrT7E5STFoX Jw9AKL4Zay68MyY1E2qTcYNTMMhyg/I5Ydd/AJyxqffV1b140ubkZsm9S9ROeOCfNvJy M3Zen9xmKgd3oPhKdrXmbDlq0nHMqHdvv6wJphA8J14HaVDW+6xqRy2BukjvxwL1tWMV OM2ISdxQsb1f1iT79Xjrs0hdyaPvf1E7CzCZ7AE3hZdsxWNpSI8UrQ6+k8mIlW6u0G98 Dw== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3q46c4arvx-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); 
Thu, 27 Apr 2023 00:09:36 +0000 Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 33QMDCSU007353; Thu, 27 Apr 2023 00:09:36 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 3q4618mpx1-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Apr 2023 00:09:36 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 33R09394013888; Thu, 27 Apr 2023 00:09:35 GMT Received: from ca-qasparc-x86-2.us.oracle.com (ca-qasparc-x86-2.us.oracle.com [10.147.24.103]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 3q4618mp42-22; Thu, 27 Apr 2023 00:09:35 +0000 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@kernel.org, akpm@linux-foundation.org, ebiederm@xmission.com, keescook@chromium.org, graf@amazon.com, jason.zeng@intel.com, lei.l.li@intel.com, steven.sistare@oracle.com, fam.zheng@bytedance.com, mgalaxy@akamai.com, kexec@lists.infradead.org Subject: [RFC v3 21/21] x86/boot/compressed/64: use 1GB pages for mappings Date: Wed, 26 Apr 2023 17:08:57 -0700 Message-Id: <1682554137-13938-22-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.9.4 In-Reply-To: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> References: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-04-26_10,2023-04-26_03,2023-02-09_01 X-Proofpoint-Spam-Details: 
rule=notspam policy=default score=0 suspectscore=0 phishscore=0 bulkscore=0 mlxlogscore=999 malwarescore=0 mlxscore=0 spamscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303200000 definitions=main-2304270000 X-Proofpoint-GUID: HA43vq6yRTxbNgszb89rmJy5Zv5oWhzC X-Proofpoint-ORIG-GUID: HA43vq6yRTxbNgszb89rmJy5Zv5oWhzC Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" The pkram kaslr code can incur multiple page faults when it walks its preserved ranges list, called via mem_avoid_overlap(). These faults can easily use up the small number of pages available for allocating page table pages. This patch hacks things so that mappings are 1GB, which results in the need for far fewer page table pages. As is, this breaks AMD SEV-ES, which expects the mappings to be 2M. This could possibly be fixed by updating the split code to split 1GB pages if there aren't any other issues with using 1GB mappings. Signed-off-by: Anthony Yznaga --- arch/x86/boot/compressed/ident_map_64.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compre= ssed/ident_map_64.c index 321a5011042d..1e02cf6dda3c 100644 --- a/arch/x86/boot/compressed/ident_map_64.c +++ b/arch/x86/boot/compressed/ident_map_64.c @@ -95,8 +95,8 @@ void kernel_add_identity_map(unsigned long start, unsigne= d long end) int ret; =20 /* Align boundary to 2M. 
*/ - start =3D round_down(start, PMD_SIZE); - end =3D round_up(end, PMD_SIZE); + start =3D round_down(start, PUD_SIZE); + end =3D round_up(end, PUD_SIZE); if (start >=3D end) return; =20 @@ -120,6 +120,7 @@ void initialize_identity_maps(void *rmode) mapping_info.context =3D &pgt_data; mapping_info.page_flag =3D __PAGE_KERNEL_LARGE_EXEC | sme_me_mask; mapping_info.kernpg_flag =3D _KERNPG_TABLE; + mapping_info.direct_gbpages =3D true; =20 /* * It should be impossible for this not to already be true, @@ -365,8 +366,8 @@ void do_boot_page_fault(struct pt_regs *regs, unsigned = long error_code) =20 ghcb_fault =3D sev_es_check_ghcb_fault(address); =20 - address &=3D PMD_MASK; - end =3D address + PMD_SIZE; + address &=3D PUD_MASK; + end =3D address + PUD_SIZE; =20 /* * Check for unexpected error codes. Unexpected are: --=20 1.9.4
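For a rough sense of why 1GB mappings need far fewer page table pages, here is a back-of-the-envelope sketch. It is not part of the patch. It assumes 4-level paging, that the top-level table already exists, and that no lower-level tables are shared with earlier mappings. The helper names (slots_touched, tables_for_2m, tables_for_1g) are made up for illustration.

```c
#include <stdint.h>

#define SZ_1G   (1ULL << 30)	/* range covered by one PMD table */
#define SZ_512G (1ULL << 39)	/* range covered by one PUD table */

/* Number of size-aligned slots touched by the range [start, end). */
static uint64_t slots_touched(uint64_t start, uint64_t end, uint64_t size)
{
	if (start >= end)
		return 0;
	return (end - 1) / size - start / size + 1;
}

/*
 * 2M leaf entries live in PMD tables: one PMD table per 1GB slot
 * touched, plus one PUD table per 512GB slot touched.
 */
static uint64_t tables_for_2m(uint64_t start, uint64_t end)
{
	return slots_touched(start, end, SZ_1G) +
	       slots_touched(start, end, SZ_512G);
}

/*
 * 1GB leaf entries live directly in PUD tables: only one PUD table
 * per 512GB slot touched, and no PMD tables at all.
 */
static uint64_t tables_for_1g(uint64_t start, uint64_t end)
{
	return slots_touched(start, end, SZ_512G);
}
```

Under these assumptions, identity-mapping the first 64GB costs 65 table pages with 2M mappings and 1 with 1GB mappings. Faults scattered across many distinct 1GB slots hit the 2M case hardest, since each new slot needs its own PMD table.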