From nobody Thu Dec 18 08:58:02 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 68D91C7EE29 for ; Fri, 2 Jun 2023 10:32:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234087AbjFBKcX (ORCPT ); Fri, 2 Jun 2023 06:32:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50254 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236592AbjFBKbY (ORCPT ); Fri, 2 Jun 2023 06:31:24 -0400 Received: from esa3.hc1455-7.c3s2.iphmx.com (esa3.hc1455-7.c3s2.iphmx.com [207.54.90.49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 74F102D41 for ; Fri, 2 Jun 2023 03:29:45 -0700 (PDT) X-IronPort-AV: E=McAfee;i="6600,9927,10728"; a="119282616" X-IronPort-AV: E=Sophos;i="6.00,212,1681138800"; d="scan'208";a="119282616" Received: from unknown (HELO yto-r4.gw.nic.fujitsu.com) ([218.44.52.220]) by esa3.hc1455-7.c3s2.iphmx.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Jun 2023 19:27:10 +0900 Received: from yto-m2.gw.nic.fujitsu.com (yto-nat-yto-m2.gw.nic.fujitsu.com [192.168.83.65]) by yto-r4.gw.nic.fujitsu.com (Postfix) with ESMTP id 06211D3EA3 for ; Fri, 2 Jun 2023 19:27:08 +0900 (JST) Received: from kws-ab4.gw.nic.fujitsu.com (kws-ab4.gw.nic.fujitsu.com [192.51.206.22]) by yto-m2.gw.nic.fujitsu.com (Postfix) with ESMTP id 3CDCFD67D2 for ; Fri, 2 Jun 2023 19:27:07 +0900 (JST) Received: from localhost.localdomain (unknown [10.167.234.230]) by kws-ab4.gw.nic.fujitsu.com (Postfix) with ESMTP id 43003E4AAF; Fri, 2 Jun 2023 19:27:06 +0900 (JST) From: Li Zhijian To: kexec@lists.infradead.org, nvdimm@lists.linux.dev Cc: linux-kernel@vger.kernel.org, dan.j.williams@intel.com, bhe@redhat.com, ruansy.fnst@fujitsu.com, y-goto@fujitsu.com, yangx.jy@fujitsu.com, Li Zhijian , Vivek Goyal , Dave Young Subject: [RFC 
PATCH kexec-tools v3 1/1] kexec: Add and mark pmem region into PT_LOADs Date: Fri, 2 Jun 2023 18:26:53 +0800 Message-Id: <20230602102656.131654-5-lizhijian@fujitsu.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20230602102656.131654-1-lizhijian@fujitsu.com> References: <20230602102656.131654-1-lizhijian@fujitsu.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-TM-AS-GCONF: 00 X-TM-AS-Product-Ver: IMSS-9.1.0.1417-9.0.0.1002-27666.006 X-TM-AS-User-Approved-Sender: Yes X-TMASE-Version: IMSS-9.1.0.1417-9.0.1002-27666.006 X-TMASE-Result: 10--7.038100-10.000000 X-TMASE-MatchedRID: SzbEz7SZt2tSuJfEWZSQfA0QY5VnQyANm0H2L3kjQgpOmq2IYpeEBtfG u/3wXym7PHFWBoH6D4ycFX6mBx5z38fdkIlEiI2kRcGHEV0WBxCycrvYxo9Kp742hLbi424DvwU evDt+uW5/XjpbSJS7a86BcTqviA1zfbpIB/11M574Zi3x/9WFO9DEMPvvoocvo/gdx29vvKfIU7 MLOn2QZlafBRDsN6GPgFK2nmPCAnkfE8yM4pjsDwtuKBGekqUpI/NGWt0UYPC2A5+imUXo05G7e yF+PNMDxJCK7NkLnfyBYhxJ1cjIzsdLFNEgDIsW X-TMASE-SNAP-Result: 1.821001.0001-0-1-22:0,33:0,34:0-0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8"

This patch does two things:

1. Add the pmem region to the PT_LOADs of vmcore so that the pmem region is dumpable.

Only the regions described by the PT_LOADs of /proc/vmcore are dumpable/readable by dumping applications. Previously, on x86/x86_64, only system RAM resources were injected into PT_LOADs. To make the entire pmem resource dumpable/readable, we need to add the pmem region to the PT_LOADs of /proc/vmcore.

2. Mark the pmem region's p_flags with PF_DEV so that dumping applications can skip specific pages.

For pmem, metadata is specific to the namespace rather than to the entire pmem region. Therefore, ranges in which no namespace has been created yet, or which are unusable due to alignment, are not associated with any metadata. When an application attempts to access a region that has no corresponding metadata, it encounters an access error. With this flag, dumping applications can anticipate this access error and handle it accordingly.

CC: Baoquan He CC: Vivek Goyal CC: Dave Young CC: kexec@lists.infradead.org Signed-off-by: Li Zhijian --- kexec/crashdump-elf.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/kexec/crashdump-elf.c b/kexec/crashdump-elf.c index b8bb686a17ca..ab257e825187 100644 --- a/kexec/crashdump-elf.c +++ b/kexec/crashdump-elf.c @@ -25,6 +25,8 @@ do { \ } while(0) #endif =20 +#define PF_DEV (1 << 4) + /* Prepares the crash memory headers and stores in supplied buffer. */ int FUNC(struct kexec_info *info, struct crash_elf_info *elf_info, @@ -199,7 +201,7 @@ int FUNC(struct kexec_info *info, * A seprate program header for Backup Region*/ for (i =3D 0; i < ranges; i++, range++) { unsigned long long mstart, mend; - if (range->type !=3D RANGE_RAM) + if (range->type !=3D RANGE_RAM && range->type !=3D RANGE_PMEM) continue; mstart =3D range->start; mend =3D range->end; @@ -209,6 +211,8 @@ int FUNC(struct kexec_info *info, bufp +=3D sizeof(PHDR); phdr->p_type =3D PT_LOAD; phdr->p_flags =3D PF_R|PF_W|PF_X; + if (range->type =3D=3D RANGE_PMEM) + phdr->p_flags |=3D PF_DEV; phdr->p_offset =3D mstart; =20 if (mstart =3D=3D info->backup_src_start --=20 2.29.2

From nobody Thu Dec 18 08:58:02 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CA7DFC7EE29 for ; Fri, 2 Jun 2023 10:35:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235503AbjFBKfn (ORCPT ); Fri, 2 Jun 2023 06:35:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54612 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235941AbjFBKfI (ORCPT ); Fri, 2 Jun 2023 06:35:08 -0400 Received: from
esa11.hc1455-7.c3s2.iphmx.com (esa11.hc1455-7.c3s2.iphmx.com [207.54.90.137]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EF5B3198E for ; Fri, 2 Jun 2023 03:34:05 -0700 (PDT) X-IronPort-AV: E=McAfee;i="6600,9927,10728"; a="98600091" X-IronPort-AV: E=Sophos;i="6.00,212,1681138800"; d="scan'208";a="98600091" Received: from unknown (HELO oym-r3.gw.nic.fujitsu.com) ([210.162.30.91]) by esa11.hc1455-7.c3s2.iphmx.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Jun 2023 19:27:09 +0900 Received: from oym-m4.gw.nic.fujitsu.com (oym-nat-oym-m4.gw.nic.fujitsu.com [192.168.87.61]) by oym-r3.gw.nic.fujitsu.com (Postfix) with ESMTP id 0A2E1CA1E8 for ; Fri, 2 Jun 2023 19:27:06 +0900 (JST) Received: from kws-ab4.gw.nic.fujitsu.com (kws-ab4.gw.nic.fujitsu.com [192.51.206.22]) by oym-m4.gw.nic.fujitsu.com (Postfix) with ESMTP id 36982D616D for ; Fri, 2 Jun 2023 19:27:05 +0900 (JST) Received: from localhost.localdomain (unknown [10.167.234.230]) by kws-ab4.gw.nic.fujitsu.com (Postfix) with ESMTP id 1DB03E4AAF; Fri, 2 Jun 2023 19:27:04 +0900 (JST) From: Li Zhijian To: kexec@lists.infradead.org, nvdimm@lists.linux.dev Cc: linux-kernel@vger.kernel.org, dan.j.williams@intel.com, bhe@redhat.com, ruansy.fnst@fujitsu.com, y-goto@fujitsu.com, yangx.jy@fujitsu.com, Li Zhijian , Vishal Verma , Dave Jiang , Ira Weiny Subject: [RFC PATCH v3 1/3] nvdimm: set force_raw=1 in kdump kernel Date: Fri, 2 Jun 2023 18:26:50 +0800 Message-Id: <20230602102656.131654-2-lizhijian@fujitsu.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20230602102656.131654-1-lizhijian@fujitsu.com> References: <20230602102656.131654-1-lizhijian@fujitsu.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-TM-AS-GCONF: 00 X-TM-AS-Product-Ver: IMSS-9.1.0.1417-9.0.0.1002-27666.006 X-TM-AS-User-Approved-Sender: Yes X-TMASE-Version: IMSS-9.1.0.1417-9.0.1002-27666.006 X-TMASE-Result: 10-0.974000-10.000000 X-TMASE-MatchedRID: t1Iw0ML99//7w6uw5pqYnoOlbll4OMtk9LMB0hXFSeg6Zx3YUNQTG+Wh 
NKYuM7eN4QRvjxz49tHS7j6TEIEt1D3TQfUpAv1sPkILbTHNp5vYUDvAr2Y/17fYIuZsOQ0sOXB 2cqV0mCIre4xpX839SFAz81vtOOYiZ4F2TwmYmEDum6Nvy6t3NlK6+0HOVoSoWAuSz3ewb22AI+ pLfk3sByL637QCIVpi8vc3EUpCmrV9Y/vlKk76U9splnBzc8xMTFQnI+epPIaRo95rkBSGU6PFj JEFr+olwXCBO/GKkVqOhzOa6g8KrW2CLRXivi3wNhkKY/nNBRst5qYpMkBetMsgyaJj+IZ28rnl FWxtjwA8cpbtw5mTmssX0GOwKpO0i+7bNO++4Qw50ytg9FCbQRXBt/mUREyAj/ZFF9Wfm7hNy7p pG0IjcFQqk0j7vLVUewMSBDreIdk= X-TMASE-SNAP-Result: 1.821001.0001-0-1-22:0,33:0,34:0-0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8"

The virtually mapped memory map allows storing struct page objects for persistent memory devices in pre-allocated storage on those devices. These struct page objects on the devices are also known as metadata.

While libnvdimm/nd_pmem are loading, the previous metadata is re-constructed to fit the currently running kernel. For kdump purposes, this metadata should not be touched until the dump is done, so that the metadata remains identical.

To achieve this, we have several options:
1. Don't provide the libnvdimm driver in the kdump kernel rootfs/initramfs.
2. Disable the libnvdimm driver with specific command-line parameters (initcall_blacklist=libnvdimm_init libnvdimm.blacklist=1 rd.driver.blacklist=libnvdimm).
3. Enforce force_raw=1 for the nvdimm namespace, because when force_raw=1, the metadata is not re-constructed. This may also mean that pmem does not work until some extra configuration is done.

Here we choose the third option because the kdump application in this RFC relies on some /sys interfaces exported by libnvdimm, nd_pmem, etc.
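For illustration only, the effect of the third option can be sketched in plain userspace C. Everything with a _sketch suffix is hypothetical and is not libnvdimm code; the sketch assumes that a namespace with force_raw set is never handed to a metadata-aware personality, and it takes the "running as kdump kernel" condition as a parameter instead of calling is_kdump_kernel().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct nd_namespace_common; only one field. */
struct ns_sketch {
	bool force_raw;
};

/* Hypothetical access modes: raw mode bypasses the on-device metadata. */
enum ns_mode_sketch { NS_MODE_RAW, NS_MODE_WITH_METADATA };

/* Mirrors the added hunk: force raw mode when running as the kdump kernel. */
static void ns_probe_sketch(struct ns_sketch *ns, bool in_kdump_kernel)
{
	assert(ns);
	if (in_kdump_kernel)
		ns->force_raw = true;
}

/*
 * Assumption: a raw namespace is not claimed by a metadata-aware mode,
 * so the metadata written by the crashed kernel is left untouched.
 */
static enum ns_mode_sketch ns_pick_mode_sketch(const struct ns_sketch *ns,
					       bool has_info_block)
{
	assert(ns);
	if (ns->force_raw || !has_info_block)
		return NS_MODE_RAW;
	return NS_MODE_WITH_METADATA;
}
```

In this sketch the flag is only ever set, never cleared, which matches the hunk: outside of kdump the namespace keeps whatever force_raw value it already had.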
CC: Dan Williams CC: Vishal Verma CC: Dave Jiang CC: Ira Weiny CC: nvdimm@lists.linux.dev Signed-off-by: Li Zhijian --- V3: new patch --- drivers/nvdimm/namespace_devs.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_dev= s.c index c60ec0b373c5..2e59be8b9c78 100644 --- a/drivers/nvdimm/namespace_devs.c +++ b/drivers/nvdimm/namespace_devs.c @@ -8,6 +8,7 @@ #include #include #include +#include #include "nd-core.h" #include "pmem.h" #include "pfn.h" @@ -1504,6 +1505,8 @@ struct nd_namespace_common *nvdimm_namespace_common_p= robe(struct device *dev) return ERR_PTR(-ENODEV); } =20 + if (is_kdump_kernel()) + ndns->force_raw =3D true; return ndns; } EXPORT_SYMBOL(nvdimm_namespace_common_probe); --=20 2.29.2 From nobody Thu Dec 18 08:58:02 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 73E76C7EE24 for ; Fri, 2 Jun 2023 10:31:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234582AbjFBKbp (ORCPT ); Fri, 2 Jun 2023 06:31:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51280 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236371AbjFBKa5 (ORCPT ); Fri, 2 Jun 2023 06:30:57 -0400 Received: from esa12.hc1455-7.c3s2.iphmx.com (esa12.hc1455-7.c3s2.iphmx.com [139.138.37.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 393C110E6 for ; Fri, 2 Jun 2023 03:29:17 -0700 (PDT) X-IronPort-AV: E=McAfee;i="6600,9927,10728"; a="98690281" X-IronPort-AV: E=Sophos;i="6.00,212,1681138800"; d="scan'208";a="98690281" Received: from unknown (HELO oym-r2.gw.nic.fujitsu.com) ([210.162.30.90]) by esa12.hc1455-7.c3s2.iphmx.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Jun 2023 19:27:10 +0900 Received: from oym-m3.gw.nic.fujitsu.com 
(oym-nat-oym-m3.gw.nic.fujitsu.com [192.168.87.60]) by oym-r2.gw.nic.fujitsu.com (Postfix) with ESMTP id CE75FD4256 for ; Fri, 2 Jun 2023 19:27:08 +0900 (JST) Received: from kws-ab4.gw.nic.fujitsu.com (kws-ab4.gw.nic.fujitsu.com [192.51.206.22]) by oym-m3.gw.nic.fujitsu.com (Postfix) with ESMTP id 0949DD9A88 for ; Fri, 2 Jun 2023 19:27:08 +0900 (JST) Received: from localhost.localdomain (unknown [10.167.234.230]) by kws-ab4.gw.nic.fujitsu.com (Postfix) with ESMTP id 0034040FE2; Fri, 2 Jun 2023 19:27:06 +0900 (JST) From: Li Zhijian To: kexec@lists.infradead.org, nvdimm@lists.linux.dev Cc: linux-kernel@vger.kernel.org, dan.j.williams@intel.com, bhe@redhat.com, ruansy.fnst@fujitsu.com, y-goto@fujitsu.com, yangx.jy@fujitsu.com, Li Zhijian , Vivek Goyal , Dave Young Subject: [RFC PATCH makedumpfile v3 1/3] elf_info.c: Introduce is_pmem_pt_load_range Date: Fri, 2 Jun 2023 18:26:54 +0800 Message-Id: <20230602102656.131654-6-lizhijian@fujitsu.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20230602102656.131654-1-lizhijian@fujitsu.com> References: <20230602102656.131654-1-lizhijian@fujitsu.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-TM-AS-GCONF: 00 X-TM-AS-Product-Ver: IMSS-9.1.0.1417-9.0.0.1002-27666.006 X-TM-AS-User-Approved-Sender: Yes X-TMASE-Version: IMSS-9.1.0.1417-9.0.1002-27666.006 X-TMASE-Result: 10--6.061400-10.000000 X-TMASE-MatchedRID: uYOEf1I6Oo115zj/0di3Q+6bo2/Lq3c20MQw+++ihy86FHRWx2FGsL8F Hrw7frluf146W0iUu2uMQUNNVv3RZFsSYoQIc1cjSHCU59h5KrHjLrHqvAiSy0/cRvj5stP609D 6Rw2zIrP6Ss9HyBHBXv2MF5HVqqgBYwDOL7t3RyGeAiCmPx4NwBnUJ0Ek6yhjxEHRux+uk8h+IC quNi0WJEs4O5JlDkki+KxOPxgcX3fuuoHGzHNMZnPOHbXzO9lIftwZ3X11IV0= X-TMASE-SNAP-Result: 1.821001.0001-0-1-22:0,33:0,34:0-0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8"

This patch checks BIT(4) of p_flags in Elf64_Phdr; currently only the lowest 3 bits (PF_X, PF_W, PF_R) are used by ELF. In kexec-tools, BIT(4) is extended to indicate whether a segment is pmem.
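As a rough, self-contained sketch of the reader side (not the makedumpfile code itself), the bit can be decoded from a program header and then used for a range lookup. The struct and function names ending in _sketch are hypothetical, the PF_DEV value is the one proposed by this series rather than anything defined by the ELF specification, and the lookup only covers the case where the physical start address is known.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

#define PF_DEV (1 << 4) /* proposed by this series; not part of the ELF spec */

/* Hypothetical, trimmed-down counterpart of struct pt_load_segment. */
struct load_seg_sketch {
	unsigned long long phys_start;
	unsigned long long phys_end; /* exclusive */
	bool is_pmem;
};

/* Record one PT_LOAD header, decoding the pmem bit from p_flags. */
static struct load_seg_sketch seg_from_phdr_sketch(const Elf64_Phdr *ph)
{
	assert(ph && ph->p_type == PT_LOAD);
	struct load_seg_sketch seg = {
		.phys_start = ph->p_paddr,
		.phys_end = ph->p_paddr + ph->p_memsz,
		.is_pmem = (ph->p_flags & PF_DEV) != 0,
	};
	return seg;
}

/* True when [start, end) lies inside a segment flagged as pmem. */
static bool range_is_pmem_sketch(const struct load_seg_sketch *segs, size_t n,
				 unsigned long long start,
				 unsigned long long end)
{
	for (size_t i = 0; i < n; i++) {
		if (segs[i].is_pmem && segs[i].phys_start <= start &&
		    segs[i].phys_end >= end)
			return true;
	}
	return false;
}
```

A caller would typically pass one page at a time, for example the range from pfn << page_shift to (pfn + 1) << page_shift, and skip the page when the lookup returns true.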
dump_Elf_load:                phys_start         phys_end       virt_start         virt_end  is_pmem
dump_Elf_load: LOAD[ 0]         6b800000         6e42c000 ffffffffbcc00000 ffffffffbf82c000    false
dump_Elf_load: LOAD[ 1]             1000            9fc00 ffff975980001000 ffff97598009fc00    false
dump_Elf_load: LOAD[ 2]           100000         7f000000 ffff975980100000 ffff9759ff000000    false
dump_Elf_load: LOAD[ 3]         bf000000         bffd7000 ffff975a3f000000 ffff975a3ffd7000    false
dump_Elf_load: LOAD[ 4]        100000000        140000000 ffff975a80000000 ffff975ac0000000    false
dump_Elf_load: LOAD[ 5]        140000000        23e200000 ffff975ac0000000 ffff975bbe200000     true

CC: Baoquan He CC: Vivek Goyal CC: Dave Young CC: kexec@lists.infradead.org Signed-off-by: Li Zhijian --- elf_info.c | 31 +++++++++++++++++++++++++++---- elf_info.h | 1 + 2 files changed, 28 insertions(+), 4 deletions(-) diff --git a/elf_info.c b/elf_info.c index bc24083655d6..41b36b2804d2 100644 --- a/elf_info.c +++ b/elf_info.c @@ -43,6 +43,7 @@ struct pt_load_segment { unsigned long long phys_end; unsigned long long virt_start; unsigned long long virt_end; + int is_pmem; }; =20 static int nr_cpus; /* number of cpu */ @@ -153,6 +154,8 @@ check_elf_format(int fd, char *filename, int *phnum, un= signed int *num_load) return FALSE; } =20 +#define PF_DEV (1 << 4) + static int dump_Elf_load(Elf64_Phdr *prog, int num_load) { @@ -170,17 +173,37 @@ dump_Elf_load(Elf64_Phdr *prog, int num_load) pls->virt_end =3D pls->virt_start + prog->p_memsz; pls->file_offset =3D prog->p_offset; pls->file_size =3D prog->p_filesz; + pls->is_pmem =3D !!(prog->p_flags & PF_DEV); =20 if (num_load =3D=3D 0) - DEBUG_MSG("%8s %16s %16s %16s %16s\n", "", - "phys_start", "phys_end", "virt_start", "virt_end"); + DEBUG_MSG("%8s %16s %16s %16s %16s %8s\n", "", + "phys_start", "phys_end", "virt_start", "virt_end", + "is_pmem"); =20 - DEBUG_MSG("LOAD[%2d] %16llx %16llx %16llx %16llx\n", num_load, - pls->phys_start, pls->phys_end, pls->virt_start, pls->virt_end); + DEBUG_MSG("LOAD[%2d] %16llx %16llx %16llx %16llx %8s\n", num_load, + pls->phys_start,
pls->phys_end, pls->virt_start, pls->virt_end, + pls->is_pmem ? "true": "false"); =20 return TRUE; } =20 +int is_pmem_pt_load_range(unsigned long long start, unsigned long long end) +{ + int i; + struct pt_load_segment *pls; + + for (i =3D 0; i < num_pt_loads; i++) { + pls =3D &pt_loads[i]; + if (pls->is_pmem && pls->phys_start =3D=3D NOT_PADDR) + return TRUE; + if (pls->is_pmem && pls->phys_start !=3D NOT_PADDR && + pls->phys_start <=3D start && pls->phys_end >=3D end) + return TRUE; + } + + return FALSE; +} + static off_t offset_next_note(void *note) { diff --git a/elf_info.h b/elf_info.h index d5416b32cdd7..a08d59a331f6 100644 --- a/elf_info.h +++ b/elf_info.h @@ -64,6 +64,7 @@ int get_pt_load_extents(int idx, off_t *file_offset, off_t *file_size); unsigned int get_num_pt_loads(void); +int is_pmem_pt_load_range(unsigned long long start, unsigned long long end= ); =20 void set_nr_cpus(int num); int get_nr_cpus(void); --=20 2.29.2 From nobody Thu Dec 18 08:58:02 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EF1B3C7EE29 for ; Fri, 2 Jun 2023 10:31:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235210AbjFBKbc (ORCPT ); Fri, 2 Jun 2023 06:31:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49986 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235730AbjFBK3s (ORCPT ); Fri, 2 Jun 2023 06:29:48 -0400 Received: from esa4.hc1455-7.c3s2.iphmx.com (esa4.hc1455-7.c3s2.iphmx.com [68.232.139.117]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 96BD319BE for ; Fri, 2 Jun 2023 03:27:35 -0700 (PDT) X-IronPort-AV: E=McAfee;i="6600,9927,10728"; a="119231445" X-IronPort-AV: E=Sophos;i="6.00,212,1681138800"; d="scan'208";a="119231445" Received: from unknown (HELO yto-r2.gw.nic.fujitsu.com) 
([218.44.52.218]) by esa4.hc1455-7.c3s2.iphmx.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Jun 2023 19:27:08 +0900 Received: from yto-m3.gw.nic.fujitsu.com (yto-nat-yto-m3.gw.nic.fujitsu.com [192.168.83.66]) by yto-r2.gw.nic.fujitsu.com (Postfix) with ESMTP id A4C40C68E4 for ; Fri, 2 Jun 2023 19:27:06 +0900 (JST) Received: from kws-ab4.gw.nic.fujitsu.com (kws-ab4.gw.nic.fujitsu.com [192.51.206.22]) by yto-m3.gw.nic.fujitsu.com (Postfix) with ESMTP id E5C54D9698 for ; Fri, 2 Jun 2023 19:27:05 +0900 (JST) Received: from localhost.localdomain (unknown [10.167.234.230]) by kws-ab4.gw.nic.fujitsu.com (Postfix) with ESMTP id DD90630C425; Fri, 2 Jun 2023 19:27:04 +0900 (JST) From: Li Zhijian To: kexec@lists.infradead.org, nvdimm@lists.linux.dev Cc: linux-kernel@vger.kernel.org, dan.j.williams@intel.com, bhe@redhat.com, ruansy.fnst@fujitsu.com, y-goto@fujitsu.com, yangx.jy@fujitsu.com, Li Zhijian , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , x86@kernel.org Subject: [RFC PATCH v3 2/3] x86/crash: Add pmem region into PT_LOADs of vmcore Date: Fri, 2 Jun 2023 18:26:51 +0800 Message-Id: <20230602102656.131654-3-lizhijian@fujitsu.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20230602102656.131654-1-lizhijian@fujitsu.com> References: <20230602102656.131654-1-lizhijian@fujitsu.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-TM-AS-GCONF: 00 X-TM-AS-Product-Ver: IMSS-9.1.0.1417-9.0.0.1002-27666.006 X-TM-AS-User-Approved-Sender: Yes X-TMASE-Version: IMSS-9.1.0.1417-9.0.1002-27666.006 X-TMASE-Result: 10--8.329300-10.000000 X-TMASE-MatchedRID: SzbEz7SZt2uKi1hWqQY8Ws7ggHewVGq2wwrHF5pwze+fHrjLA9DhZkop D+RCCRBkWW2Dmwvfm8fHs1P4pGN3CkKry8ky4RfFrMZ+BqQt2NqDnmblFLAswsC5DTEMxpeQfiq 1gj2xET/kizndBEr04aS6UDs/n0a68yMI2JjGZ9a628cXbnOhTxgff28UuvITQmw1cPfvj6kgfJ 3S34h6EV+3LSA0WzRX64sVlliWKx8fE8yM4pjsDwtuKBGekqUpI/NGWt0UYPBTsAAS6F7IXbxSS Tr43YbG2CTYFQRbaau5CD61Mvm0FlqwxgxViRpo X-TMASE-SNAP-Result: 
1.821001.0001-0-1-22:0,33:0,34:0-0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8"

Only the regions described by the PT_LOADs of /proc/vmcore are dumpable/readable by dumping applications. Previously, on x86/x86_64, only system RAM resources were injected into PT_LOADs.

To make the entire pmem resource dumpable/readable, we need to add the pmem region to the PT_LOADs of /proc/vmcore.

Here we introduce a new API, walk_pmem_res(), to pick out the pmem regions. Note that, unlike the other walk_xxx_res() APIs in resource.c, we walk through pmem resources without the IORESOURCE_BUSY flag.

This is kexec_file_load() specific; for kexec_load(), kexec-tools will have a similar change.

CC: Thomas Gleixner CC: Ingo Molnar CC: Borislav Petkov CC: Dave Hansen CC: "H. Peter Anvin" CC: Baoquan He CC: x86@kernel.org Signed-off-by: Li Zhijian --- arch/x86/kernel/crash.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index cdd92ab43cda..97763ea804c6 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -168,6 +168,17 @@ static int get_nr_ram_ranges_callback(struct resource = *res, void *arg) return 0; } =20 +/* + * This function calls the @func callback against all memory ranges, which + * are ranges marked as IORESOURCE_MEM and IORES_DESC_PERSISTENT_MEMORY.
+ */ +static int walk_pmem_res(u64 start, u64 end, void *arg, + int (*func)(struct resource *, void *)) +{ + return walk_iomem_res_desc(IORES_DESC_PERSISTENT_MEMORY, IORESOURCE_MEM, + start, end, arg, func); +} + /* Gather all the required information to prepare elf headers for ram regi= ons */ static struct crash_mem *fill_up_crash_elf_data(void) { @@ -178,6 +189,7 @@ static struct crash_mem *fill_up_crash_elf_data(void) if (!nr_ranges) return NULL; =20 + walk_pmem_res(0, -1, &nr_ranges, get_nr_ram_ranges_callback); /* * Exclusion of crash region and/or crashk_low_res may cause * another range split. So add extra two slots here. @@ -243,6 +255,7 @@ static int prepare_elf_headers(struct kimage *image, vo= id **addr, ret =3D walk_system_ram_res(0, -1, cmem, prepare_elf64_ram_headers_callba= ck); if (ret) goto out; + walk_pmem_res(0, -1, cmem, prepare_elf64_ram_headers_callback); =20 /* Exclude unwanted mem ranges */ ret =3D elf_header_exclude_ranges(cmem); --=20 2.29.2 From nobody Thu Dec 18 08:58:02 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B7001C7EE29 for ; Fri, 2 Jun 2023 10:31:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235681AbjFBKbz (ORCPT ); Fri, 2 Jun 2023 06:31:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50358 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236429AbjFBKbE (ORCPT ); Fri, 2 Jun 2023 06:31:04 -0400 X-Greylist: delayed 72 seconds by postgrey-1.37 at lindbergh.monkeyblade.net; Fri, 02 Jun 2023 03:29:22 PDT Received: from esa3.hc1455-7.c3s2.iphmx.com (esa3.hc1455-7.c3s2.iphmx.com [207.54.90.49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 72A0B2710 for ; Fri, 2 Jun 2023 03:29:22 -0700 (PDT) X-IronPort-AV: E=McAfee;i="6600,9927,10728"; 
a="119282617" X-IronPort-AV: E=Sophos;i="6.00,212,1681138800"; d="scan'208";a="119282617" Received: from unknown (HELO oym-r3.gw.nic.fujitsu.com) ([210.162.30.91]) by esa3.hc1455-7.c3s2.iphmx.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Jun 2023 19:27:11 +0900 Received: from oym-m3.gw.nic.fujitsu.com (oym-nat-oym-m3.gw.nic.fujitsu.com [192.168.87.60]) by oym-r3.gw.nic.fujitsu.com (Postfix) with ESMTP id 860F3CA1EB for ; Fri, 2 Jun 2023 19:27:09 +0900 (JST) Received: from kws-ab4.gw.nic.fujitsu.com (kws-ab4.gw.nic.fujitsu.com [192.51.206.22]) by oym-m3.gw.nic.fujitsu.com (Postfix) with ESMTP id BE6A1D9A89 for ; Fri, 2 Jun 2023 19:27:08 +0900 (JST) Received: from localhost.localdomain (unknown [10.167.234.230]) by kws-ab4.gw.nic.fujitsu.com (Postfix) with ESMTP id CC25FE4ABD; Fri, 2 Jun 2023 19:27:07 +0900 (JST) From: Li Zhijian To: kexec@lists.infradead.org, nvdimm@lists.linux.dev Cc: linux-kernel@vger.kernel.org, dan.j.williams@intel.com, bhe@redhat.com, ruansy.fnst@fujitsu.com, y-goto@fujitsu.com, yangx.jy@fujitsu.com, Li Zhijian , Vivek Goyal , Dave Young Subject: [RFC PATCH makedumpfile v3 2/3] makedumpfile.c: Exclude all pmem pages Date: Fri, 2 Jun 2023 18:26:55 +0800 Message-Id: <20230602102656.131654-7-lizhijian@fujitsu.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20230602102656.131654-1-lizhijian@fujitsu.com> References: <20230602102656.131654-1-lizhijian@fujitsu.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-TM-AS-GCONF: 00 X-TM-AS-Product-Ver: IMSS-9.1.0.1417-9.0.0.1002-27666.006 X-TM-AS-User-Approved-Sender: Yes X-TMASE-Version: IMSS-9.1.0.1417-9.0.1002-27666.006 X-TMASE-Result: 10--6.321800-10.000000 X-TMASE-MatchedRID: CCHfsHgEWa2BgK4uB6zi2wI0yP/uoH+D2+EDPw+8xrdJEjJjpEhCnwFV KntB9/BKIvrftAIhWmLy9zcRSkKatUAhvNB5B6uKzfqlpbtmcWgAPNCUrAcH+yS30GKAkBxW58T g0a5Ro0fiJKINn1ydSUNghIQqhUBNP7A6mmzUskCp3Btb1bH20NY3ddD/vCxPkHPVkBTu31P8bd qAerXT9wzx9NgblERANGy+MoqzZ01DQYe+1GQPNbrbxxduc6FPfS0Ip2eEHnz3IzXlXlpamPoLR 
4+zsDTtH/zyL+gBqiwKmTdg1dVt31KQTaEXmx8EEvFiRzb4CfQizqc5rHUqnQ== X-TMASE-SNAP-Result: 1.821001.0001-0-1-22:0,33:0,34:0-0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8"

Generally, pmem is too large to be suitable for dumping. Further, only the namespaces within the pmem are dumpable, but currently we have no knowledge of the exact layout of the namespaces in pmem. So we exclude all pmem pages for now. Later, we will try to support including/excluding metadata via a specific parameter.

CC: Baoquan He CC: Vivek Goyal CC: Dave Young CC: kexec@lists.infradead.org Signed-off-by: Li Zhijian --- makedumpfile.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/makedumpfile.c b/makedumpfile.c index cadc59662bef..f304f752b0ec 100644 --- a/makedumpfile.c +++ b/makedumpfile.c @@ -100,6 +100,7 @@ mdf_pfn_t pfn_user; mdf_pfn_t pfn_free; mdf_pfn_t pfn_hwpoison; mdf_pfn_t pfn_offline; +mdf_pfn_t pfn_pmem_userdata; mdf_pfn_t pfn_elf_excluded; =20 mdf_pfn_t num_dumped; @@ -6389,6 +6390,7 @@ __exclude_unnecessary_pages(unsigned long mem_map, unsigned int order_offset, dtor_offset; unsigned long flags, mapping, private =3D 0; unsigned long compound_dtor, compound_head =3D 0; + unsigned int is_pmem; =20 /* * If a multi-page exclusion is pending, do it first @@ -6443,6 +6445,13 @@ __exclude_unnecessary_pages(unsigned long mem_map, continue; } =20 + is_pmem =3D is_pmem_pt_load_range(pfn << PAGESHIFT(), (pfn + 1) << PAGES= HIFT()); + if (is_pmem) { + pfn_pmem_userdata++; + clear_bit_on_2nd_bitmap_for_kernel(pfn, cycle); + continue; + } + index_pg =3D pfn % PGMM_CACHED; pcache =3D page_cache + (index_pg * SIZE(page)); =20 @@ -8122,7 +8131,7 @@ write_elf_pages_cyclic(struct cache_data *cd_header, = struct cache_data *cd_page) */ if (info->flag_cyclic) { pfn_zero =3D pfn_cache =3D pfn_cache_private =3D 0; - pfn_user =3D pfn_free =3D pfn_hwpoison =3D pfn_offline =3D 0; + pfn_user =3D pfn_free
=3D pfn_hwpoison =3D pfn_offline =3D pfn_pmem_user= data =3D 0; pfn_memhole =3D info->max_mapnr; } =20 @@ -9460,7 +9469,7 @@ write_kdump_pages_and_bitmap_cyclic(struct cache_data= *cd_header, struct cache_d * Reset counter for debug message. */ pfn_zero =3D pfn_cache =3D pfn_cache_private =3D 0; - pfn_user =3D pfn_free =3D pfn_hwpoison =3D pfn_offline =3D 0; + pfn_user =3D pfn_free =3D pfn_hwpoison =3D pfn_offline =3D pfn_pmem_user= data =3D 0; pfn_memhole =3D info->max_mapnr; =20 /* @@ -10408,7 +10417,7 @@ print_report(void) */ pfn_original =3D info->max_mapnr - pfn_memhole; =20 - pfn_excluded =3D pfn_zero + pfn_cache + pfn_cache_private + pfn_excluded =3D pfn_zero + pfn_cache + pfn_cache_private + pfn_pmem_user= data + pfn_user + pfn_free + pfn_hwpoison + pfn_offline; =20 REPORT_MSG("\n"); @@ -10425,6 +10434,7 @@ print_report(void) REPORT_MSG(" Free pages : 0x%016llx\n", pfn_free); REPORT_MSG(" Hwpoison pages : 0x%016llx\n", pfn_hwpoison); REPORT_MSG(" Offline pages : 0x%016llx\n", pfn_offline); + REPORT_MSG(" pmem userdata pages : 0x%016llx\n", pfn_pmem_userdata= ); REPORT_MSG(" Remaining pages : 0x%016llx\n", pfn_original - pfn_excluded); =20 @@ -10464,7 +10474,7 @@ print_mem_usage(void) */ pfn_original =3D info->max_mapnr - pfn_memhole; =20 - pfn_excluded =3D pfn_zero + pfn_cache + pfn_cache_private + pfn_excluded =3D pfn_zero + pfn_cache + pfn_cache_private + pfn_pmem_user= data + pfn_user + pfn_free + pfn_hwpoison + pfn_offline; shrinking =3D (pfn_original - pfn_excluded) * 100; shrinking =3D shrinking / pfn_original; --=20 2.29.2 From nobody Thu Dec 18 08:58:02 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E1D70C7EE29 for ; Fri, 2 Jun 2023 10:31:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235603AbjFBKbk (ORCPT ); Fri, 2 Jun 
2023 06:31:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50338 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236413AbjFBKbC (ORCPT ); Fri, 2 Jun 2023 06:31:02 -0400 Received: from esa9.hc1455-7.c3s2.iphmx.com (esa9.hc1455-7.c3s2.iphmx.com [139.138.36.223]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 119DF2709 for ; Fri, 2 Jun 2023 03:29:20 -0700 (PDT) X-IronPort-AV: E=McAfee;i="6600,9927,10728"; a="107447910" X-IronPort-AV: E=Sophos;i="6.00,212,1681138800"; d="scan'208";a="107447910" Received: from unknown (HELO oym-r3.gw.nic.fujitsu.com) ([210.162.30.91]) by esa9.hc1455-7.c3s2.iphmx.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Jun 2023 19:27:09 +0900 Received: from oym-m3.gw.nic.fujitsu.com (oym-nat-oym-m3.gw.nic.fujitsu.com [192.168.87.60]) by oym-r3.gw.nic.fujitsu.com (Postfix) with ESMTP id 83306CA1F0 for ; Fri, 2 Jun 2023 19:27:07 +0900 (JST) Received: from kws-ab4.gw.nic.fujitsu.com (kws-ab4.gw.nic.fujitsu.com [192.51.206.22]) by oym-m3.gw.nic.fujitsu.com (Postfix) with ESMTP id 76FE4D9A89 for ; Fri, 2 Jun 2023 19:27:06 +0900 (JST) Received: from localhost.localdomain (unknown [10.167.234.230]) by kws-ab4.gw.nic.fujitsu.com (Postfix) with ESMTP id A65F4E4AA6; Fri, 2 Jun 2023 19:27:05 +0900 (JST) From: Li Zhijian To: kexec@lists.infradead.org, nvdimm@lists.linux.dev Cc: linux-kernel@vger.kernel.org, dan.j.williams@intel.com, bhe@redhat.com, ruansy.fnst@fujitsu.com, y-goto@fujitsu.com, yangx.jy@fujitsu.com, Li Zhijian , Eric Biederman Subject: [RFC PATCH v3 3/3] kernel/kexec_file: Mark pmem region with new flag PF_DEV Date: Fri, 2 Jun 2023 18:26:52 +0800 Message-Id: <20230602102656.131654-4-lizhijian@fujitsu.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20230602102656.131654-1-lizhijian@fujitsu.com> References: <20230602102656.131654-1-lizhijian@fujitsu.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-TM-AS-GCONF: 00 X-TM-AS-Product-Ver: 
IMSS-9.1.0.1417-9.0.0.1002-27666.006 X-TM-AS-User-Approved-Sender: Yes X-TMASE-Version: IMSS-9.1.0.1417-9.0.1002-27666.006 X-TMASE-Result: 10--6.493600-10.000000 X-TMASE-MatchedRID: 9KRLfRi0EIdXk4HjwySOxx1kSRHxj+Z5W+HVwTKSJIbfghYDxv+lXYyt cTfY0Fk5Fcbst+g5Co4SHwgOjzzIf6GGOyqBK41vEXjPIvKd74BMkOX0UoduubgbJOZ434BsXMi +6Pt1uebPz9CYF+mMT7BRAkACNnr7lwV2iaAfSWcURSScn+QSXhhJCIHRlO51+gtHj7OwNO0kL2 NLniq3NW48jxF4hJknuYLQT0SIw8Y4aBTofN3FK6vjRgCarhn/ X-TMASE-SNAP-Result: 1.821001.0001-0-1-22:0,33:0,34:0-0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8"

For pmem, metadata is specific to the namespace rather than to the entire pmem region. Therefore, ranges in which no namespace has been created yet, or which are unusable due to alignment, are not associated with any metadata.

When an application attempts to access a region that has no corresponding metadata, it encounters an access error. With this flag, dumping applications can anticipate this access error and handle it accordingly.

This is kexec_file_load() specific; for the traditional kexec_load(), kexec-tools will have a similar change.
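The writer side of the flag can be sketched in userspace C as follows. This is only an approximation under stated assumptions: the _sketch names are hypothetical, ranges are half-open here, and the containment test is a simplified stand-in for the kernel's region_intersects() check rather than a reimplementation of it.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PF_DEV (1 << 4) /* value proposed by this series */

/* Hypothetical half-open physical range [start, end). */
struct range_sketch {
	uint64_t start;
	uint64_t end;
};

/*
 * Simplified stand-in for the resource lookup: true only when
 * [start, start + size) is fully covered by a single pmem range.
 */
static bool is_pmem_range_sketch(const struct range_sketch *pmem, size_t n,
				 uint64_t start, uint64_t size)
{
	for (size_t i = 0; i < n; i++) {
		if (pmem[i].start <= start && start + size <= pmem[i].end)
			return true;
	}
	return false;
}

/* Fill one PT_LOAD header, adding PF_DEV when the range is pmem. */
static void fill_load_phdr_sketch(Elf64_Phdr *phdr, uint64_t mstart,
				  uint64_t mend,
				  const struct range_sketch *pmem, size_t n)
{
	assert(phdr && mstart < mend);
	phdr->p_type = PT_LOAD;
	phdr->p_flags = PF_R | PF_W | PF_X;
	if (is_pmem_range_sketch(pmem, n, mstart, mend - mstart))
		phdr->p_flags |= PF_DEV;
	phdr->p_offset = mstart;
	phdr->p_paddr = mstart;
	phdr->p_filesz = phdr->p_memsz = mend - mstart;
}
```

Because the flag is OR-ed on top of PF_R|PF_W|PF_X, a consumer that does not know about PF_DEV still sees an ordinary readable PT_LOAD segment.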
CC: Eric Biederman
CC: Baoquan He
CC: kexec@lists.infradead.org
Signed-off-by: Li Zhijian
---
 kernel/kexec_file.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index f989f5f1933b..0d5b516b96ee 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -29,6 +29,8 @@
 #include
 #include "kexec_internal.h"
 
+#define PF_DEV (1 << 4)
+
 #ifdef CONFIG_KEXEC_SIG
 static bool sig_enforce = IS_ENABLED(CONFIG_KEXEC_SIG_FORCE);
 
@@ -1221,6 +1223,12 @@ int crash_exclude_mem_range(struct crash_mem *mem,
 	return 0;
 }
 
+static bool is_pmem_range(u64 start, u64 size)
+{
+	return REGION_INTERSECTS == region_intersects(start, size,
+			IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY);
+}
+
 int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
 			  void **addr, unsigned long *sz)
 {
@@ -1302,6 +1310,8 @@ int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
 
 		phdr->p_type = PT_LOAD;
 		phdr->p_flags = PF_R|PF_W|PF_X;
+		if (is_pmem_range(mstart, mend - mstart))
+			phdr->p_flags |= PF_DEV;
 		phdr->p_offset = mstart;
 
 		phdr->p_paddr = mstart;
-- 
2.29.2

From nobody Thu Dec 18 08:58:02 2025
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by smtp.lore.kernel.org (Postfix) with ESMTP id 9ECA1C7EE29
 for ; Fri, 2 Jun 2023 10:32:00 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S234769AbjFBKb6 (ORCPT ); Fri, 2 Jun 2023 06:31:58 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50412 "EHLO
 lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S235907AbjFBKaJ (ORCPT ); Fri, 2 Jun 2023 06:30:09 -0400
Received: from esa4.hc1455-7.c3s2.iphmx.com (esa4.hc1455-7.c3s2.iphmx.com [68.232.139.117])
 by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5FAF71FC0
 for ; Fri, 2 Jun 2023 03:28:09 -0700 (PDT)
X-IronPort-AV: E=McAfee;i="6600,9927,10728"; a="119231456"
X-IronPort-AV: E=Sophos;i="6.00,212,1681138800"; d="scan'208";a="119231456"
Received: from unknown (HELO yto-r2.gw.nic.fujitsu.com) ([218.44.52.218])
 by esa4.hc1455-7.c3s2.iphmx.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 02 Jun 2023 19:27:12 +0900
Received: from yto-m3.gw.nic.fujitsu.com (yto-nat-yto-m3.gw.nic.fujitsu.com [192.168.83.66])
 by yto-r2.gw.nic.fujitsu.com (Postfix) with ESMTP id 5D096C68E2
 for ; Fri, 2 Jun 2023 19:27:10 +0900 (JST)
Received: from kws-ab4.gw.nic.fujitsu.com (kws-ab4.gw.nic.fujitsu.com [192.51.206.22])
 by yto-m3.gw.nic.fujitsu.com (Postfix) with ESMTP id 9878DD20AF
 for ; Fri, 2 Jun 2023 19:27:09 +0900 (JST)
Received: from localhost.localdomain (unknown [10.167.234.230])
 by kws-ab4.gw.nic.fujitsu.com (Postfix) with ESMTP id 89B3840FE2;
 Fri, 2 Jun 2023 19:27:08 +0900 (JST)
From: Li Zhijian
To: kexec@lists.infradead.org, nvdimm@lists.linux.dev
Cc: linux-kernel@vger.kernel.org, dan.j.williams@intel.com, bhe@redhat.com,
 ruansy.fnst@fujitsu.com, y-goto@fujitsu.com, yangx.jy@fujitsu.com,
 Li Zhijian , Vivek Goyal , Dave Young
Subject: [RFC PATCH makedumpfile v3 3/3] makedumpfile: get metadata boundaries from pmem's infoblock
Date: Fri, 2 Jun 2023 18:26:56 +0800
Message-Id: <20230602102656.131654-8-lizhijian@fujitsu.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20230602102656.131654-1-lizhijian@fujitsu.com>
References: <20230602102656.131654-1-lizhijian@fujitsu.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-TM-AS-GCONF: 00
X-TM-AS-Product-Ver: IMSS-9.1.0.1417-9.0.0.1002-27666.006
X-TM-AS-User-Approved-Sender: Yes
X-TMASE-Version: IMSS-9.1.0.1417-9.0.1002-27666.006
X-TMASE-Result: 10--22.341900-10.000000
X-TMASE-MatchedRID: 44XoofGiUa/SFChHF48gQTiEPRj9j9rvUh4weWPqOWS638ZUY6gSd1YI
 HeJwbhUqNmVYIPY+Gy2YlVLttZCaEDBF7stuNMMxEVuC0eNRYvLxKR2kbb+f1wYkj8pwyAod12k
 t5HuVFgCbKneFtGQ4N1LlTv2kQ2/LP06ke90qDvPum6Nvy6t3NgmWvXEqQTm5wLkNMQzGl5B+Kr
 WCPbERP2XM/1LvxbWDZva/ER92lB2cfX6Ug1yFMM36paW7ZnFoGB9/bxS68hPk6Qbi+9i6Dz/zI
 kvUoSsdD7X3CqobQahpM4uQJ+eFBvdZ3f2rISMmfOaYwP8dcX6BiLDUCsch2+uLH9BII+4qlfC3
 WuSrCNRwmg9/GN+fyAQTZYEbCLsYrJKo/rFyHQGVOwZbcOalSxZSD+Gbjz3IWH7Bxw4ADCPmKXL
 lUHkJC+EjtnMuJXJurgv6vPEWDp7YsZWA4GRU5h1qGr6sYOf/wJjn8yqLU6I1Y73PdzvXZKKJ8y
 eXnfHAZZ1LvVb67BW23YOnZxHtRsOrmarpjDAkngIgpj8eDcAZ1CdBJOsoY8RB0bsfrpPI6T/LT
 DsmJmg=
X-TMASE-SNAP-Result: 1.821001.0001-0-1-22:0,33:0,34:0-0
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Some of the code is copied from ndctl.

This change requires libndctl, which provides an interface for walking
through all existing namespaces. It also requires that the namespace has
entered force_raw mode (the kernel will ensure that).

The resource interface provides the start of the namespace, and the
device's superblock provides the namespace's user-data offset. From this
information we can calculate the extent of the metadata.

This patch also introduces a new dump level (-d 63) that skips the
metadata as well.

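The boundary calculation described above can be sketched as follows. This
is a minimal illustration rather than the code in this patch: the struct
and function names are hypothetical, the page shift is passed in explicitly
instead of being read from makedumpfile's info structure, and the end PFN
is assumed to be exclusive.

```c
#include <stdint.h>

/* Hypothetical container for one namespace's metadata range, in PFNs. */
struct meta_range {
	uint64_t start_pfn;	/* first PFN of the namespace */
	uint64_t end_pfn;	/* first PFN of user data (exclusive end) */
};

/*
 * start:      physical start address of the namespace (resource interface)
 * dataoff:    user-data offset read from the infoblock
 * page_shift: log2 of the page size, e.g. 12 for 4 KiB pages
 */
static struct meta_range metadata_range(uint64_t start, uint64_t dataoff,
					unsigned int page_shift)
{
	struct meta_range r;

	r.start_pfn = start >> page_shift;
	r.end_pfn = (start + dataoff) >> page_shift;
	return r;
}

/* Nonzero if pfn falls inside the metadata range [start_pfn, end_pfn). */
static int pfn_is_metadata(const struct meta_range *r, uint64_t pfn)
{
	return pfn >= r->start_pfn && pfn < r->end_pfn;
}
```

For example, a namespace starting at 4 GiB with a 2 MiB data offset and
4 KiB pages would yield 512 metadata pages under these assumptions.
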
CC: Baoquan He
CC: Vivek Goyal
CC: Dave Young
CC: kexec@lists.infradead.org
Signed-off-by: Li Zhijian
---
 Makefile       |   2 +-
 makedumpfile.c | 196 +++++++++++++++++++++++++++++++++++++++++++++++--
 makedumpfile.h |   3 +-
 3 files changed, 192 insertions(+), 9 deletions(-)

diff --git a/Makefile b/Makefile
index 0608035e913f..fd0a792c5647 100644
--- a/Makefile
+++ b/Makefile
@@ -50,7 +50,7 @@ OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
 SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c
 OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))
 
-LIBS = -ldw -lbz2 -ldl -lelf -lz
+LIBS = -ldw -lbz2 -ldl -lelf -lz -lndctl
 ifneq ($(LINKTYPE), dynamic)
 LIBS := -static $(LIBS) -llzma
 endif
diff --git a/makedumpfile.c b/makedumpfile.c
index f304f752b0ec..b68d261f3d1e 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -27,6 +27,8 @@
 #include
 #include
 #include
+#include
+#include
 
 struct symbol_table symbol_table;
 struct size_table size_table;
@@ -100,6 +102,7 @@ mdf_pfn_t pfn_user;
 mdf_pfn_t pfn_free;
 mdf_pfn_t pfn_hwpoison;
 mdf_pfn_t pfn_offline;
+mdf_pfn_t pfn_pmem_metadata;
 mdf_pfn_t pfn_pmem_userdata;
 mdf_pfn_t pfn_elf_excluded;
 
@@ -6374,6 +6377,173 @@ exclude_range(mdf_pfn_t *counter, mdf_pfn_t pfn, mdf_pfn_t endpfn,
 	}
 }
 
+struct pmem_metadata_node {
+	unsigned long long start;
+	unsigned long long end;
+	struct pmem_metadata_node *next;
+} pmem_metadata_head;
+
+struct pmem_metadata_node *pmem_head = NULL;
+
+static void pmem_add_next(unsigned long long start, unsigned long long dataoff)
+{
+	struct pmem_metadata_node *tail = pmem_head, *node;
+
+	node = calloc(1, sizeof(*node));
+	if (!node)
+		return;
+
+	node->start = start >> info->page_shift;
+	node->end = (start + dataoff) >> info->page_shift;
+	node->next = NULL;
+
+	if (!pmem_head) {
+		pmem_head = node;
+		return;
+	}
+	while (tail->next)
+		tail = tail->next;
+	tail->next = node;
+}
+
+static void cleanup_pmem_metadata(void)
+{
+	struct pmem_metadata_node *head = pmem_head;
+	while (head) {
+		struct pmem_metadata_node *next = head->next;
+		free(head);
+		head = next;
+	}
+}
+
+static int is_pmem_metadata_range(unsigned long long start, unsigned long long end)
+{
+	struct pmem_metadata_node *head = pmem_head;
+	while (head) {
+		if (head->start <= start && head->end > end)
+			return TRUE;
+		head = head->next;
+	}
+
+	return FALSE;
+}
+
+static void dump_pmem_range(void)
+{
+	int i = 0;
+	struct pmem_metadata_node *node= pmem_head;
+
+	fprintf(stderr, "dump_pmem_range start......\n\n\n");
+	while (node) {
+		fprintf(stderr, "namespace[%d]: metadata[%llx, %llx]\n", i++, node->start, node->end);
+		node = node->next;
+	}
+	fprintf(stderr, "dump_pmem_range end........\n\n\n");
+}
+
+#define INFOBLOCK_SZ (8192)
+#define SZ_4K (4096)
+#define PFN_SIG_LEN 16
+
+typedef uint64_t u64;
+typedef int64_t s64;
+typedef uint32_t u32;
+typedef int32_t s32;
+typedef uint16_t u16;
+typedef int16_t s16;
+typedef uint8_t u8;
+typedef int8_t s8;
+
+typedef int64_t le64;
+typedef int32_t le32;
+typedef int16_t le16;
+
+struct pfn_sb {
+	u8 signature[PFN_SIG_LEN];
+	u8 uuid[16];
+	u8 parent_uuid[16];
+	le32 flags;
+	le16 version_major;
+	le16 version_minor;
+	le64 dataoff; /* relative to namespace_base + start_pad */
+	le64 npfns;
+	le32 mode;
+	/* minor-version-1 additions for section alignment */
+	le32 start_pad;
+	le32 end_trunc;
+	/* minor-version-2 record the base alignment of the mapping */
+	le32 align;
+	/* minor-version-3 guarantee the padding and flags are zero */
+	/* minor-version-4 record the page size and struct page size */
+	le32 page_size;
+	le16 page_struct_size;
+	u8 padding[3994];
+	le64 checksum;
+};
+
+static int nd_read_infoblock_dataoff(struct ndctl_namespace *ndns)
+{
+	int fd, rc;
+	char path[50];
+	char buf[INFOBLOCK_SZ + 1];
+	struct pfn_sb *pfn_sb = (struct pfn_sb *)(buf + SZ_4K);
+
+	sprintf(path, "/dev/%s", ndctl_namespace_get_block_device(ndns));
+
+	fd = open(path, O_RDONLY|O_EXCL);
+	if (fd < 0)
+		return -1;
+
+	rc = read(fd, buf, INFOBLOCK_SZ);
+	if (rc < INFOBLOCK_SZ) {
+		return -1;
+	}
+
+	return pfn_sb->dataoff;
+}
+
+int inspect_pmem_namespace(void)
+{
+	struct ndctl_ctx *ctx;
+	struct ndctl_bus *bus;
+	int rc = -1;
+
+	fprintf(stderr, "\n\ninspect_pmem_namespace!!\n\n");
+	rc = ndctl_new(&ctx);
+	if (rc)
+		return -1;
+
+	ndctl_bus_foreach(ctx, bus) {
+		struct ndctl_region *region;
+
+		ndctl_region_foreach(bus, region) {
+			struct ndctl_namespace *ndns;
+
+			ndctl_namespace_foreach(region, ndns) {
+				enum ndctl_namespace_mode mode;
+				long long start, end_metadata;
+
+				mode = ndctl_namespace_get_mode(ndns);
+				/* kdump kernel should set force_raw, mode become *safe* */
+				if (mode == NDCTL_NS_MODE_SAFE) {
+					fprintf(stderr, "Only raw can be dumpable\n");
+					continue;
+				}
+
+				start = ndctl_namespace_get_resource(ndns);
+				end_metadata = nd_read_infoblock_dataoff(ndns);
+
+				/* metadata really starts from 2M alignment */
+				if (start != ULLONG_MAX && end_metadata > 2 * 1024 * 1024) // 2M
+					pmem_add_next(start, end_metadata);
+			}
+		}
+	}
+
+	ndctl_unref(ctx);
+	return 0;
+}
+
 int
 __exclude_unnecessary_pages(unsigned long mem_map,
     mdf_pfn_t pfn_start, mdf_pfn_t pfn_end, struct cycle *cycle)
@@ -6447,9 +6617,17 @@ __exclude_unnecessary_pages(unsigned long mem_map,
 
 		is_pmem = is_pmem_pt_load_range(pfn << PAGESHIFT(), (pfn + 1) << PAGESHIFT());
 		if (is_pmem) {
-			pfn_pmem_userdata++;
-			clear_bit_on_2nd_bitmap_for_kernel(pfn, cycle);
-			continue;
+			if (is_pmem_metadata_range(pfn, pfn + 1)) {
+				if (info->dump_level & DL_EXCLUDE_PMEM_META) {
+					pfn_pmem_metadata++;
+					clear_bit_on_2nd_bitmap_for_kernel(pfn, cycle);
+					continue;
+				}
+			} else {
+				pfn_pmem_userdata++;
+				clear_bit_on_2nd_bitmap_for_kernel(pfn, cycle);
+				continue;
+			}
 		}
 
 		index_pg = pfn % PGMM_CACHED;
@@ -8130,7 +8308,7 @@ write_elf_pages_cyclic(struct cache_data *cd_header, struct cache_data *cd_page)
 	 * Reset counter for debug message.
 	 */
 	if (info->flag_cyclic) {
-		pfn_zero = pfn_cache = pfn_cache_private = 0;
+		pfn_zero = pfn_cache = pfn_cache_private = pfn_pmem_metadata = 0;
 		pfn_user = pfn_free = pfn_hwpoison = pfn_offline = pfn_pmem_userdata = 0;
 		pfn_memhole = info->max_mapnr;
 	}
@@ -9468,7 +9646,7 @@ write_kdump_pages_and_bitmap_cyclic(struct cache_data *cd_header, struct cache_d
 	/*
 	 * Reset counter for debug message.
 	 */
-	pfn_zero = pfn_cache = pfn_cache_private = 0;
+	pfn_zero = pfn_cache = pfn_cache_private = pfn_pmem_metadata = 0;
 	pfn_user = pfn_free = pfn_hwpoison = pfn_offline = pfn_pmem_userdata = 0;
 	pfn_memhole = info->max_mapnr;
 
@@ -10418,7 +10596,7 @@ print_report(void)
 	pfn_original = info->max_mapnr - pfn_memhole;
 
 	pfn_excluded = pfn_zero + pfn_cache + pfn_cache_private + pfn_pmem_userdata
-	    + pfn_user + pfn_free + pfn_hwpoison + pfn_offline;
+	    + pfn_user + pfn_free + pfn_hwpoison + pfn_offline + pfn_pmem_metadata;
 
 	REPORT_MSG("\n");
 	REPORT_MSG("Original pages : 0x%016llx\n", pfn_original);
@@ -10434,6 +10612,7 @@ print_report(void)
 	REPORT_MSG(" Free pages : 0x%016llx\n", pfn_free);
 	REPORT_MSG(" Hwpoison pages : 0x%016llx\n", pfn_hwpoison);
 	REPORT_MSG(" Offline pages : 0x%016llx\n", pfn_offline);
+	REPORT_MSG(" pmem metadata pages : 0x%016llx\n", pfn_pmem_metadata);
 	REPORT_MSG(" pmem userdata pages : 0x%016llx\n", pfn_pmem_userdata);
 	REPORT_MSG(" Remaining pages : 0x%016llx\n",
 	    pfn_original - pfn_excluded);
@@ -10475,7 +10654,7 @@ print_mem_usage(void)
 	pfn_original = info->max_mapnr - pfn_memhole;
 
 	pfn_excluded = pfn_zero + pfn_cache + pfn_cache_private + pfn_pmem_userdata
-	    + pfn_user + pfn_free + pfn_hwpoison + pfn_offline;
+	    + pfn_user + pfn_free + pfn_hwpoison + pfn_offline + pfn_pmem_metadata;
 	shrinking = (pfn_original - pfn_excluded) * 100;
 	shrinking = shrinking / pfn_original;
 	total_size = info->page_size * pfn_original;
@@ -10768,6 +10947,8 @@ create_dumpfile(void)
 		}
 	}
 
+	inspect_pmem_namespace();
+	dump_pmem_range();
 	print_vtop();
 
 	num_retry = 0;
@@ -12441,6 +12622,7 @@ out:
 		}
 	}
 	free_elf_info();
+	cleanup_pmem_metadata();
 
 	return retcd;
 }
diff --git a/makedumpfile.h b/makedumpfile.h
index 85e5a4932983..ecb2fb4d7a4c 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -206,7 +206,7 @@ test_bit(int nr, unsigned long addr)
  * Dump Level
  */
 #define MIN_DUMP_LEVEL (0)
-#define MAX_DUMP_LEVEL (31)
+#define MAX_DUMP_LEVEL (63)
 #define NUM_ARRAY_DUMP_LEVEL (MAX_DUMP_LEVEL + 1) /* enough to allocate
 			all the dump_level */
 #define DL_EXCLUDE_ZERO (0x001) /* Exclude Pages filled with Zeros */
@@ -216,6 +216,7 @@ test_bit(int nr, unsigned long addr)
 			with Private Pages */
 #define DL_EXCLUDE_USER_DATA (0x008) /* Exclude UserProcessData Pages */
 #define DL_EXCLUDE_FREE (0x010) /* Exclude Free Pages */
+#define DL_EXCLUDE_PMEM_META (0x020) /* Exclude pmem metadata Pages */
 
 
 /*
-- 
2.29.2