From: Baoquan He
To: linux-kernel@vger.kernel.org
Cc: catalin.marinas@arm.com, will@kernel.org, horms@kernel.org, thunder.leizhen@huawei.com, John.p.donnelly@oracle.com, kexec@lists.infradead.org, linux-arm-kernel@lists.infradead.org, Baoquan He
Subject: [PATCH v6 1/2] arm64: kdump: simplify the reservation behaviour of crashkernel=,high
Date: Mon, 15 May 2023 14:02:58 +0800
Message-Id: <20230515060259.830662-2-bhe@redhat.com>
In-Reply-To: <20230515060259.830662-1-bhe@redhat.com>
References: <20230515060259.830662-1-bhe@redhat.com>

On arm64, the reservation for 'crashkernel=xM,high' is done by searching
top down for a suitable memory region. If the 'xM' of crashkernel high
memory is successfully reserved from high memory, it then tries to
reserve crashkernel low memory accordingly. Otherwise, it searches the
low memory area for a suitable 'xM' region. Please see the details in
Documentation/admin-guide/kernel-parameters.txt.

However, we observed an unexpected case where a reserved region crosses
the boundary between low and high memory. E.g. on a system where low
memory ends at 4G, if the user adds the kernel parameter
'crashkernel=512M,high', the running kernel can end up with the regions
[4G-126M, 4G+386M] and [1G, 1G+128M]. A crashkernel high region that
crosses the low/high memory boundary brings these issues:

1) For crashkernel=x,high, if the crashkernel high region crosses the
   low/high memory boundary, the user will see two memory regions in low
   memory and one memory region in high memory.
   The two crashkernel low memory regions are confusing, as shown in the
   example above.

2) If people explicitly specify "crashkernel=x,high crashkernel=y,low"
   with y <= 128M, and the crashkernel high region crosses the low/high
   memory boundary with the part below the boundary bigger than y, the
   expected crashkernel low reservation will be skipped. But the part of
   the crashkernel high reservation that is actually in high memory has
   shrunk and may not satisfy user space requirements.

3) The boundary-crossing behaviour of the crashkernel high reservation
   differs from x86. On x86_64, low memory always ends at 4G, and the
   memory just below 4G is reserved by the system, e.g. for firmware and
   PCI mappings, so a crashkernel reservation never crosses the boundary.
   From the distros' point of view, this brings inconsistency and
   confusion: users need to dig into x86 and arm64 system details to
   find out why.

For the kernel itself, the impact of issue 3) could be slight. Issues 1)
and 2), however, have an actual impact because they bring obscure
semantics and behaviour to the crashkernel=,high reservation.

Here, for crashkernel=xM,high, search for a suitable region only in high
memory. If that fails, try reserving a suitable region only in low
memory. This way, the crashkernel high region only exists in high
memory, and the crashkernel low region only exists in low memory. The
reservation behaviour for crashkernel=,high becomes clearer and simpler.

Note: RPi4 has different zone ranges than normal arm64 systems. Its DMA
zone is 0~1G, and its DMA32 zone is 1G~4G if CONFIG_ZONE_DMA|DMA32 are
enabled by default. Its low memory end is 1G so that all devices can
address it, and high memory starts at 1G. However, to be consistent with
normal arm64 systems, crashkernel high memory is reserved from 4G when
crashkernel=size,high is specified, while the low memory end stays at 1G.
This will remove confusion.

With the above change applied, the arm64 crashkernel reservation ranges
are:

1) RPi4 (zone DMA: 0~1G; DMA32: 1G~4G):
   crashkernel=size
     0~1G: low memory | 1G~top: high memory
   crashkernel=size,high
     0~1G: low memory | 4G~top: high memory

2) Other normal systems:
   crashkernel=size
   crashkernel=size,high
     0~4G: low memory | 4G~top: high memory

3) Systems w/o zone DMA|DMA32:
   crashkernel=size
   crashkernel=size,high
     0~top: low memory

Signed-off-by: Baoquan He

arm64: kdump: fix warning reported by static checker

Signed-off-by: Baoquan He
---
 arch/arm64/mm/init.c | 44 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 34 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 66e70ca47680..c28c2c8483cc 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -69,6 +69,7 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit;
 
 #define CRASH_ADDR_LOW_MAX		arm64_dma_phys_limit
 #define CRASH_ADDR_HIGH_MAX		(PHYS_MASK + 1)
+#define CRASH_HIGH_SEARCH_BASE		SZ_4G
 
 #define DEFAULT_CRASH_KERNEL_LOW_SIZE	(128UL << 20)
 
@@ -101,12 +102,13 @@ static int __init reserve_crashkernel_low(unsigned long long low_size)
  */
 static void __init reserve_crashkernel(void)
 {
-	unsigned long long crash_base, crash_size;
-	unsigned long long crash_low_size = 0;
+	unsigned long long crash_low_size = 0, search_base = 0;
 	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
+	unsigned long long crash_base, crash_size;
 	char *cmdline = boot_command_line;
-	int ret;
 	bool fixed_base = false;
+	bool high = false;
+	int ret;
 
 	if (!IS_ENABLED(CONFIG_KEXEC_CORE))
 		return;
@@ -129,7 +131,9 @@ static void __init reserve_crashkernel(void)
 		else if (ret)
 			return;
 
+		search_base = CRASH_HIGH_SEARCH_BASE;
 		crash_max = CRASH_ADDR_HIGH_MAX;
+		high = true;
 	} else if (ret || !crash_size) {
 		/* The specified value is invalid */
 		return;
@@ -140,31 +144,51 @@ static void __init reserve_crashkernel(void)
 	/* User specifies base address explicitly. */
 	if (crash_base) {
 		fixed_base = true;
+		search_base = crash_base;
 		crash_max = crash_base + crash_size;
 	}
 
 retry:
 	crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
-					       crash_base, crash_max);
+					       search_base, crash_max);
 	if (!crash_base) {
 		/*
-		 * If the first attempt was for low memory, fall back to
-		 * high memory, the minimum required low memory will be
-		 * reserved later.
+		 * For crashkernel=size[KMG]@offset[KMG], print out failure
+		 * message if can't reserve the specified region.
 		 */
-		if (!fixed_base && (crash_max == CRASH_ADDR_LOW_MAX)) {
+		if (fixed_base) {
+			pr_warn("crashkernel reservation failed - memory is in use.\n");
+			return;
+		}
+
+		/*
+		 * For crashkernel=size[KMG], if the first attempt was for
+		 * low memory, fall back to high memory, the minimum required
+		 * low memory will be reserved later.
+		 */
+		if (!high && crash_max == CRASH_ADDR_LOW_MAX) {
 			crash_max = CRASH_ADDR_HIGH_MAX;
+			search_base = CRASH_ADDR_LOW_MAX;
 			crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
 			goto retry;
 		}
 
+		/*
+		 * For crashkernel=size[KMG],high, if the first attempt was
+		 * for high memory, fall back to low memory.
+		 */
+		if (high && crash_max == CRASH_ADDR_HIGH_MAX) {
+			crash_max = CRASH_ADDR_LOW_MAX;
+			search_base = 0;
+			goto retry;
+		}
 		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
 			crash_size);
 		return;
 	}
 
-	if ((crash_base > CRASH_ADDR_LOW_MAX - crash_low_size) &&
-	     crash_low_size && reserve_crashkernel_low(crash_low_size)) {
+	if ((crash_base >= CRASH_ADDR_LOW_MAX) && crash_low_size &&
+	     reserve_crashkernel_low(crash_low_size)) {
 		memblock_phys_free(crash_base, crash_size);
 		return;
 	}
-- 
2.34.1

From: Baoquan He
To: linux-kernel@vger.kernel.org
Cc: catalin.marinas@arm.com, will@kernel.org, horms@kernel.org, thunder.leizhen@huawei.com, John.p.donnelly@oracle.com, kexec@lists.infradead.org, linux-arm-kernel@lists.infradead.org, Baoquan He
Subject: [PATCH v6 2/2] Documentation: add kdump.rst to present crashkernel reservation on arm64
Date: Mon, 15 May 2023 14:02:59 +0800
Message-Id: <20230515060259.830662-3-bhe@redhat.com>
In-Reply-To: <20230515060259.830662-1-bhe@redhat.com>
References: <20230515060259.830662-1-bhe@redhat.com>

People have complained that the crashkernel reservation code flow is hard
to follow, so add this document to explain the background, concepts and
implementation of crashkernel reservation on arm64. Hopefully this helps
people understand it more easily.

Signed-off-by: Baoquan He
Reviewed-by: Zhen Lei
---
 Documentation/arm64/kdump.rst | 103 ++++++++++++++++++++++++++++++++++
 1 file changed, 103 insertions(+)
 create mode 100644 Documentation/arm64/kdump.rst

diff --git a/Documentation/arm64/kdump.rst b/Documentation/arm64/kdump.rst
new file mode 100644
index 000000000000..78b22017c490
--- /dev/null
+++ b/Documentation/arm64/kdump.rst
@@ -0,0 +1,103 @@
+=======================================
+crashkernel memory reservation on arm64
+=======================================
+
+Author: Baoquan He
+
+The kdump mechanism is used to capture a corrupted kernel's vmcore so
+that people can analyze it to find the root cause of the corruption. To
+do that, a memory region has to be reserved in advance so the kdump
+kernel can be loaded into it, and the system switches to the kdump
+kernel to boot up and run when a corruption happens.
+
+The memory reserved for kdump is sized to minimally accommodate the
+kdump kernel booting and running, and the user space programs that
+collect the vmcore.
+
+Kernel parameter
+================
+With the kernel parameters below, memory can be reserved accordingly
+during the early boot stage of the 1st kernel, so that a large chunk
+of contiguous memory can be found and reserved. Meanwhile, the need
+for low memory has to be considered if crashkernel is reserved in the
+high memory area.
+
+- crashkernel=size@offset
+- crashkernel=size
+- crashkernel=size,high crashkernel=size,low
+
+Low memory and high memory
+==========================
+What are low memory and high memory? In kdump reservation, low memory
+means the memory area below a specific limit, usually decided by the
+smallest addressing width of the PCI devices that the kdump kernel
+relies on to boot up and collect the vmcore successfully. Devices not
+related to vmcore dumping can be ignored. E.g. on x86, some i2c devices
+may only be able to access a 24-bit address range, but the kdump kernel
+still takes 4G as the limit because all known devices that the kdump
+kernel cares about have 32-bit addressing capability. On arm64, the low
+memory upper boundary is not fixed: it's 1G on the RPi4 platform and 4G
+on normal arm64 systems. On special systems with CONFIG_ZONE_DMA|DMA32
+disabled, the whole system RAM is low memory. Apart from low memory,
+all the rest of system RAM is high memory, which the kernel and user
+space programs can allocate and use.
+
+Implementation
+==============
+1) crashkernel=size@offset
+--------------------------
+The crashkernel memory must be reserved at the user-specified region;
+the reservation fails if that region is already occupied.
+
+
+2) crashkernel=size
+-------------------
+The crashkernel memory region will be reserved at any available
+position, following the search order below.
+
+Firstly, it searches the low memory area for an available region of
+the specified size.
+
+Secondly, if searching low memory fails, it falls back to searching
+the high memory area for the specified size. If the reservation in
+high memory succeeds, a default reservation in low memory is also
+done. The current default is 128M, which satisfies the low memory
+needs, e.g. PCI device driver initialization.
+
+If both of the above searches fail, the reservation fails.
+
+Note: crashkernel=size is the recommended option among the crashkernel
+parameters. With it, the user doesn't need to know much about the
+system memory layout, and just needs to specify how much memory the
+kdump kernel needs to make vmcore dumping succeed.
+
+3) crashkernel=size,high crashkernel=size,low
+---------------------------------------------
+crashkernel=size,high is an important supplement to crashkernel=size. It
+allows the user to precisely specify how much memory is allocated from
+high memory and how much is needed from low memory. On systems with a
+large amount of memory, low memory is small and precious since some
+kernel features and many devices can only request memory from that
+area, while taking a large chunk of contiguous memory from high memory
+matters much less and can satisfy most kernel and almost all user
+space requirements. In such a case, only a small, necessary part needs
+to come from low memory, so the normal running of the 1st kernel won't
+be impacted by the limited low memory resource.
+
+To reserve memory for crashkernel=size,high, the search is first tried
+in the high memory region. If the reservation succeeds, the low memory
+reservation is done subsequently.
+
+Secondly, if the reservation in high memory fails, it falls back to
+searching low memory for the size specified in crashkernel=,high. If
+that succeeds, everything is fine since no low memory is needed.
+
+Notes:
+- If crashkernel=,low is not specified, the default low memory reservation
+  will be done automatically.
+
+- If crashkernel=0,low is specified, the low memory reservation is
+  omitted intentionally.
+
+3)
+
-- 
2.34.1