[v2] kdump: crashkernel reservation from CMA

[PATCH v2 4/5] kdump: wait for DMA to finish when using CMA

Posted by Jiri Bohac 11 months, 3 weeks ago

When re-using the CMA area for kdump there is a risk of pending DMA into
pinned user pages in the CMA area.

Pages that are pinned long-term are migrated away from CMA, so these are not a
concern. Pages pinned without FOLL_LONGTERM remain in the CMA and may possibly
be the source or destination of a pending DMA transfer.

Although there is no clear specification of how long a page may be pinned
without FOLL_LONGTERM, pinning without the flag shows the caller's intent to
use the memory only for short-lived DMA transfers, not for a transfer initiated
by a device asynchronously at a random time in the future.

Add a delay of CMA_DMA_TIMEOUT_MSEC milliseconds before starting the kdump
kernel, giving such short-lived DMA transfers time to finish before the CMA
memory is re-used by the kdump kernel.

Set CMA_DMA_TIMEOUT_MSEC to 1000 (one second), chosen arbitrarily to give a
huge margin for a DMA transfer without increasing the kdump time
significantly.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
---
 include/linux/crash_core.h |  5 +++++
 kernel/crash_core.c        | 10 ++++++++++
 2 files changed, 15 insertions(+)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 44305336314e..543e4a71f13c 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -56,6 +56,11 @@ static inline unsigned int crash_get_elfcorehdr_size(void) { return 0; }
 /* Alignment required for elf header segment */
 #define ELF_CORE_HEADER_ALIGN   4096
 
+/* Time to wait for possible DMA to finish before starting the kdump kernel
+ * when a CMA reservation is used
+ */
+#define CMA_DMA_TIMEOUT_MSEC 1000
+
 extern int crash_exclude_mem_range(struct crash_mem *mem,
 				   unsigned long long mstart,
 				   unsigned long long mend);
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 078fe5bc5a74..543e509b7926 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -21,6 +21,7 @@
 #include <linux/reboot.h>
 #include <linux/btf.h>
 #include <linux/objtool.h>
+#include <linux/delay.h>
 
 #include <asm/page.h>
 #include <asm/sections.h>
@@ -97,6 +98,14 @@ int kexec_crash_loaded(void)
 }
 EXPORT_SYMBOL_GPL(kexec_crash_loaded);
 
+static void crash_cma_clear_pending_dma(void)
+{
+	if (!crashk_cma_cnt)
+		return;
+
+	mdelay(CMA_DMA_TIMEOUT_MSEC);
+}
+
 /*
  * No panic_cpu check version of crash_kexec().  This function is called
  * only when panic_cpu holds the current CPU number; this is the only CPU
@@ -116,6 +125,7 @@ void __noclone __crash_kexec(struct pt_regs *regs)
 		if (kexec_crash_image) {
 			struct pt_regs fixed_regs;
 
+			crash_cma_clear_pending_dma();
 			crash_setup_regs(&fixed_regs, regs);
 			crash_save_vmcoreinfo();
 			machine_crash_shutdown(&fixed_regs);

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia
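
The behaviour added by this patch can be modelled in user space. The sketch
below is not kernel code: mdelay_stub(), waited_msec and crash_wait_for() are
hypothetical stand-ins introduced only for illustration, and crashk_cma_cnt is
modelled as a plain int. It shows only that the wait is skipped when no CMA
crashkernel range is reserved and lasts CMA_DMA_TIMEOUT_MSEC otherwise.

```c
#include <assert.h>

/* Value taken from the patch. */
#define CMA_DMA_TIMEOUT_MSEC 1000

/* Assumption: models the kernel's count of CMA crashkernel ranges. */
static int crashk_cma_cnt;

/* Hypothetical: total time the stub below has "waited", in milliseconds. */
static unsigned long waited_msec;

/* Hypothetical replacement for mdelay(); records the delay instead of spinning. */
static void mdelay_stub(unsigned long msec)
{
	assert(msec > 0);
	waited_msec += msec;
}

/* Same control flow as the function added by the patch. */
static void crash_cma_clear_pending_dma(void)
{
	if (!crashk_cma_cnt)
		return;

	mdelay_stub(CMA_DMA_TIMEOUT_MSEC);
}

/* Hypothetical helper: returns how long a crash would wait for a given setup. */
static unsigned long crash_wait_for(int cma_ranges)
{
	crashk_cma_cnt = cma_ranges;
	waited_msec = 0;
	crash_cma_clear_pending_dma();
	return waited_msec;
}
```

Calling crash_wait_for() with zero ranges and then with a non-zero count shows
both branches. The real function busy-waits with mdelay() rather than sleeping.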

Re: [PATCH v2 4/5] kdump: wait for DMA to finish when using CMA

Posted by Baoquan He 11 months, 1 week ago

On 02/20/25 at 05:55pm, Jiri Bohac wrote:
> When re-using the CMA area for kdump there is a risk of pending DMA into
> pinned user pages in the CMA area.
> 
> Pages that are pinned long-term are migrated away from CMA, so these are not a
> concern. Pages pinned without FOLL_LONGTERM remain in the CMA and may possibly
> be the source or destination of a pending DMA transfer.
> 
> Although there is no clear specification of how long a page may be pinned
> without FOLL_LONGTERM, pinning without the flag shows the caller's intent to
> use the memory only for short-lived DMA transfers, not for a transfer initiated
> by a device asynchronously at a random time in the future.
> 
> Add a delay of CMA_DMA_TIMEOUT_MSEC milliseconds before starting the kdump
> kernel, giving such short-lived DMA transfers time to finish before the CMA
> memory is re-used by the kdump kernel.
> 
> Set CMA_DMA_TIMEOUT_MSEC to 1000 (one second), chosen arbitrarily to give a
> huge margin for a DMA transfer without increasing the kdump time
> significantly.
> 
> Signed-off-by: Jiri Bohac <jbohac@suse.cz>
> ---
>  include/linux/crash_core.h |  5 +++++
>  kernel/crash_core.c        | 10 ++++++++++
>  2 files changed, 15 insertions(+)
> 
> diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
> index 44305336314e..543e4a71f13c 100644
> --- a/include/linux/crash_core.h
> +++ b/include/linux/crash_core.h
> @@ -56,6 +56,11 @@ static inline unsigned int crash_get_elfcorehdr_size(void) { return 0; }
>  /* Alignment required for elf header segment */
>  #define ELF_CORE_HEADER_ALIGN   4096
>  
> +/* Time to wait for possible DMA to finish before starting the kdump kernel
> + * when a CMA reservation is used
> + */
> +#define CMA_DMA_TIMEOUT_MSEC 1000
> +
>  extern int crash_exclude_mem_range(struct crash_mem *mem,
>  				   unsigned long long mstart,
>  				   unsigned long long mend);
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 078fe5bc5a74..543e509b7926 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -21,6 +21,7 @@
>  #include <linux/reboot.h>
>  #include <linux/btf.h>
>  #include <linux/objtool.h>
> +#include <linux/delay.h>
>  
>  #include <asm/page.h>
>  #include <asm/sections.h>
> @@ -97,6 +98,14 @@ int kexec_crash_loaded(void)
>  }
>  EXPORT_SYMBOL_GPL(kexec_crash_loaded);
>  
> +static void crash_cma_clear_pending_dma(void)
> +{
> +	if (!crashk_cma_cnt)
> +		return;
> +
> +	mdelay(CMA_DMA_TIMEOUT_MSEC);
> +}
> +
>  /*
>   * No panic_cpu check version of crash_kexec().  This function is called
>   * only when panic_cpu holds the current CPU number; this is the only CPU
> @@ -116,6 +125,7 @@ void __noclone __crash_kexec(struct pt_regs *regs)
>  		if (kexec_crash_image) {
>  			struct pt_regs fixed_regs;
>  
> +			crash_cma_clear_pending_dma();

This may be too idealistic; I am not sure it is a good approach. When a
crash is triggered, we need to do the urgent and necessary things as soon
as possible, then shut down all CPUs to avoid further damage. This one
second of waiting could give the stray system too much time. Just my
personal opinion.

>  			crash_setup_regs(&fixed_regs, regs);
>  			crash_save_vmcoreinfo();
>  			machine_crash_shutdown(&fixed_regs);
> 
> -- 
> Jiri Bohac <jbohac@suse.cz>
> SUSE Labs, Prague, Czechia
>

Re: [PATCH v2 4/5] kdump: wait for DMA to finish when using CMA

Posted by Jiri Bohac 11 months ago

On Mon, Mar 03, 2025 at 10:02:38AM +0800, Baoquan He wrote:
> On 02/20/25 at 05:55pm, Jiri Bohac wrote:
> > +static void crash_cma_clear_pending_dma(void)
> > +{
> > +	if (!crashk_cma_cnt)
> > +		return;
> > +
> > +	mdelay(CMA_DMA_TIMEOUT_MSEC);
> > +}
> > +
> >  /*
> >   * No panic_cpu check version of crash_kexec().  This function is called
> >   * only when panic_cpu holds the current CPU number; this is the only CPU
> > @@ -116,6 +125,7 @@ void __noclone __crash_kexec(struct pt_regs *regs)
> >  		if (kexec_crash_image) {
> >  			struct pt_regs fixed_regs;
> >  
> > +			crash_cma_clear_pending_dma();
> 
> This may be too idealistic; I am not sure it is a good approach. When a
> crash is triggered, we need to do the urgent and necessary things as soon
> as possible, then shut down all CPUs to avoid further damage. This one
> second of waiting could give the stray system too much time. Just my
> personal opinion.

Good point! I think it makes sense to move the call to
crash_cma_clear_pending_dma() past the call to machine_crash_shutdown(),
where all the shutdown happens, like this:

> >  			crash_setup_regs(&fixed_regs, regs);
> >  			crash_save_vmcoreinfo();
> >  			machine_crash_shutdown(&fixed_regs);

+			crash_cma_clear_pending_dma();

I'll post a v3 with this change included.

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia
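
The ordering proposed for v3 can be checked with a small user-space model.
This is only a sketch under stated assumptions: the *_stub() functions,
record(), step_index() and crash_kexec_sketch() are hypothetical names that
merely log the order of calls, and the CMA range count is passed in as a
parameter instead of being read from crashk_cma_cnt.

```c
#include <assert.h>
#include <string.h>

#define MAX_STEPS 8

/* Hypothetical call log: names of the steps in the order they ran. */
static const char *steps[MAX_STEPS];
static int nsteps;

static void record(const char *name)
{
	assert(nsteps < MAX_STEPS);
	steps[nsteps++] = name;
}

/* Stubs standing in for the kernel functions named in the thread. */
static void crash_setup_regs_stub(void)       { record("crash_setup_regs"); }
static void crash_save_vmcoreinfo_stub(void)  { record("crash_save_vmcoreinfo"); }
static void machine_crash_shutdown_stub(void) { record("machine_crash_shutdown"); }

static void crash_cma_clear_pending_dma_stub(int cma_ranges)
{
	if (!cma_ranges)
		return;
	record("crash_cma_clear_pending_dma");
}

/* Call order as proposed for v3: the DMA wait comes after the shutdown. */
static void crash_kexec_sketch(int cma_ranges)
{
	nsteps = 0;
	crash_setup_regs_stub();
	crash_save_vmcoreinfo_stub();
	machine_crash_shutdown_stub();
	crash_cma_clear_pending_dma_stub(cma_ranges);
}

/* Returns the position of a step in the log, or -1 if it did not run. */
static int step_index(const char *name)
{
	for (int i = 0; i < nsteps; i++)
		if (strcmp(steps[i], name) == 0)
			return i;
	return -1;
}
```

With a non-zero range count the wait is logged after machine_crash_shutdown;
with zero it is not logged at all, so systems without a CMA reservation see no
added delay in either version.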

[PATCH v2 1/5] Add a new optional ",cma" suffix to the crashkernel= command line option
[PATCH v2 2/5] kdump: implement reserve_crashkernel_cma
[PATCH v2 3/5] kdump, documentation: describe craskernel CMA reservation
[PATCH v2 4/5] kdump: wait for DMA to finish when using CMA
[PATCH v2 5/5] x86: implement crashkernel cma reservation