When re-using the CMA area for kdump there is a risk of pending DMA
into pinned user pages in the CMA area.

Pages residing in CMA areas can usually not get long-term pinned and
are instead migrated away from the CMA area, so long-term pinning is
typically not a concern. (BUGs in the kernel might still lead to
long-term pinning of such pages if everything goes wrong.)

Pages pinned without FOLL_LONGTERM remain in the CMA and may possibly
be the source or destination of a pending DMA transfer.

Although there is no clear specification of how long a page may be pinned
without FOLL_LONGTERM, pinning without the flag shows an intent of the
caller to only use the memory for short-lived DMA transfers, not a transfer
initiated by a device asynchronously at a random time in the future.

Add a delay of CMA_DMA_TIMEOUT_SEC seconds before starting the kdump
kernel, giving such short-lived DMA transfers time to finish before
the CMA memory is re-used by the kdump kernel.

Set CMA_DMA_TIMEOUT_SEC to 10 seconds - chosen arbitrarily to give
a huge margin for a DMA transfer while not increasing the kdump time
too significantly.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
---
Changes since v4:
- reworded the paragraph about long-term pinning
- simplified crash_cma_clear_pending_dma()
- dropped cma_dma_timeout_sec variable
---
Changes since v3:
- renamed CMA_DMA_TIMEOUT_MSEC to CMA_DMA_TIMEOUT_SEC, changed delay to 10 seconds
- introduce a cma_dma_timeout_sec initialized to CMA_DMA_TIMEOUT_SEC
to make the timeout trivially tunable if needed in the future
---
kernel/crash_core.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 335b8425dd4b..a4ef79591eb2 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -21,6 +21,7 @@
 #include <linux/reboot.h>
 #include <linux/btf.h>
 #include <linux/objtool.h>
+#include <linux/delay.h>
 
 #include <asm/page.h>
 #include <asm/sections.h>
@@ -33,6 +34,11 @@
 /* Per cpu memory for storing cpu states in case of system crash. */
 note_buf_t __percpu *crash_notes;
 
+/* time to wait for possible DMA to finish before starting the kdump kernel
+ * when a CMA reservation is used
+ */
+#define CMA_DMA_TIMEOUT_SEC 10
+
 #ifdef CONFIG_CRASH_DUMP
 
 int kimage_crash_copy_vmcoreinfo(struct kimage *image)
@@ -97,6 +103,14 @@ int kexec_crash_loaded(void)
 }
 EXPORT_SYMBOL_GPL(kexec_crash_loaded);
 
+static void crash_cma_clear_pending_dma(void)
+{
+	if (!crashk_cma_cnt)
+		return;
+
+	mdelay(CMA_DMA_TIMEOUT_SEC * 1000);
+}
+
 /*
  * No panic_cpu check version of crash_kexec(). This function is called
  * only when panic_cpu holds the current CPU number; this is the only CPU
@@ -119,6 +133,7 @@ void __noclone __crash_kexec(struct pt_regs *regs)
 			crash_setup_regs(&fixed_regs, regs);
 			crash_save_vmcoreinfo();
 			machine_crash_shutdown(&fixed_regs);
+			crash_cma_clear_pending_dma();
 			machine_kexec(kexec_crash_image);
 		}
 		kexec_unlock();
--
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia

On Thu, 12 Jun 2025 12:18:40 +0200 Jiri Bohac <jbohac@suse.cz> wrote:

> When re-using the CMA area for kdump there is a risk of pending DMA
> into pinned user pages in the CMA area.
>
> Pages residing in CMA areas can usually not get long-term pinned and
> are instead migrated away from the CMA area, so long-term pinning is
> typically not a concern. (BUGs in the kernel might still lead to
> long-term pinning of such pages if everything goes wrong.)
>
> Pages pinned without FOLL_LONGTERM remain in the CMA and may possibly
> be the source or destination of a pending DMA transfer.
>
> Although there is no clear specification how long a page may be pinned
> without FOLL_LONGTERM, pinning without the flag shows an intent of the
> caller to only use the memory for short-lived DMA transfers, not a transfer
> initiated by a device asynchronously at a random time in the future.
>
> Add a delay of CMA_DMA_TIMEOUT_SEC seconds before starting the kdump
> kernel, giving such short-lived DMA transfers time to finish before
> the CMA memory is re-used by the kdump kernel.
>
> Set CMA_DMA_TIMEOUT_SEC to 10 seconds - chosen arbitrarily as both
> a huge margin for a DMA transfer, yet not increasing the kdump time
> too significantly.

Oh. 10s sounds a lot. How long does this process typically take?

It's sad to add a 10s delay for something which some systems will never
do. I wonder if there's some simple hack we can add. Like having a
global flag which gets set the first time someone pins a CMA page for
DMA and, if that flag is later found to be unset, skip the delay?

Or something else along these lines?

On 13.06.25 01:47, Andrew Morton wrote:
> On Thu, 12 Jun 2025 12:18:40 +0200 Jiri Bohac <jbohac@suse.cz> wrote:
>
>> When re-using the CMA area for kdump there is a risk of pending DMA
>> into pinned user pages in the CMA area.
>>
>> Pages residing in CMA areas can usually not get long-term pinned and
>> are instead migrated away from the CMA area, so long-term pinning is
>> typically not a concern. (BUGs in the kernel might still lead to
>> long-term pinning of such pages if everything goes wrong.)
>>
>> Pages pinned without FOLL_LONGTERM remain in the CMA and may possibly
>> be the source or destination of a pending DMA transfer.
>>
>> Although there is no clear specification how long a page may be pinned
>> without FOLL_LONGTERM, pinning without the flag shows an intent of the
>> caller to only use the memory for short-lived DMA transfers, not a transfer
>> initiated by a device asynchronously at a random time in the future.
>>
>> Add a delay of CMA_DMA_TIMEOUT_SEC seconds before starting the kdump
>> kernel, giving such short-lived DMA transfers time to finish before
>> the CMA memory is re-used by the kdump kernel.
>>
>> Set CMA_DMA_TIMEOUT_SEC to 10 seconds - chosen arbitrarily as both
>> a huge margin for a DMA transfer, yet not increasing the kdump time
>> too significantly.
>
> Oh. 10s sounds a lot. How long does this process typically take?
>
> It's sad to add a 10s delay for something which some systems will never
> do. I wonder if there's some simple hack we can add. Like having a
> global flag which gets set the first time someone pins a CMA page

We would likely have to do that for any GUP on such a page (FOLL_GET |
FOLL_PIN), both from gup-fast and gup-slow.

Should work, but IMHO can be optimized later, on top of this series.

--
Cheers,

David / dhildenb

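The global-flag idea floated in the two replies above could be modelled
roughly as follows. This is a userspace sketch only, not kernel code and
not part of the posted patch: cma_page_ever_pinned, note_cma_page_pinned()
and crash_cma_dma_delay_msec() are hypothetical names, and the kernel's
crashk_cma_cnt is stood in for by a plain parameter. In a real
implementation the hook would presumably have to be called for every GUP
on a CMA page (FOLL_GET or FOLL_PIN), from both gup-fast and gup-slow.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CMA_DMA_TIMEOUT_SEC 10	/* same value as in the patch */

/* Hypothetical flag: becomes true once any CMA page has been pinned. */
static atomic_bool cma_page_ever_pinned;

/* Hypothetical hook, assumed to be called from every GUP path. */
static void note_cma_page_pinned(void)
{
	atomic_store_explicit(&cma_page_ever_pinned, true,
			      memory_order_relaxed);
}

/*
 * Delay in milliseconds before starting the kdump kernel.
 * cma_range_cnt models crashk_cma_cnt from the patch.
 */
static unsigned int crash_cma_dma_delay_msec(unsigned int cma_range_cnt)
{
	if (!cma_range_cnt)
		return 0;	/* no CMA reservation in use */
	if (!atomic_load_explicit(&cma_page_ever_pinned,
				  memory_order_relaxed))
		return 0;	/* nothing was ever pinned: skip the wait */
	return CMA_DMA_TIMEOUT_SEC * 1000;
}

/* Walks through the cases; aborts via assert() if one does not hold. */
static void crash_cma_delay_demo(void)
{
	assert(crash_cma_dma_delay_msec(1) == 0);	/* no pin seen yet */
	note_cma_page_pinned();
	assert(crash_cma_dma_delay_msec(1) == 10000);	/* full delay */
	assert(crash_cma_dma_delay_msec(0) == 0);	/* no CMA ranges */
}
```

The flag is only ever set, never cleared, so the sketch errs on the side
of waiting: a single short-lived pin early in the system's lifetime keeps
the full delay in place from then on.
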
On Fri, Jun 13, 2025 at 11:19:11AM +0200, David Hildenbrand wrote:
> > It's sad to add a 10s delay for something which some systems will never
> > do. I wonder if there's some simple hack we can add. Like having a
> > global flag which gets set the first time someone pins a CMA page
>
> We would likely have to do that for any GUP on such a page (FOLL_GET |
> FOLL_PIN), both from gup-fast and gup-slow.
>
> Should work, but IMHO can be optimized later, on top of this series.

The 10 s was David's suggestion during the discussion of v2 of this
patchset. We already had a discussion about both the length of the
delay and whether to make it configurable [1].

We agreed it was best to start with a longer fixed delay to be on the
safe side. If the CMA reservation becomes popular and anybody complains
about the delay, then we can trivially make this configurable or think
of other improvements.

[1] https://lore.kernel.org/lkml/a1a5af90-bc8a-448a-81fa-485624d592f3@redhat.com/

--
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia

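For illustration, making the delay tunable as mentioned above might look
like the sketch below. It is a userspace model under assumptions, not the
posted code: cma_dma_timeout_sec is the variable name from the v3
changelog, while crash_cma_tunable_delay_msec() and its parameter (which
stands in for crashk_cma_cnt) are hypothetical. How the variable would be
exposed to the administrator is deliberately left open.

```c
#include <assert.h>

#define CMA_DMA_TIMEOUT_SEC 10	/* default, same value as in the patch */

/* Tunable timeout in seconds, initialized to the fixed default. */
static unsigned int cma_dma_timeout_sec = CMA_DMA_TIMEOUT_SEC;

/*
 * Delay in milliseconds before starting the kdump kernel.
 * cma_range_cnt models crashk_cma_cnt from the patch.
 */
static unsigned long crash_cma_tunable_delay_msec(unsigned int cma_range_cnt)
{
	if (!cma_range_cnt)
		return 0;	/* no CMA reservation in use */
	return (unsigned long)cma_dma_timeout_sec * 1000UL;
}

/* Walks through the cases; aborts via assert() if one does not hold. */
static void crash_cma_tunable_demo(void)
{
	assert(crash_cma_tunable_delay_msec(1) == 10000);	/* default */
	cma_dma_timeout_sec = 3;
	assert(crash_cma_tunable_delay_msec(1) == 3000);	/* tuned down */
	assert(crash_cma_tunable_delay_msec(0) == 0);		/* no CMA ranges */
}
```
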
On 06/13/25 at 11:19am, David Hildenbrand wrote:
> On 13.06.25 01:47, Andrew Morton wrote:
> > On Thu, 12 Jun 2025 12:18:40 +0200 Jiri Bohac <jbohac@suse.cz> wrote:
> >
> > > When re-using the CMA area for kdump there is a risk of pending DMA
> > > into pinned user pages in the CMA area.
> > >
> > > Pages residing in CMA areas can usually not get long-term pinned and
> > > are instead migrated away from the CMA area, so long-term pinning is
> > > typically not a concern. (BUGs in the kernel might still lead to
> > > long-term pinning of such pages if everything goes wrong.)
> > >
> > > Pages pinned without FOLL_LONGTERM remain in the CMA and may possibly
> > > be the source or destination of a pending DMA transfer.
> > >
> > > Although there is no clear specification how long a page may be pinned
> > > without FOLL_LONGTERM, pinning without the flag shows an intent of the
> > > caller to only use the memory for short-lived DMA transfers, not a transfer
> > > initiated by a device asynchronously at a random time in the future.
> > >
> > > Add a delay of CMA_DMA_TIMEOUT_SEC seconds before starting the kdump
> > > kernel, giving such short-lived DMA transfers time to finish before
> > > the CMA memory is re-used by the kdump kernel.
> > >
> > > Set CMA_DMA_TIMEOUT_SEC to 10 seconds - chosen arbitrarily as both
> > > a huge margin for a DMA transfer, yet not increasing the kdump time
> > > too significantly.
> >
> > Oh. 10s sounds a lot. How long does this process typically take?
> >
> > It's sad to add a 10s delay for something which some systems will never
> > do. I wonder if there's some simple hack we can add. Like having a
> > global flag which gets set the first time someone pins a CMA page

I have the same worry as Andrew. When a system has run off the rails, we
don't slam on the brakes right away, but wait 10 seconds before doing so.
Luckily, we have warned people about the risk.

>
> We would likely have to do that for any GUP on such a page (FOLL_GET |
> FOLL_PIN), both from gup-fast and gup-slow.

Such GUP-pinned pages could exist, but not always, right? This feature
is opt-in for users, so could they also decide on or tune the waiting
time?

This is my personal opinion: I will not suggest that people use it in
RHEL, while other people should feel free to try it, as the risk has
been pointed out.

>
> Should work, but IMHO can be optimized later, on top of this series.
>
> --
> Cheers,
>
> David / dhildenb
>