[PATCH v2 0/5] kdump: crashkernel reservation from CMA
Posted by Jiri Bohac 9 months, 4 weeks ago
Hi,

this series implements a way to reserve additional crash kernel
memory using CMA.

Link to the v1 discussion:
https://lore.kernel.org/lkml/ZWD_fAPqEWkFlEkM@dwarf.suse.cz/
See below for the changes since v1 and how concerns from the 
discussion have been addressed.

Currently, none of the memory reserved for the crash kernel is
usable by the 1st (production) kernel. It is also unmapped so
that it can't be corrupted by the fault that will eventually
trigger the crash. This makes sense for the memory actually used
by the kexec-loaded crash kernel image and initrd and the data
prepared during the load (vmcoreinfo, ...). However, the reserved
space needs to be much larger than that to provide enough
run-time memory for the crash kernel and the kdump userspace.
Estimating the amount of memory to reserve is difficult: being
too careful makes kdump likely to end in OOM; being too generous
takes even more memory from the production system. Also, the
reservation only allows a single contiguous block (or two, with
the "low" suffix). I've seen systems where this fails because the
physical memory is fragmented.

By reserving additional crashkernel memory from CMA, the main
crashkernel reservation can be just large enough to fit the 
kernel and initrd image, minimizing the memory taken away from
the production system. Most of the run-time memory for the crash
kernel will be memory previously available to userspace in the
production system. As this memory is no longer wasted, the
reservation can be done with a generous margin, making kdump more
reliable. Kernel memory that we need to preserve for dumping is 
never allocated from CMA. User data is typically not dumped by 
makedumpfile. When dumping of user data is intended, this new CMA
reservation cannot be used.
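
(For illustration -- this is standard makedumpfile behavior, not
something added by this series: bit 3 of the dump level excludes
user pages, so e.g.
	makedumpfile -l -d 31 /proc/vmcore /var/crash/vmcore
writes no user data into the dump.)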

There are five patches in this series:

The first adds a new ",cma" suffix to the recently introduced
generic crashkernel parsing code. parse_crashkernel() takes one
more argument to store the CMA reservation size.
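
For illustration, an arch call site might then look roughly like
this (sketch only -- the position of the new cma_size argument is
illustrative; the remaining arguments are today's generic
helper's):

	unsigned long long crash_size, crash_base, low_size, cma_size;
	bool high = false;

	/* fills cma_size from the ",cma" part of crashkernel= */
	if (parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
			      &crash_size, &crash_base,
			      &low_size, &cma_size, &high))
		return;		/* no crashkernel= parameter */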

The second patch implements reserve_crashkernel_cma(), which
performs the reservation. If the requested size is not available
in a single range, multiple smaller ranges will be reserved.
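
The splitting could work roughly as sketched below. Only
cma_declare_contiguous() and the cma_get_base()/cma_get_size()
accessors are the real CMA API; the crashk_cma_* bookkeeping and
the two limit constants are hypothetical names for illustration:

	static struct range crashk_cma_ranges[CRASHK_CMA_RANGES_MAX];
	static int crashk_cma_cnt;

	static void __init reserve_crashkernel_cma(unsigned long long cma_size)
	{
		unsigned long long remaining = cma_size;
		unsigned long long request = cma_size;

		while (remaining && crashk_cma_cnt < CRASHK_CMA_RANGES_MAX) {
			struct cma *res;

			if (request > remaining)
				request = remaining;

			/* try to carve out one CMA area of the requested size */
			if (cma_declare_contiguous(0, request, 0, 0, 0, false,
						   "crashkernel", &res)) {
				/* no contiguous block this large; halve and retry */
				request /= 2;
				if (request < CRASHK_CMA_MIN_RANGE)
					break;
				continue;
			}
			crashk_cma_ranges[crashk_cma_cnt].start = cma_get_base(res);
			crashk_cma_ranges[crashk_cma_cnt].end = cma_get_base(res) +
							cma_get_size(res) - 1;
			crashk_cma_cnt++;
			remaining -= request;
		}

		if (remaining)
			pr_warn("crashkernel: only %llu of %llu bytes reserved from CMA\n",
				cma_size - remaining, cma_size);
	}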

The third patch updates Documentation/, explicitly mentioning the
potential DMA corruption of the CMA-reserved memory.

The fourth patch adds a short delay before booting the kdump
kernel, allowing pending DMA transfers to finish.

The fifth patch enables the functionality for x86 as a proof of
concept. There are just three things every arch needs to do:
- call reserve_crashkernel_cma()
- include the CMA-reserved ranges in the physical memory map
- exclude the CMA-reserved ranges from the memory available
  through /proc/vmcore by excluding them from the vmcoreinfo
  PT_LOAD ranges (see the sketch after this list).
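
For the third step, the exclusion could look roughly like the
sketch below. crash_exclude_mem_range() is the existing helper
used when building the PT_LOAD list; crashk_cma_ranges and
crashk_cma_cnt are the hypothetical bookkeeping names from the
sketch above, not taken from the actual patches:

	static int crash_exclude_cma(struct crash_mem *cmem)
	{
		int i, ret;

		for (i = 0; i < crashk_cma_cnt; i++) {
			/* drop this CMA range from the PT_LOAD list */
			ret = crash_exclude_mem_range(cmem,
						      crashk_cma_ranges[i].start,
						      crashk_cma_ranges[i].end);
			if (ret)
				return ret;
		}
		return 0;
	}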

Adding other architectures is easy and I can do that as soon as
this series is merged.

With this series applied, specifying
	crashkernel=100M crashkernel=1G,cma
on the command line will make a standard crashkernel reservation
of 100M, where kexec will load the kernel and initrd.

An additional 1G will be reserved from CMA, still usable by the
production system. The crash kernel will have 1.1G memory
available. The 100M can be reliably predicted based on the size
of the kernel and initrd.

The new cma suffix is completely optional. When no
crashkernel=size,cma is specified, everything works as before.

---
Changes since v1:

The key concern raised in the v1 discussion was that pages in the
CMA region may be pinned and used for a DMA transfer, potentially
corrupting the new kernel's memory. When the cma suffix is used, kdump
may be less reliable and the corruption hard to debug.

This v2 series addresses this concern in two ways:

1) Clearly stating the potential problem in the updated
Documentation and setting the expectation (patch 3/5)

Documentation now explicitly states that:
- the risk of kdump failure is increased
- the CMA reservation is intended for users who cannot or don't
  want to sacrifice enough memory for a standard crashkernel reservation
  and who prefer less reliable kdump to no kdump at all

This is consistent with the documentation of the
crash_kexec_post_notifiers option, which can also increase the
risk of kdump failure, yet may be the only way to use kdump on
some systems. And just like the crash_kexec_post_notifiers
option, the cma crashkernel suffix is completely optional:
the series has zero effect when the suffix is not used.

2) Giving DMA time to finish before booting the kdump kernel
   (patch 4/5)

Pages can be pinned for long-term use with the FOLL_LONGTERM
flag, in which case they are first migrated outside the CMA
region. Pinning without this flag signals that the caller intends
to use the pages only for short-lived DMA transfers.
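
To illustrate the distinction (example calls only, not code from
this series; addr is some user-space address, pages a result
array):

	/* long-lived pin (e.g. RDMA): FOLL_LONGTERM makes GUP migrate
	 * the pages out of the CMA region before pinning them */
	ret = pin_user_pages_fast(addr, 1, FOLL_WRITE | FOLL_LONGTERM, pages);

	/* short-lived pin (e.g. O_DIRECT): the pages may still sit in
	 * the CMA region while the I/O is in flight */
	ret = pin_user_pages_fast(addr, 1, FOLL_WRITE, pages);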

Delay the boot of the kdump kernel when the CMA reservation is
used, giving potential pending DMA transfers time to finish.
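
A sketch of what this can look like (illustrative only; the
timeout constant and crashk_cma_cnt reuse the hypothetical names
from the sketches above). The crash path runs with interrupts
disabled, hence mdelay() rather than msleep():

	#define CRASH_CMA_DMA_TIMEOUT_MS	1000	/* 1s in this v2 */

	static void crash_cma_wait_for_dma(void)
	{
		/* no ,cma reservation used: boot the kdump kernel at once */
		if (!crashk_cma_cnt)
			return;

		mdelay(CRASH_CMA_DMA_TIMEOUT_MS);
	}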

Other minor changes since v1:
- updated for 6.14-rc2
- moved #ifdefs and #defines to header files
- added __always_unused in parse_crashkernel() to silence a false
  unused variable warning
 

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia
Re: [PATCH v2 0/5] kdump: crashkernel reservation from CMA
Posted by Baoquan He 9 months, 2 weeks ago
On 02/20/25 at 05:48pm, Jiri Bohac wrote:
......snip... 
> ---
> Changes since v1:
> 
> The key concern raised in the v1 discussion was that pages in the
> CMA region may be pinned and used for a DMA transfer, potentially
> corrupting the new kernel's memory. When the cma suffix is used, kdump
> may be less reliable and the corruption hard to debug
> 
> This v2 series addresses this concern in two ways:
> 
> 1) Clearly stating the potential problem in the updated
> Documentation and setting the expectation (patch 3/5)
> 
> Documentation now explicitly states that:
> - the risk of kdump failure is increased
> - the CMA reservation is intended for users who can not or don't
>   want to sacrifice enough memory for a standard crashkernel reservation
>   and who prefer less reliable kdump to no kdump at all
> 
> This is consistent with the documentation of the
> crash_kexec_post_notifiers option, which can also increase the
> risk of kdump failure, yet may be the only way to use kdump on
> some systems. And just like the crash_kexec_post_notifiers
> option, the cma crashkernel suffix is completely optional:
> the series has zero effect when the suffix is not used.

Thanks for the effort to investigate and add a clear note about
the potential risk in the documentation. Except for the 1-second
wait for short-term pinned DMA pages, the whole series looks good
to me. I hope other people can also comment to evaluate the risk
of the wait; I will wait another week before adding my personal
ACK.

Thanks
Baoquan

> 
> 2) Giving DMA time to finish before booting the kdump kernel
>    (patch 4/5)
> 
> Pages can be pinned for long term use using the FOLL_LONGTERM
> flag. Then they are migrated outside the CMA region. Pinning
> without this flag shows that the intent of their user is to only
> use them for short-lived DMA transfers. 
> 
> Delay the boot of the kdump kernel when the CMA reservation is
> used, giving potential pending DMA transfers time to finish.
> 
> Other minor changes since v1:
> - updated for 6.14-rc2
> - moved #ifdefs and #defines to header files
> - added __always_unused in parse_crashkernel() to silence a false
>   unused variable warning
>  
> 
> -- 
> Jiri Bohac <jbohac@suse.cz>
> SUSE Labs, Prague, Czechia
> 
>
Re: [PATCH v2 0/5] kdump: crashkernel reservation from CMA
Posted by David Hildenbrand 9 months, 2 weeks ago
On 20.02.25 17:48, Jiri Bohac wrote:
> Hi,
> 
> this series implements a way to reserve additional crash kernel
> memory using CMA.
> 
> Link to the v1 discussion:
> https://lore.kernel.org/lkml/ZWD_fAPqEWkFlEkM@dwarf.suse.cz/
> See below for the changes since v1 and how concerns from the
> discussion have been addressed.
> 
> Currently, all the memory for the crash kernel is not usable by
> the 1st (production) kernel. It is also unmapped so that it can't
> be corrupted by the fault that will eventually trigger the crash.
> This makes sense for the memory actually used by the kexec-loaded
> crash kernel image and initrd and the data prepared during the
> load (vmcoreinfo, ...). However, the reserved space needs to be
> much larger than that to provide enough run-time memory for the
> crash kernel and the kdump userspace. Estimating the amount of
> memory to reserve is difficult. Being too careful makes kdump
> likely to end in OOM, being too generous takes even more memory
> from the production system. Also, the reservation only allows
> reserving a single contiguous block (or two with the "low"
> suffix). I've seen systems where this fails because the physical
> memory is fragmented.
> 
> By reserving additional crashkernel memory from CMA, the main
> crashkernel reservation can be just large enough to fit the
> kernel and initrd image, minimizing the memory taken away from
> the production system. Most of the run-time memory for the crash
> kernel will be memory previously available to userspace in the
> production system. As this memory is no longer wasted, the
> reservation can be done with a generous margin, making kdump more
> reliable. Kernel memory that we need to preserve for dumping is
> never allocated from CMA. User data is typically not dumped by
> makedumpfile. When dumping of user data is intended this new CMA
> reservation cannot be used.


Hi,

I'll note that your comment about "user space" is currently the case, 
but will likely not hold in the long run. The assumption you are making 
is that only user-space memory will be allocated from MIGRATE_CMA, which 
is not necessarily the case. Any movable allocation will end up in there.

Besides LRU folios (user space memory and the pagecache), we already 
support migration of some kernel allocations using the non-lru migration 
framework. Such allocations (which use __GFP_MOVABLE, see 
__SetPageMovable()) currently only include
* memory balloon: pages we never want to dump either way
* zsmalloc (->zpool): only used by zswap (-> compressed LRU pages)
* z3fold (->zpool): only used by zswap (-> compressed LRU pages)

Just imagine if we support migration of other kernel allocations, such 
as user page tables. The dump would be missing important information.

Once that happens, it will become a lot harder to judge whether CMA can 
be used or not. At least, the kernel could bail out/warn for these 
kernel configs.
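
(One possible shape for such a bail-out, purely illustrative --
the config symbol below is made up:)

	/* refuse the ,cma reservation when the kernel config could
	 * place dump-relevant kernel data on movable (CMA-eligible)
	 * memory */
	if (IS_ENABLED(CONFIG_MOVABLE_KERNEL_DATA_EXAMPLE)) {
		pr_warn("crashkernel: ,cma reservation not supported with this config\n");
		return;
	}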

> 
> There are five patches in this series:
> 
> The first adds a new ",cma" suffix to the recently introduced generic
> crashkernel parsing code. parse_crashkernel() takes one more
> argument to store the cma reservation size.
> 
> The second patch implements reserve_crashkernel_cma() which
> performs the reservation. If the requested size is not available
> in a single range, multiple smaller ranges will be reserved.
> 
> The third patch updates Documentation/, explicitly mentioning the
> potential DMA corruption of the CMA-reserved memory.
> 
> The fourth patch adds a short delay before booting the kdump
> kernel, allowing pending DMA transfers to finish.


What does "short" mean? At least in theory, long-term pinning is 
forbidden for MIGRATE_CMA, so we should not have such pages mapped into 
an iommu where DMA can happily keep going on for quite a while.

But that assumes that our old kernel is not buggy, and doesn't end up 
mapping these pages into an IOMMU where DMA will just continue. I recall 
that DRM might currently be a problem, described here [1].

If kdump starts not working as expected in case our old kernel is buggy, 
doesn't that partially destroy the purpose of kdump (-> debug bugs in 
the old kernel)?


[1] https://lore.kernel.org/all/Z6MV_Y9WRdlBYeRs@phenom.ffwll.local/T/#u

-- 
Cheers,

David / dhildenb
Re: [PATCH v2 0/5] kdump: crashkernel reservation from CMA
Posted by Jiri Bohac 9 months, 1 week ago
On Mon, Mar 03, 2025 at 09:25:30AM +0100, David Hildenbrand wrote:
> On 20.02.25 17:48, Jiri Bohac wrote:
> > 
> > By reserving additional crashkernel memory from CMA, the main
> > crashkernel reservation can be just large enough to fit the
> > kernel and initrd image, minimizing the memory taken away from
> > the production system. Most of the run-time memory for the crash
> > kernel will be memory previously available to userspace in the
> > production system. As this memory is no longer wasted, the
> > reservation can be done with a generous margin, making kdump more
> > reliable. Kernel memory that we need to preserve for dumping is
> > never allocated from CMA. User data is typically not dumped by
> > makedumpfile. When dumping of user data is intended this new CMA
> > reservation cannot be used.
> 
> I'll note that your comment about "user space" is currently the case, but
> will likely not hold in the long run. The assumption you are making is that
> only user-space memory will be allocated from MIGRATE_CMA, which is not
> necessarily the case. Any movable allocation will end up in there.
> 
> Besides LRU folios (user space memory and the pagecache), we already support
> migration of some kernel allocations using the non-lru migration framework.
> Such allocations (which use __GFP_MOVABLE, see __SetPageMovable()) currently
> only include
> * memory balloon: pages we never want to dump either way
> * zsmalloc (->zpool): only used by zswap (-> compressed LRU pages)
> * z3fold (->zpool): only used by zswap (-> compressed LRU pages)
> 
> Just imagine if we support migration of other kernel allocations, such as
> user page tables. The dump would be missing important information.
> 
> Once that happens, it will become a lot harder to judge whether CMA can be
> used or not. At least, the kernel could bail out/warn for these kernel
> configs.

Thanks for pointing this out. I still don't see this as a
roadblock for my primary use case of the CMA reservation:
getting at least some (less reliable and potentially
less useful) kdump where the user is not prepared to sacrifice
the memory needed for the standard reservation and where the only
other option is no kdump at all.

A lot can still be analyzed with a vmcore that is missing those
__GFP_MOVABLE pages, even if/when some user page tables are
missing.

I'll send a v3 with the documentation part updated to better
describe this.

> > The fourth patch adds a short delay before booting the kdump
> > kernel, allowing pending DMA transfers to finish.
> 
> 
> What does "short" mean? At least in theory, long-term pinning is forbidden
> for MIGRATE_CMA, so we should not have such pages mapped into an iommu where
> DMA can happily keep going on for quite a while.
 
See patch 4/5 in the series:
I propose 1 second, which is a negligible time from the kdump POV
but which I assume should be plenty for non-long-term pins in
MIGRATE_CMA.

> But that assumes that our old kernel is not buggy, and doesn't end up
> mapping these pages into an IOMMU where DMA will just continue. I recall
> that DRM might currently be a problem, described here [1].
>
> If kdump starts not working as expected in case our old kernel is buggy,
> doesn't that partially destroy the purpose of kdump (-> debug bugs in the
> old kernel)?

Again, this is meant as a kind of "lightweight best-effort
kdump". If there is a bug that causes the crash _and_ a bug in a
driver that hogs MIGRATE_CMA and maps it into an IOMMU, then this
lightweight kdump may break. Then it's time to sacrifice more
memory and use a normal crashkernel reservation.

It's not like any bug in the old kernel will break it. It's a
very specific kind of bug that can potentially break it.

I see this whole thing as particularly useful for VMs. Unlike big
physical machines, where taking away a couple hundred MBs of
memory for kdump does not really hurt, a VM can ideally be given just
enough memory for its particular task. This can often be less
than 1 GB. Proper kdump reservation needs a couple hundred MBs,
so a very large proportion of the VM memory. In the case of a
virtualization host running hundreds or thousands of such VMs, this
means a huge waste of memory. And VMs often don't use too many
drivers for real hardware, decreasing the risk of hitting a buggy
driver like this.

Thanks,

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia
Re: [PATCH v2 0/5] kdump: crashkernel reservation from CMA
Posted by Donald Dutile 9 months, 2 weeks ago

On 3/3/25 3:25 AM, David Hildenbrand wrote:
> On 20.02.25 17:48, Jiri Bohac wrote:
>> Hi,
>>
>> this series implements a way to reserve additional crash kernel
>> memory using CMA.
>>
>> Link to the v1 discussion:
>> https://lore.kernel.org/lkml/ZWD_fAPqEWkFlEkM@dwarf.suse.cz/
>> See below for the changes since v1 and how concerns from the
>> discussion have been addressed.
>>
>> Currently, all the memory for the crash kernel is not usable by
>> the 1st (production) kernel. It is also unmapped so that it can't
>> be corrupted by the fault that will eventually trigger the crash.
>> This makes sense for the memory actually used by the kexec-loaded
>> crash kernel image and initrd and the data prepared during the
>> load (vmcoreinfo, ...). However, the reserved space needs to be
>> much larger than that to provide enough run-time memory for the
>> crash kernel and the kdump userspace. Estimating the amount of
>> memory to reserve is difficult. Being too careful makes kdump
>> likely to end in OOM, being too generous takes even more memory
>> from the production system. Also, the reservation only allows
>> reserving a single contiguous block (or two with the "low"
>> suffix). I've seen systems where this fails because the physical
>> memory is fragmented.
>>
>> By reserving additional crashkernel memory from CMA, the main
>> crashkernel reservation can be just large enough to fit the
>> kernel and initrd image, minimizing the memory taken away from
>> the production system. Most of the run-time memory for the crash
>> kernel will be memory previously available to userspace in the
>> production system. As this memory is no longer wasted, the
>> reservation can be done with a generous margin, making kdump more
>> reliable. Kernel memory that we need to preserve for dumping is
>> never allocated from CMA. User data is typically not dumped by
>> makedumpfile. When dumping of user data is intended this new CMA
>> reservation cannot be used.
> 
> 
> Hi,
> 
> I'll note that your comment about "user space" is currently the case, but will likely not hold in the long run. The assumption you are making is that only user-space memory will be allocated from MIGRATE_CMA, which is not necessarily the case. Any movable allocation will end up in there.
> 
> Besides LRU folios (user space memory and the pagecache), we already support migration of some kernel allocations using the non-lru migration framework. Such allocations (which use __GFP_MOVABLE, see __SetPageMovable()) currently only include
> * memory balloon: pages we never want to dump either way
> * zsmalloc (->zpool): only used by zswap (-> compressed LRU pages)
> * z3fold (->zpool): only used by zswap (-> compressed LRU pages)
> 
> Just imagine if we support migration of other kernel allocations, such as user page tables. The dump would be missing important information.
> 
IOMMUFD is a near-term candidate for user page tables, with multi-stage IOMMU support going through upstream review at the moment.
Just saying that David's case will be the norm in high-end VMs with performance-enhanced, guest-driven IOMMU support (for GPUs).

> Once that happens, it will become a lot harder to judge whether CMA can be used or not. At least, the kernel could bail out/warn for these kernel configs.
> 
I don't think the aforementioned work currently focuses on using CMA, but given CMA's performance benefits, it won't take long for that to be the next perf improvement step taken.

>>
>> There are five patches in this series:
>>
>> The first adds a new ",cma" suffix to the recently introduced generic
>> crashkernel parsing code. parse_crashkernel() takes one more
>> argument to store the cma reservation size.
>>
>> The second patch implements reserve_crashkernel_cma() which
>> performs the reservation. If the requested size is not available
>> in a single range, multiple smaller ranges will be reserved.
>>
>> The third patch updates Documentation/, explicitly mentioning the
>> potential DMA corruption of the CMA-reserved memory.
>>
>> The fourth patch adds a short delay before booting the kdump
>> kernel, allowing pending DMA transfers to finish.
> 
> 
> What does "short" mean? At least in theory, long-term pinning is forbidden for MIGRATE_CMA, so we should not have such pages mapped into an iommu where DMA can happily keep going on for quite a while.
> 
Hmmm, in the case I mentioned above, should there be a kexec hook in multi-stage IOMMU support for the hypervisor/VMM to invalidate/shut off stage-2 mappings asap (a multi-microsecond process) so
that DMA from VMs is cut short? Is that already done today (due to 'simple', single-stage device assignment in a VM)?

> But that assumes that our old kernel is not buggy, and doesn't end up mapping these pages into an IOMMU where DMA will just continue. I recall that DRM might currently be a problem, described here [1].
> 
> If kdump starts not working as expected in case our old kernel is buggy, doesn't that partially destroy the purpose of kdump (-> debug bugs in the old kernel)?
> 
> 
> [1] https://lore.kernel.org/all/Z6MV_Y9WRdlBYeRs@phenom.ffwll.local/T/#u
>
Re: [PATCH v2 0/5] kdump: crashkernel reservation from CMA
Posted by Baoquan He 9 months, 2 weeks ago
On 03/03/25 at 09:17am, Donald Dutile wrote:
> 
> 
> On 3/3/25 3:25 AM, David Hildenbrand wrote:
> > On 20.02.25 17:48, Jiri Bohac wrote:
> > > Hi,
> > > 
> > > this series implements a way to reserve additional crash kernel
> > > memory using CMA.
> > > 
> > > Link to the v1 discussion:
> > > https://lore.kernel.org/lkml/ZWD_fAPqEWkFlEkM@dwarf.suse.cz/
> > > See below for the changes since v1 and how concerns from the
> > > discussion have been addressed.
> > > 
> > > Currently, all the memory for the crash kernel is not usable by
> > > the 1st (production) kernel. It is also unmapped so that it can't
> > > be corrupted by the fault that will eventually trigger the crash.
> > > This makes sense for the memory actually used by the kexec-loaded
> > > crash kernel image and initrd and the data prepared during the
> > > load (vmcoreinfo, ...). However, the reserved space needs to be
> > > much larger than that to provide enough run-time memory for the
> > > crash kernel and the kdump userspace. Estimating the amount of
> > > memory to reserve is difficult. Being too careful makes kdump
> > > likely to end in OOM, being too generous takes even more memory
> > > from the production system. Also, the reservation only allows
> > > reserving a single contiguous block (or two with the "low"
> > > suffix). I've seen systems where this fails because the physical
> > > memory is fragmented.
> > > 
> > > By reserving additional crashkernel memory from CMA, the main
> > > crashkernel reservation can be just large enough to fit the
> > > kernel and initrd image, minimizing the memory taken away from
> > > the production system. Most of the run-time memory for the crash
> > > kernel will be memory previously available to userspace in the
> > > production system. As this memory is no longer wasted, the
> > > reservation can be done with a generous margin, making kdump more
> > > reliable. Kernel memory that we need to preserve for dumping is
> > > never allocated from CMA. User data is typically not dumped by
> > > makedumpfile. When dumping of user data is intended this new CMA
> > > reservation cannot be used.
> > 
> > 
> > Hi,
> > 
> > I'll note that your comment about "user space" is currently the case, but will likely not hold in the long run. The assumption you are making is that only user-space memory will be allocated from MIGRATE_CMA, which is not necessarily the case. Any movable allocation will end up in there.
> > 
> > Besides LRU folios (user space memory and the pagecache), we already support migration of some kernel allocations using the non-lru migration framework. Such allocations (which use __GFP_MOVABLE, see __SetPageMovable()) currently only include
> > * memory balloon: pages we never want to dump either way
> > * zsmalloc (->zpool): only used by zswap (-> compressed LRU pages)
> > * z3fold (->zpool): only used by zswap (-> compressed LRU pages)
> > 
> > Just imagine if we support migration of other kernel allocations, such as user page tables. The dump would be missing important information.
> > 
> IOMMUFD is a near-term candidate for user page tables with multi-stage iommu support with going through upstream review atm.
> Just saying, that David's case will be a norm in high-end VMs with performance-enhanced guest-driven iommu support (for GPUs).

Thanks to both of you, David and Don, for the valuable input. I
agree that while we may argue that not every system has a balloon
or enables swap for now, future extension of migration to other
kernel allocations could become an obstacle we can't get around.

If we know for sure this feature could become a problem, we may
need to stop it in advance.

Thoughts, Jiri?

> 
> > Once that happens, it will become a lot harder to judge whether CMA can be used or not. At least, the kernel could bail out/warn for these kernel configs.
> > 
> I don't think the aforementioned focus is to use CMA, but given its performance benefits, it won't take long to be the next perf improvement step taken.
> 
> > > 
> > > There are five patches in this series:
> > > 
> > > The first adds a new ",cma" suffix to the recently introduced generic
> > > crashkernel parsing code. parse_crashkernel() takes one more
> > > argument to store the cma reservation size.
> > > 
> > > The second patch implements reserve_crashkernel_cma() which
> > > performs the reservation. If the requested size is not available
> > > in a single range, multiple smaller ranges will be reserved.
> > > 
> > > The third patch updates Documentation/, explicitly mentioning the
> > > potential DMA corruption of the CMA-reserved memory.
> > > 
> > > The fourth patch adds a short delay before booting the kdump
> > > kernel, allowing pending DMA transfers to finish.
> > 
> > 
> > What does "short" mean? At least in theory, long-term pinning is forbidden for MIGRATE_CMA, so we should not have such pages mapped into an iommu where DMA can happily keep going on for quite a while.
> > 
> Hmmm, in the case I mentioned above, should there be a kexec hook in multi-stage IOMMU support for the hypervisor/VMM to invalidate/shut-off stage 2 mappings asap (a multi-microsecond process) so
> DMA termination from VMs is stunted ?  is that already done today (due to 'simple', single-stage, device assignment in a VM)?
> 
> > But that assumes that our old kernel is not buggy, and doesn't end up mapping these pages into an IOMMU where DMA will just continue. I recall that DRM might currently be a problem, described here [1].
> > 
> > If kdump starts not working as expected in case our old kernel is buggy, doesn't that partially destroy the purpose of kdump (-> debug bugs in the old kernel)?
> > 
> > 
> > [1] https://lore.kernel.org/all/Z6MV_Y9WRdlBYeRs@phenom.ffwll.local/T/#u
> > 
>
Re: [PATCH v2 0/5] kdump: crashkernel reservation from CMA
Posted by David Hildenbrand 6 months, 3 weeks ago
On 04.03.25 05:20, Baoquan He wrote:
> On 03/03/25 at 09:17am, Donald Dutile wrote:
>>
>>
>> On 3/3/25 3:25 AM, David Hildenbrand wrote:
>>> On 20.02.25 17:48, Jiri Bohac wrote:
>>>> Hi,
>>>>
>>>> this series implements a way to reserve additional crash kernel
>>>> memory using CMA.
>>>>
>>>> Link to the v1 discussion:
>>>> https://lore.kernel.org/lkml/ZWD_fAPqEWkFlEkM@dwarf.suse.cz/
>>>> See below for the changes since v1 and how concerns from the
>>>> discussion have been addressed.
>>>>
>>>> Currently, all the memory for the crash kernel is not usable by
>>>> the 1st (production) kernel. It is also unmapped so that it can't
>>>> be corrupted by the fault that will eventually trigger the crash.
>>>> This makes sense for the memory actually used by the kexec-loaded
>>>> crash kernel image and initrd and the data prepared during the
>>>> load (vmcoreinfo, ...). However, the reserved space needs to be
>>>> much larger than that to provide enough run-time memory for the
>>>> crash kernel and the kdump userspace. Estimating the amount of
>>>> memory to reserve is difficult. Being too careful makes kdump
>>>> likely to end in OOM, being too generous takes even more memory
>>>> from the production system. Also, the reservation only allows
>>>> reserving a single contiguous block (or two with the "low"
>>>> suffix). I've seen systems where this fails because the physical
>>>> memory is fragmented.
>>>>
>>>> By reserving additional crashkernel memory from CMA, the main
>>>> crashkernel reservation can be just large enough to fit the
>>>> kernel and initrd image, minimizing the memory taken away from
>>>> the production system. Most of the run-time memory for the crash
>>>> kernel will be memory previously available to userspace in the
>>>> production system. As this memory is no longer wasted, the
>>>> reservation can be done with a generous margin, making kdump more
>>>> reliable. Kernel memory that we need to preserve for dumping is
>>>> never allocated from CMA. User data is typically not dumped by
>>>> makedumpfile. When dumping of user data is intended this new CMA
>>>> reservation cannot be used.
>>>
>>>
>>> Hi,
>>>
>>> I'll note that your comment about "user space" is currently the case, but will likely not hold in the long run. The assumption you are making is that only user-space memory will be allocated from MIGRATE_CMA, which is not necessarily the case. Any movable allocation will end up in there.
>>>
>>> Besides LRU folios (user space memory and the pagecache), we already support migration of some kernel allocations using the non-lru migration framework. Such allocations (which use __GFP_MOVABLE, see __SetPageMovable()) currently only include
>>> * memory balloon: pages we never want to dump either way
>>> * zsmalloc (->zpool): only used by zswap (-> compressed LRU pages)
>>> * z3fold (->zpool): only used by zswap (-> compressed LRU pages)
>>>
>>> Just imagine if we support migration of other kernel allocations, such as user page tables. The dump would be missing important information.
>>>
>> IOMMUFD is a near-term candidate for user page tables with multi-stage iommu support with going through upstream review atm.
>> Just saying, that David's case will be a norm in high-end VMs with performance-enhanced guest-driven iommu support (for GPUs).
> 
> Thank both for valuable inputs, David and Don. I agree that we may argue
> not every system have ballon or enabling swap for now, while future
> extending of migration on other kernel allocation could become obstacle
> we can't detour.
> 
> If we have known for sure this feature could be a bad code, we may need
> to stop it in advance.

Sorry for the late reply.

I think we just have to be careful to document it properly -- especially 
the shortcomings and that this feature might become a problem in the 
future. Movable user-space page tables getting placed on CMA memory 
would probably not be a problem if we don't care about ... user-space 
data either way.

The whole "Direct I/O takes max 1s" part is a bit shaky. Maybe it could 
be configurable how long to wait? 10s is certainly "safer".

But maybe, in the target use case: VMs, direct I/O will not be that common.

-- 
Cheers,

David / dhildenb
Re: [PATCH v2 0/5] kdump: crashkernel reservation from CMA
Posted by Jiri Bohac 6 months, 2 weeks ago
On Wed, May 28, 2025 at 11:01:04PM +0200, David Hildenbrand wrote:
> I think we just have to be careful to document it properly -- especially the
> shortcomings and that this feature might become a problem in the future.
> Movable user-space page tables getting placed on CMA memory would probably
> not be a problem if we don't care about ... user-space data either way.

Agreed; in the v3 series [1] I amended the documentation part [2] to
explicitly mention that kernel movable allocations could be
missing from the vmcore.

The risks associated with pending DMA are also mentioned.

Is there anything you're still missing from the v3 documentation?

> The whole "Direct I/O takes max 1s" part is a bit shaky. Maybe it could be
> configurable how long to wait? 10s is certainly "safer".

I have nothing against making this configurable, or just setting
the fixed/default delay to 10s. Which would you prefer?
Would you prefer a command-line option, config option or a sysfs
file?

Thanks!

[1] https://lore.kernel.org/lkml/Z9H10pYIFLBHNKpr@dwarf.suse.cz/
[2] https://lore.kernel.org/lkml/Z9H4E82EslkGR7pV@dwarf.suse.cz/

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia
Re: [PATCH v2 0/5] kdump: crashkernel reservation from CMA
Posted by Michal Hocko 6 months, 3 weeks ago
On Wed 28-05-25 23:01:04, David Hildenbrand wrote:
[...]
> I think we just have to be careful to document it properly -- especially the
> shortcomings and that this feature might become a problem in the future.
> Movable user-space page tables getting placed on CMA memory would probably
> not be a problem if we don't care about ... user-space data either way.

I think makedumpfile could refuse to capture a dump if userspace memory
is requested to enforce this.

> The whole "Direct I/O takes max 1s" part is a bit shaky. Maybe it could be
> configurable how long to wait? 10s is certainly "safer".

Quite honestly we will never know and rather than making this
configurable I would go with reasonably large. Couple of seconds
certainly do not matter for the kdump situations but I would go as far
as minutes.

-- 
Michal Hocko
SUSE Labs
Re: [PATCH v2 0/5] kdump: crashkernel reservation from CMA
Posted by David Hildenbrand 6 months, 2 weeks ago
On 29.05.25 09:46, Michal Hocko wrote:
> On Wed 28-05-25 23:01:04, David Hildenbrand wrote:
> [...]
>> I think we just have to be careful to document it properly -- especially the
>> shortcomings and that this feature might become a problem in the future.
>> Movable user-space page tables getting placed on CMA memory would probably
>> not be a problem if we don't care about ... user-space data either way.
> 
> I think makedumpfile could refuse to capture a dump if userspace memory
> is requested to enforce this.

Yeah, it will be tricky once we support placing other memory on CMA 
regions. E.g., there was the discussion of making some slab allocations 
movable.

But probably, in such a configuration, we would later simply refuse to 
activate CMA kdump.

> 
>> The whole "Direct I/O takes max 1s" part is a bit shaky. Maybe it could be
>> configurable how long to wait? 10s is certainly "safer".
> 
> Quite honestly we will never know and rather than making this
> configurable I would go with reasonably large. Couple of seconds
> certainly do not matter for the kdump situations but I would go as far
> as minutes.

I recall that somebody raised that kdump downtime might be problematic 
(might affect service downtime?).

So I would just add a kconfig option with a default of 10s.

But even better if we can avoid the kconfig and just make it 10s for all 
setups.

I would not suggest having a different (runtime/boottime) way of 
configuring this.

-- 
Cheers,

David / dhildenb
Re: [PATCH v2 0/5] kdump: crashkernel reservation from CMA
Posted by Michal Hocko 6 months, 2 weeks ago
On Fri 30-05-25 10:06:52, David Hildenbrand wrote:
> On 29.05.25 09:46, Michal Hocko wrote:
> > On Wed 28-05-25 23:01:04, David Hildenbrand wrote:
> > [...]
> > > I think we just have to be careful to document it properly -- especially the
> > > shortcomings and that this feature might become a problem in the future.
> > > Movable user-space page tables getting placed on CMA memory would probably
> > > not be a problem if we don't care about ... user-space data either way.
> > 
> > I think makedumpfile could refuse to capture a dump if userspace memory
> > is requested to enforce this.
> 
> Yeah, it will be tricky once we support placing other memory on CMA regions.
> E.g., there was the discussion of making some slab allocations movable.
> 
> But probably, in such a configuration, we would later simply refuse to
> activate CMA kdump.

Or we can make the kdump CMA region more special and only allow
GFP_HIGHUSER_MOVABLE allocations from it. Anyway, I think we should
deal with this once we get there.
 
> > > The whole "Direct I/O takes max 1s" part is a bit shaky. Maybe it could be
> > > configurable how long to wait? 10s is certainly "safer".
> > 
> > Quite honestly we will never know and rather than making this
> > configurable I would go with reasonably large. Couple of seconds
> > certainly do not matter for the kdump situations but I would go as far
> > as minutes.
> 
> I recall that somebody raised that kdump downtime might be problematic
> (might affect service downtime?).
> 
> So I would just add a kconfig option with a default of 10s.

A kconfig option usually doesn't really work for distro kernels. I am
personally not really keen on having a tuning knob because there is a
risk of the cargo-cult tuning we have seen in other areas. That might
make it hard to remove the knob later on. Fundamentally we should have
two situations though. Either we know that the HW is sane, and then we
shouldn't really need any sleep, or the HW might misbehave, and then
we need to wait _some_ time. If our initial guess is incorrect then we
can always increase it and we would learn about that through bug
reports.

All that being said, I would go with an additional parameter to the
kdump cma setup - e.g. cma_sane_dma - that would skip waiting, and use
10s otherwise. That would make the optimized behavior opt-in; we do
not need to support all sorts of timeouts, and we would also learn if
this is not sufficient.

Makes sense?
-- 
Michal Hocko
SUSE Labs
Re: [PATCH v2 0/5] kdump: crashkernel reservation from CMA
Posted by David Hildenbrand 6 months, 2 weeks ago
On 30.05.25 10:28, Michal Hocko wrote:
> On Fri 30-05-25 10:06:52, David Hildenbrand wrote:
>> On 29.05.25 09:46, Michal Hocko wrote:
>>> On Wed 28-05-25 23:01:04, David Hildenbrand wrote:
>>> [...]
>>>> I think we just have to be careful to document it properly -- especially the
>>>> shortcomings and that this feature might become a problem in the future.
>>>> Movable user-space page tables getting placed on CMA memory would probably
>>>> not be a problem if we don't care about ... user-space data either way.
>>>
>>> I think makedumpfile could refuse to capture a dump if userspace memory
>>> is requested to enforce this.
>>
>> Yeah, it will be tricky once we support placing other memory on CMA regions.
>> E.g., there was the discussion of making some slab allocations movable.
>>
>> But probably, in such a configuration, we would later simply refuse to
>> activate CMA kdump.
> 
> Or we can make the kdump CMA region more special and only allow
> GFP_HIGHUSER_MOVABLE allocations from it. Anyway, I think we should
> deal with this once we get there.

Might be doable. When migrating (e.g., compacting) pages we'd have to 
make sure to also not migrate these pages into the CMA regions. Might be 
a bit more tricky, but likely solvable.

>   
>>>> The whole "Direct I/O takes max 1s" part is a bit shaky. Maybe it could be
>>>> configurable how long to wait? 10s is certainly "safer".
>>>
>>> Quite honestly we will never know and rather than making this
>>> configurable I would go with reasonably large. Couple of seconds
>>> certainly do not matter for the kdump situations but I would go as far
>>> as minutes.
>>
>> I recall that somebody raised that kdump downtime might be problematic
>> (might affect service downtime?).
>>
>> So I would just add a kconfig option with a default of 10s.
> 
> kconfig option usually doesn't really work for distro kernels. I am
> personally not really keen on having a tuning knob because there is a
> risk of cargo cult based tuning we have seen in other areas. That might
> make it hard to remove the knob later on. Fundamentally we should have 2
> situations though. Either we know that the HW is sane and then we
> shouldn't really need any sleep or the HW might misbehave and then we
> need to wait _some_ time. If our initial guess is incorrect then we can
> always increase it and we would learn about that through bug reports.

kconfigs are usually much easier to alter/remove than other tunables in 
my experience.

But yeah, it would have to go for the setting that works for all 
supported hw (iow, conservative timeout).

> 
> All that being said I would go with an additional parameter to the
> kdump cma setup - e.g. cma_sane_dma that would skip waiting and use 10s
> otherwise. That would make the optimized behavior opt in, we do not need
> to support all sorts of timeouts and also learn if this is not
> sufficient.
> 
> Makes sense?

Just so I understand correctly, you mean extending the "crashkernel=" 
option with a boolean parameter? If set, e.g., wait 1s, otherwise magic 
number 10?

-- 
Cheers,

David / dhildenb
Re: [PATCH v2 0/5] kdump: crashkernel reservation from CMA
Posted by Michal Hocko 6 months, 2 weeks ago
On Fri 30-05-25 10:39:39, David Hildenbrand wrote:
> On 30.05.25 10:28, Michal Hocko wrote:
[...]
> > All that being said I would go with an additional parameter to the
> > kdump cma setup - e.g. cma_sane_dma that would skip waiting and use 10s
> > otherwise. That would make the optimized behavior opt in, we do not need
> > to support all sorts of timeouts and also learn if this is not
> > sufficient.
> > 
> > Makes sense?
> 
> Just so I understand correctly, you mean extending the "crashkernel=" option
> with a boolean parameter? If set, e.g., wait 1s, otherwise magic number 10?

crashkernel=1G,cma,cma_sane_dma # no wait on transition
crashkernel=1G,cma # wait on transition with e.g. 10s timeout
-- 
Michal Hocko
SUSE Labs
Re: [PATCH v2 0/5] kdump: crashkernel reservation from CMA
Posted by David Hildenbrand 6 months, 2 weeks ago
On 30.05.25 11:07, Michal Hocko wrote:
> On Fri 30-05-25 10:39:39, David Hildenbrand wrote:
>> On 30.05.25 10:28, Michal Hocko wrote:
> [...]
>>> All that being said I would go with an additional parameter to the
>>> kdump cma setup - e.g. cma_sane_dma that would skip waiting and use 10s
>>> otherwise. That would make the optimized behavior opt in, we do not need
>>> to support all sorts of timeouts and also learn if this is not
>>> sufficient.
>>>
>>> Makes sense?
>>
>> Just so I understand correctly, you mean extending the "crashkernel=" option
>> with a boolean parameter? If set, e.g., wait 1s, otherwise magic number 10?
> 
> crashkernel=1G,cma,cma_sane_dma # no wait on transition

But is no wait ok? I mean, any O_DIRECT with any device would at least 
take a bit, no?

Of course, there is a short time between the crash and actually 
triggering kdump.

> crashkernel=1G,cma # wait on transition with e.g. 10s timeout

In general, would work for me.

-- 
Cheers,

David / dhildenb
Re: [PATCH v2 0/5] kdump: crashkernel reservation from CMA
Posted by Jiri Bohac 6 months, 2 weeks ago
On Fri, May 30, 2025 at 11:11:40AM +0200, David Hildenbrand wrote:
> On 30.05.25 11:07, Michal Hocko wrote:
> > On Fri 30-05-25 10:39:39, David Hildenbrand wrote:
> > > On 30.05.25 10:28, Michal Hocko wrote:
> > [...]
> > > > All that being said I would go with an additional parameter to the
> > > > kdump cma setup - e.g. cma_sane_dma that would skip waiting and use 10s
> > > > otherwise. That would make the optimized behavior opt in, we do not need
> > > > to support all sorts of timeouts and also learn if this is not
> > > > sufficient.
> > > > 
> > > > Makes sense?
> > > 
> > > Just so I understand correctly, you mean extending the "crashkernel=" option
> > > with a boolean parameter? If set, e.g., wait 1s, otherwise magic number 10?
> > 
> > crashkernel=1G,cma,cma_sane_dma # no wait on transition
> 
> But is no wait ok? I mean, any O_DIRECT with any device would at least take
> a bit, no?
> 
> Of course, there is a short time between the crash and actually triggering
> kdump.
> 
> > crashkernel=1G,cma # wait on transition with e.g. 10s timeout
> 
> In general, would work for me.

I don't like extending the crashkernel= syntax like this.
It would make hooking into the generic parsing code in
parse_crashkernel() really ugly. The syntax is already
convoluted as is and hard enough to explain in the documentation.

Also I don't see how adding a boolean knob is better than adding
one that allows setting any arbitrary timeout. It has less
flexibility and all the drawbacks of having an extra knob.

I am inclined to just set the fixed delay to 10s for now and
add a sysfs knob later if someone asks for it.

Would that work for you?

If you don't have other objections to the v3 series,
I'll just update it for v6.15 and post again a v4
with the 10s timeout...

Thanks for your input!

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia
Re: [PATCH v2 0/5] kdump: crashkernel reservation from CMA
Posted by David Hildenbrand 6 months, 2 weeks ago
On 30.05.25 11:34, Jiri Bohac wrote:
> On Fri, May 30, 2025 at 11:11:40AM +0200, David Hildenbrand wrote:
>> On 30.05.25 11:07, Michal Hocko wrote:
>>> On Fri 30-05-25 10:39:39, David Hildenbrand wrote:
>>>> On 30.05.25 10:28, Michal Hocko wrote:
>>> [...]
>>>>> All that being said I would go with an additional parameter to the
>>>>> kdump cma setup - e.g. cma_sane_dma that would skip waiting and use 10s
>>>>> otherwise. That would make the optimized behavior opt in, we do not need
>>>>> to support all sorts of timeouts and also learn if this is not
>>>>> sufficient.
>>>>>
>>>>> Makes sense?
>>>>
>>>> Just so I understand correctly, you mean extending the "crashkernel=" option
>>>> with a boolean parameter? If set, e.g., wait 1s, otherwise magic number 10?
>>>
>>> crashkernel=1G,cma,cma_sane_dma # no wait on transition
>>
>> But is no wait ok? I mean, any O_DIRECT with any device would at least take
>> a bit, no?
>>
>> Of course, there is a short time between the crash and actually triggering
>> kdump.
>>
>>> crashkernel=1G,cma # wait on transition with e.g. 10s timeout
>>
>> In general, would work for me.
> 
> I don't like extending the crashkernel= syntax like this.
> It would make hooking into the generic parsing code in
> parse_crashkernel() really ugly. The syntax is already
> convoluted as is and hard enough to explain in the documentation.

Would another boolean flag (on top of the other one you are adding) 
really make this significantly more ugly?

> 
> Also I don't see how adding a boolean knob is better than adding
> one that allows setting any arbitrary timeout. It has less
> flexibility and all the drawbacks of having an extra knob.

I guess Michal's point is that specifying the higher-level problem and 
giving less flexibility might actually be less confusing for users.

> 
> I am inclined to just setting the fixed delay to 10s for now and
> adding a sysfs knob later if someone asks for it.
> 
> Would that work for you?

Sure. We could always add such a flag later if it's really a problem for 
someone.

-- 
Cheers,

David / dhildenb
Re: [PATCH v2 0/5] kdump: crashkernel reservation from CMA
Posted by Jiri Bohac 6 months, 2 weeks ago
On Fri, May 30, 2025 at 11:47:46AM +0200, David Hildenbrand wrote:
> > > > crashkernel=1G,cma,cma_sane_dma # no wait on transition
> > > 
> > > But is no wait ok? I mean, any O_DIRECT with any device would at least take
> > > a bit, no?
> > > 
> > > Of course, there is a short time between the crash and actually triggering
> > > kdump.
> > > 
> > > > crashkernel=1G,cma # wait on transition with e.g. 10s timeout
> > > 
> > > In general, would work for me.
> > 
> > I don't like extending the crashkernel= syntax like this.
> > It would make hooking into the generic parsing code in
> > parse_crashkernel() really ugly. The syntax is already
> > convoluted as is and hard enough to explain in the documentation.
> 
> Would another boolean flag (on top of the other one you are adding) really
> make this significantly more ugly?

The current code does not split the parameter by commas and treat
the parts as boolean flags.

Both ",cma" and ",cma,cma_sane_dma" (and possibly
",cma_sane_dma,cma") would need to be added to suffix_tbl[]
(carefully thinking about the order, because one is a prefix of
the other); they would then be handled almost the same, except
for setting the flag.
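
For illustration, one careful ordering could look like this
(sketch only -- the two cma entries are hypothetical and not
being proposed; SUFFIX_HIGH/SUFFIX_LOW/SUFFIX_NULL are the
existing entries):

	static const char * const suffix_tbl[] = {
		[SUFFIX_HIGH]		= ",high",
		[SUFFIX_LOW]		= ",low",
		/* the longer suffix must be tried before its prefix */
		[SUFFIX_CMA_SANE_DMA]	= ",cma,cma_sane_dma",
		[SUFFIX_CMA]		= ",cma",
		[SUFFIX_NULL]		= NULL,
	};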

Also I think using the command line is way less flexible than
sysfs. E.g. the userspace tool loading the crash kernel (kdump)
may want to decide if the hardware is sane using its own
whitelist/blacklist...

> > I am inclined to just setting the fixed delay to 10s for now and
> > adding a sysfs knob later if someone asks for it.
> > 
> > Would that work for you?
> 
> Sure. We could always add such a flag later if it's really a problem for
> someone.

OK, thanks! Will post the v4 shortly.

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia
Re: [PATCH v2 0/5] kdump: crashkernel reservation from CMA
Posted by Michal Hocko 6 months, 2 weeks ago
On Fri 30-05-25 11:47:46, David Hildenbrand wrote:
> On 30.05.25 11:34, Jiri Bohac wrote:
[...]
> > I am inclined to just setting the fixed delay to 10s for now and
> > adding a sysfs knob later if someone asks for it.
> > 
> > Would that work for you?
> 
> Sure. We could always add such a flag later if it's really a problem for
> someone.

Yes, no objection with the most conservative approach first.
-- 
Michal Hocko
SUSE Labs
Re: [PATCH v2 0/5] kdump: crashkernel reservation from CMA
Posted by Michal Hocko 6 months, 2 weeks ago
On Fri 30-05-25 11:11:40, David Hildenbrand wrote:
> On 30.05.25 11:07, Michal Hocko wrote:
> > On Fri 30-05-25 10:39:39, David Hildenbrand wrote:
> > > On 30.05.25 10:28, Michal Hocko wrote:
> > [...]
> > > > All that being said I would go with an additional parameter to the
> > > > kdump cma setup - e.g. cma_sane_dma that would skip waiting and use 10s
> > > > otherwise. That would make the optimized behavior opt in, we do not need
> > > > to support all sorts of timeouts and also learn if this is not
> > > > sufficient.
> > > > 
> > > > Makes sense?
> > > 
> > > Just so I understand correctly, you mean extending the "crashkernel=" option
> > > with a boolean parameter? If set, e.g., wait 1s, otherwise magic number 10?
> > 
> > crashkernel=1G,cma,cma_sane_dma # no wait on transition
> 
> But is no wait ok? I mean, any O_DIRECT with any device would at least take
> a bit, no?
> 
> Of course, there is a short time between the crash and actually triggering
> kdump.

This is something we can test for, and if we need a short timeout
in this case as well, it is trivial to add. I am much more
concerned about those potentially unpredictable DMA transfers that
could take too long; it is impossible to test for those, and
therefore we need to overshoot.
 
> > crashkernel=1G,cma # wait on transition with e.g. 10s timeout
> 
> In general, would work for me.
> 
> -- 
> Cheers,
> 
> David / dhildenb

-- 
Michal Hocko
SUSE Labs
Re: [PATCH v2 0/5] kdump: crashkernel reservation from CMA
Posted by David Hildenbrand 6 months, 2 weeks ago
On 30.05.25 11:26, Michal Hocko wrote:
> On Fri 30-05-25 11:11:40, David Hildenbrand wrote:
>> On 30.05.25 11:07, Michal Hocko wrote:
>>> On Fri 30-05-25 10:39:39, David Hildenbrand wrote:
>>>> On 30.05.25 10:28, Michal Hocko wrote:
>>> [...]
>>>>> All that being said I would go with an additional parameter to the
>>>>> kdump cma setup - e.g. cma_sane_dma that would skip waiting and use 10s
>>>>> otherwise. That would make the optimized behavior opt in, we do not need
>>>>> to support all sorts of timeouts and also learn if this is not
>>>>> sufficient.
>>>>>
>>>>> Makes sense?
>>>>
>>>> Just so I understand correctly, you mean extending the "crashkernel=" option
>>>> with a boolean parameter? If set, e.g., wait 1s, otherwise magic number 10?
>>>
>>> crashkernel=1G,cma,cma_sane_dma # no wait on transition
>>
>> But is no wait ok? I mean, any O_DIRECT with any device would at least take
>> a bit, no?
>>
>> Of course, there is a short time between the crash and actually triggering
>> kdump.
> 
> This is something we can test for and if we need a short timeout in this
> case as well then it is just trivial to add it. I am much more
> concerned about those potentially unpredictable DMA transfers that could
> take too long and it is impossible to test for those and therefore we
> need to overshoot.

Agreed.

-- 
Cheers,

David / dhildenb
Re: [PATCH v2 0/5] kdump: crashkernel reservation from CMA
Posted by Michal Hocko 6 months, 3 weeks ago
On Thu 29-05-25 09:46:28, Michal Hocko wrote:
> On Wed 28-05-25 23:01:04, David Hildenbrand wrote:
> [...]
> > I think we just have to be careful to document it properly -- especially the
> > shortcomings and that this feature might become a problem in the future.
> > Movable user-space page tables getting placed on CMA memory would probably
> > not be a problem if we don't care about ... user-space data either way.
> 
> I think makedumpfile could refuse to capture a dump if userspace memory
> is requested to enforce this.
> 
> > The whole "Direct I/O takes max 1s" part is a bit shaky. Maybe it could be
> > configurable how long to wait? 10s is certainly "safer".
> 
> Quite honestly we will never know and rather than making this
> configurable I would go with reasonably large. Couple of seconds
> certainly do not matter for the kdump situations but I would go as far

typo
s@I would go@I would not go@

> as minutes.

-- 
Michal Hocko
SUSE Labs