It was reported that on a multi-NUMA-node ARM server with the IOMMU
disabled, dma_alloc_coherent() always returns memory from node 0, even
for devices attached to other nodes, while the same API returns
node-local DMA memory when the IOMMU is on.
The reason is that, with the IOMMU disabled, dma_alloc_coherent() takes
the direct path and calls dma_alloc_contiguous(). The system has no
explicit CMA setting (such as per-NUMA CMA), only a default 64MB CMA
reserved area on node 0, which the kernel tries to allocate from first.
Robin Murphy suggested setting up per-NUMA CMA or disabling CMA, which
did solve the issue. However, there is still a concern that customers
without much kernel knowledge could silently suffer from this, as some
architectures enable a CMA area by default in most Linux distributions
(not an issue for x86 though, which sets CONFIG_CMA_SIZE_MBYTES to 0 by
default).
One thought is to follow the current CMA reserving policy for platforms
with 'CONFIG_DMA_NUMA_CMA=y': if the NUMA CMA (either the 'numa_cma' or
the 'cma_pernuma' method) is not explicitly configured, set it up
according to the size of the default 'dma_contiguous_default_area',
while skipping the NUMA node where 'dma_contiguous_default_area' lies.
This way the default behavior of platforms with one NUMA node is kept
unchanged.
To get the node info of a CMA area, add a helper function and set up
the node ID in the cma code.
Reported-by: Changrong Chen <chenchangrong.ccr@alibaba-inc.com>
Suggested-by: Ying Huang <ying.huang@linux.alibaba.com>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
---
Changelog:
since v2:
* set up the numa cma areas following the default cma, while
skipping the node that holds the default cma (Robin Murphy)
* add cma_get_nid() helper and related code
* add reporter info
since v1:
* don't use the original way of adding alloc_pages_node()
before trying default cma node (Robin Murphy)
* setup default numa cma area if not configured (Ying Huang)
v2: https://lore.kernel.org/lkml/20260423095243.14239-1-feng.tang@linux.alibaba.com/
v1: https://lore.kernel.org/lkml/20260414090310.92055-1-feng.tang@linux.alibaba.com/
include/linux/cma.h | 1 +
kernel/dma/contiguous.c | 14 ++++++++++++--
mm/cma.c | 11 ++++++++++-
3 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/include/linux/cma.h b/include/linux/cma.h
index 8555d38a97b1..acc9ecdf28e1 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -26,6 +26,7 @@ extern unsigned long totalcma_pages;
extern phys_addr_t cma_get_base(const struct cma *cma);
extern unsigned long cma_get_size(const struct cma *cma);
extern const char *cma_get_name(const struct cma *cma);
+extern int cma_get_nid(const struct cma *cma);
extern int __init cma_declare_contiguous_nid(phys_addr_t base,
phys_addr_t size, phys_addr_t limit,
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 03f52bd17120..ae6d856c5559 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -136,6 +136,7 @@ static struct cma *dma_contiguous_numa_area[MAX_NUMNODES];
static phys_addr_t numa_cma_size[MAX_NUMNODES] __initdata;
static struct cma *dma_contiguous_pernuma_area[MAX_NUMNODES];
static phys_addr_t pernuma_size_bytes __initdata;
+static bool numa_cma_configured;
static int __init early_numa_cma(char *p)
{
@@ -164,6 +165,7 @@ static int __init early_numa_cma(char *p)
break;
}
+ numa_cma_configured = true;
return 0;
}
early_param("numa_cma", early_numa_cma);
@@ -171,6 +173,7 @@ early_param("numa_cma", early_numa_cma);
static int __init early_cma_pernuma(char *p)
{
pernuma_size_bytes = memparse(p, &p);
+ numa_cma_configured = true;
return 0;
}
early_param("cma_pernuma", early_cma_pernuma);
@@ -221,6 +224,13 @@ static void __init dma_numa_cma_reserve(void)
ret, nid);
}
+ if (!numa_cma_configured && dma_contiguous_default_area) {
+ if (nid != cma_get_nid(dma_contiguous_default_area))
+ numa_cma_size[nid] = cma_get_size(dma_contiguous_default_area);
+ else
+ dma_contiguous_numa_area[nid] = dma_contiguous_default_area;
+ }
+
if (numa_cma_size[nid]) {
cma = &dma_contiguous_numa_area[nid];
@@ -255,8 +265,6 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
phys_addr_t selected_limit = limit;
bool fixed = false;
- dma_numa_cma_reserve();
-
pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit);
if (size_cmdline != -1) {
@@ -312,6 +320,8 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
if (ret)
pr_warn("Couldn't queue default CMA region for heap creation.");
}
+
+ dma_numa_cma_reserve();
}
void __weak
diff --git a/mm/cma.c b/mm/cma.c
index c7ca567f4c5c..3bbfafeaf6c1 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -54,6 +54,11 @@ const char *cma_get_name(const struct cma *cma)
}
EXPORT_SYMBOL_GPL(cma_get_name);
+extern int cma_get_nid(const struct cma *cma)
+{
+ return cma->nid;
+}
+
static unsigned long cma_bitmap_aligned_mask(const struct cma *cma,
unsigned int align_order)
{
@@ -511,7 +516,11 @@ static int __init __cma_declare_contiguous_nid(phys_addr_t *basep,
return ret;
}
- (*res_cma)->nid = nid;
+ if (IS_ENABLED(CONFIG_NUMA) && nid == NUMA_NO_NODE)
+ (*res_cma)->nid = early_pfn_to_nid((*res_cma)->ranges[0].base_pfn);
+ else
+ (*res_cma)->nid = nid;
+
*basep = base;
return 0;
--
2.39.5 (Apple Git-154)
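For readers following the hunk in dma_numa_cma_reserve(), the default-sizing rule can be modeled in userspace roughly as below. This is only a sketch of the rule as the commit message describes it, not kernel code: struct model_default_cma, model_default_numa_sizes() and MODEL_MAX_NODES are hypothetical names, and the model reports only the default contribution, leaving out sizes that an explicit numa_cma= or cma_pernuma= setting would provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_NODES 8	/* hypothetical stand-in for MAX_NUMNODES */

/* Userspace model of the state the patch consults; not kernel code. */
struct model_default_cma {
	bool present;		/* models dma_contiguous_default_area != NULL */
	int nid;		/* models cma_get_nid() on the default area */
	unsigned long size;	/* models cma_get_size() on the default area */
};

/*
 * Fill sizes[] with the per-node size that would be reserved by default.
 * A node gets 0 when NUMA CMA was configured explicitly, when there is no
 * default area, or when the node already hosts the default area.
 */
static void model_default_numa_sizes(const struct model_default_cma *def,
				     bool numa_cma_configured,
				     int nr_nodes, unsigned long *sizes)
{
	assert(def != NULL && sizes != NULL);
	assert(nr_nodes >= 0 && nr_nodes <= MODEL_MAX_NODES);

	for (int nid = 0; nid < nr_nodes; nid++) {
		sizes[nid] = 0;
		if (numa_cma_configured || !def->present)
			continue;
		if (nid != def->nid)
			sizes[nid] = def->size;
	}
}
```

With a 64MB default area on node 0 of a two-node machine, the model reserves nothing extra on node 0 and 64MB on node 1, which matches the single-node behavior staying unchanged.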
Hi Feng,
kernel test robot noticed the following build warnings:
[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on linus/master v7.1-rc1 next-20260430]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Feng-Tang/dma-contiguous-setup-default-numa-cma-area-if-not-configured-explicitly/20260430-073422
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20260428060550.7167-1-feng.tang%40linux.alibaba.com
patch subject: [PATCH v3] dma-contiguous: setup default numa cma area if not configured explicitly
config: x86_64-randconfig-r112-20260501 (https://download.01.org/0day-ci/archive/20260501/202605011354.YrrRWXdx-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
sparse: v0.6.5-rc1
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260501/202605011354.YrrRWXdx-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202605011354.YrrRWXdx-lkp@intel.com/
sparse warnings: (new ones prefixed by >>)
>> mm/cma.c:57:12: sparse: sparse: function 'cma_get_nid' with external linkage has definition
vim +/cma_get_nid +57 mm/cma.c
56
> 57 extern int cma_get_nid(const struct cma *cma)
58 {
59 return cma->nid;
60 }
61
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
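The sparse warning above points at the 'extern' keyword on the function definition in mm/cma.c. The thread does not say how it will be addressed; a plausible fix, sketched here under that assumption, is to keep 'extern' only on the header declaration and drop it from the definition. The struct below is a minimal stand-in with only the field this helper reads, and the assert is a model-only check that the patch does not contain.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct cma; only the field used here. */
struct cma {
	int nid;
};

/* Header side (include/linux/cma.h in the patch): a declaration. */
extern int cma_get_nid(const struct cma *cma);

/* Source side (mm/cma.c in the patch): the definition, without 'extern'. */
int cma_get_nid(const struct cma *cma)
{
	assert(cma != NULL);	/* model-only check, not part of the patch */
	return cma->nid;
}
```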
On 4/28/26 08:05, Feng Tang wrote:
> There was a report on a multi-numa-nodes ARM server that when IOMMU is
> disabled, the dma_alloc_coherent() function always returns memory from
> node 0 even for devices attaching to other nodes, while they can get
> local dma memory when IOMMU is on with the same API.
>
> The reason is, when IOMMU is disabled, the dma_alloc_coherent() will
> go the direct way and call dma_alloc_contiguous(). The system doesn't
> have any explicit cma setting (like per-numa cma), and only has a
> default 64MB cma reserved area (on node 0), where kernel will try
> first to allocate memory from.
>
> Robin Murphy suggested to setup pernuma cma or disable cma, which did
> solve the issue.
That sounds like the obvious approach to me.
> While there is still concern that for customers
> which don't have much kernel knowledge, they could still suffer from
> this silently as some architectures enable cma area by default (not
> an issue for X86 though, which set CONFIG_CMA_SIZE_MBYTES to 0 by
> default) for most Linux distributions.
Okay, so on x86 it is not silent, because they don't even have a default CMA area?
>
> One thought is to follow the current cma reserving policy for platform
> with 'CONFIG_DMA_NUMA_CMA=y', that if the numa cma (either the 'numa cma'
> or 'cma pernuma' method) is not explicitly configured, set it up
> according to size of default 'dma_contiguous_default_area', while
> skipping the numa node where the 'dma_contiguous_default_area' lies
> in, this way the default behavior of platform with one NUMA node is
> kept unchanged.
So, the kernel is configured to have a certain CONFIG_CMA_SIZE_MBYTES size, but
you go ahead and multiply that by the number of nodes? Sounds wrong.
The whole proposal here looks rather hacky.
Wouldn't a default for e.g., pernuma_size_bytes make more sense, that users can
then overwrite on the cmdline?
>
> To get the node info of cma area, add some helpr funciton and setup
> in cma code.
>
> Reported-by: Changrong Chen <chenchangrong.ccr@alibaba-inc.com>
> Suggested-by: Ying Huang <ying.huang@linux.alibaba.com>
> Suggested-by: Robin Murphy <robin.murphy@arm.com>
> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
> ---
> Changelog:
>
> since v2:
> * setup the numa cma are following default cma, while
> skipping the node holds the default cma (Robin Murphy)
> * add cma_get_node() help and related code
> * add reporter info
>
> since v1:
> * don't use the original way of adding alloc_pages_node()
> before trying default cma node (Robin Murphy)
> * setup default numa cma area if not configured (Ying Huang)
>
> v2: https://lore.kernel.org/lkml/20260423095243.14239-1-feng.tang@linux.alibaba.com/
> v1: https://lore.kernel.org/lkml/20260414090310.92055-1-feng.tang@linux.alibaba.com/
>
> include/linux/cma.h | 1 +
> kernel/dma/contiguous.c | 14 ++++++++++++--
> mm/cma.c | 11 ++++++++++-
> 3 files changed, 23 insertions(+), 3 deletions(-)
[...]
> if (numa_cma_size[nid]) {
>
> cma = &dma_contiguous_numa_area[nid];
> @@ -255,8 +265,6 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
> phys_addr_t selected_limit = limit;
> bool fixed = false;
>
> - dma_numa_cma_reserve();
> -
> pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit);
>
> if (size_cmdline != -1) {
> @@ -312,6 +320,8 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
> if (ret)
> pr_warn("Couldn't queue default CMA region for heap creation.");
> }
> +
> + dma_numa_cma_reserve();
> }
>
> void __weak
> diff --git a/mm/cma.c b/mm/cma.c
> index c7ca567f4c5c..3bbfafeaf6c1 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -54,6 +54,11 @@ const char *cma_get_name(const struct cma *cma)
> }
> EXPORT_SYMBOL_GPL(cma_get_name);
>
> +extern int cma_get_nid(const struct cma *cma)
> +{
> + return cma->nid;
> +}
Why do you have to store the nid instead of just looking it up from the base_pfn
in here?
Also, what is the expectation when the ranges would span different NIDs? (is
that possible?)
--
Cheers,
David
Hi David,
Thanks for the review!
On Tue, Apr 28, 2026 at 09:52:15AM +0200, David Hildenbrand (Arm) wrote:
> On 4/28/26 08:05, Feng Tang wrote:
> > There was a report on a multi-numa-nodes ARM server that when IOMMU is
> > disabled, the dma_alloc_coherent() function always returns memory from
> > node 0 even for devices attaching to other nodes, while they can get
> > local dma memory when IOMMU is on with the same API.
> >
> > The reason is, when IOMMU is disabled, the dma_alloc_coherent() will
> > go the direct way and call dma_alloc_contiguous(). The system doesn't
> > have any explicit cma setting (like per-numa cma), and only has a
> > default 64MB cma reserved area (on node 0), where kernel will try
> > first to allocate memory from.
> >
> > Robin Murphy suggested to setup pernuma cma or disable cma, which did
> > solve the issue.
>
> That sounds like the obvious approach to me.
Indeed. We also gave some of these suggestions to the reporter once we got the report.
> > While there is still concern that for customers
> > which don't have much kernel knowledge, they could still suffer from
> > this silently as some architectures enable cma area by default (not
> > an issue for X86 though, which set CONFIG_CMA_SIZE_MBYTES to 0 by
> > default) for most Linux distributions.
>
> Okay, so on x86 it is not silent, because they don't even have a default CMA area?
Right, for default kernel configs.
In kernel/dma/Kconfig:
config CMA_SIZE_MBYTES
int "Size in Mega Bytes"
depends on !CMA_SIZE_SEL_PERCENTAGE
default 0 if X86
default 16
config CMA_SIZE_PERCENTAGE
int "Percentage of total memory"
depends on !CMA_SIZE_SEL_MBYTES
default 0 if X86
default 10
> >
> > One thought is to follow the current cma reserving policy for platform
> > with 'CONFIG_DMA_NUMA_CMA=y', that if the numa cma (either the 'numa cma'
> > or 'cma pernuma' method) is not explicitly configured, set it up
> > according to size of default 'dma_contiguous_default_area', while
> > skipping the numa node where the 'dma_contiguous_default_area' lies
> > in, this way the default behavior of platform with one NUMA node is
> > kept unchanged.
>
> So, the kernel is configured to have a certain CONFIG_CMA_SIZE_MBYTES size, but
> you go ahead and multiply that by the number of nodes? Sounds wrong.
Yes. I thought about that but didn't have a good solution, and went with
this given that it only applies to multi-NUMA-node machines, where the
extra memory usage may not be too bad.
Robin did raise a concern about the memory usage for embedded/small
devices in the v2 review, and v3 was changed so as not to affect them.
>
> The whole proposal here looks rather hacky.
I agree :)
> Wouldn't a default for e.g., pernuma_size_bytes make more sense, that users can
> then overwrite on the cmdline?
This sounds good to me, if there is no objection from others. Maybe a
default of 64MB or more. One good part is that all of this setup is
guarded by CONFIG_DMA_NUMA_CMA.
> >
> > To get the node info of cma area, add some helpr funciton and setup
> > in cma code.
> >
> > Reported-by: Changrong Chen <chenchangrong.ccr@alibaba-inc.com>
> > Suggested-by: Ying Huang <ying.huang@linux.alibaba.com>
> > Suggested-by: Robin Murphy <robin.murphy@arm.com>
> > Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
> > ---
> > Changelog:
> >
> > since v2:
> > * setup the numa cma are following default cma, while
> > skipping the node holds the default cma (Robin Murphy)
> > * add cma_get_node() help and related code
> > * add reporter info
> >
> > since v1:
> > * don't use the original way of adding alloc_pages_node()
> > before trying default cma node (Robin Murphy)
> > * setup default numa cma area if not configured (Ying Huang)
> >
> > v2: https://lore.kernel.org/lkml/20260423095243.14239-1-feng.tang@linux.alibaba.com/
> > v1: https://lore.kernel.org/lkml/20260414090310.92055-1-feng.tang@linux.alibaba.com/
> >
> > include/linux/cma.h | 1 +
> > kernel/dma/contiguous.c | 14 ++++++++++++--
> > mm/cma.c | 11 ++++++++++-
> > 3 files changed, 23 insertions(+), 3 deletions(-)
[...]
> > +extern int cma_get_nid(const struct cma *cma)
> > +{
> > + return cma->nid;
> > +}
>
> Why do you have to store the nid instead of just looking it up from the base_pfn
> in here?
My thought was that 'struct cma' already has a 'nid' member, and when CONFIG_NUMA=y,
it may be useful to save the 'nid' info instead of NUMA_NO_NODE for the default
cma area (a cmdline like cma=XXG@YYG could place it on a different node)
>
> Also, what is the expectation when the ranges would span different NIDs? (is
> that possible?)
Per my understanding, it won't. There is a cma_validate_zones() to prevent it
from crossing zones.
Thanks,
Feng
>
> --
> Cheers,
>
> David
On 4/28/26 10:37, Feng Tang wrote:
> Hi David,
Hi!
[...]
>>
>> Okay, so on x86 it is not silent, because they don't even have a default CMA area?
>
> Right for default kernel configs.
>
> In kernel/dma/Kconfig:
>
> config CMA_SIZE_MBYTES
> int "Size in Mega Bytes"
> depends on !CMA_SIZE_SEL_PERCENTAGE
> default 0 if X86
> default 16
>
> config CMA_SIZE_PERCENTAGE
> int "Percentage of total memory"
> depends on !CMA_SIZE_SEL_MBYTES
> default 0 if X86
> default 10
>
>>>
>>> One thought is to follow the current cma reserving policy for platform
>>> with 'CONFIG_DMA_NUMA_CMA=y', that if the numa cma (either the 'numa cma'
>>> or 'cma pernuma' method) is not explicitly configured, set it up
>>> according to size of default 'dma_contiguous_default_area', while
>>> skipping the numa node where the 'dma_contiguous_default_area' lies
>>> in, this way the default behavior of platform with one NUMA node is
>>> kept unchanged.
>>
>> So, the kernel is configured to have a certain CONFIG_CMA_SIZE_MBYTES size, but
>> you go ahead and multiply that by the number of nodes? Sounds wrong.
>
> Yes. I thought about that, and didn't have good solution, and used this
> given it's on a multi-numa-node machine, which may not be too bad
> regarding memory usage.
It sounds wrong given the existing config options.
>
> Robin did concern about the memory usage for embedded/small devices in
> v2 review, and we change to v3 to not affect them.
>
>>
>> The whole proposal here looks rather hacky.
>
> I agree :)
>
>> Wouldn't a default for e.g., pernuma_size_bytes make more sense, that users can
>> then overwrite on the cmdline?
>
> This sounds good to me, if no objection from others. Maybe default 64MB
> or more. One good part is, all these setup is under protection of
> CONFIG_DMA_NUMA_CMA.
I cannot do the heavy thinking here because -EBUSY, but maybe you want a config
option similar to CMA_SIZE_MBYTES/CMA_SIZE_PERCENTAGE that either controls that
these sizes will be split over NUMA nodes, or another one, that sets the default
for pernuma.
[...]
>>> +extern int cma_get_nid(const struct cma *cma)
>>> +{
>>> + return cma->nid;
>>> +}
>>
>> Why do you have to store the nid instead of just looking it up from the base_pfn
>> in here?
>
> My thought was 'struct cma' already have 'nid' member, and when CONFIG_NUMA=y,
> it may be useful to save the 'nid' info instead of NUMA_NO_NODE for the default
> cma area (cmdline like cma=XXG@YYG could make it on different node)
Ah, yeah. It's a bit nasty that we have to handle the default area like that.
Another sign that we probably shouldn't deal with the default area :)
>
>>
>> Also, what is the expectation when the ranges would span different NIDs? (is
>> that possible?)
>
> Per my understanding, it won't. There is a cma_validate_zones() to prevent it
> from crossing zones.
It's a bit confusing, because it ignores other nids.
--
Cheers,
David
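The two directions suggested above (split the configured size across NUMA nodes, or give each node its own default) differ mainly in how much memory ends up reserved in total. The arithmetic can be sketched as follows. The enum and function names are hypothetical, no such option exists in the kernel at this point in the thread, and alignment and minimum-size constraints that a real reservation would need are ignored.

```c
#include <assert.h>

/* Hypothetical policy names; nothing like this exists in the kernel. */
enum model_numa_policy {
	MODEL_SPLIT_TOTAL,	/* divide the configured CMA size across nodes */
	MODEL_PER_NODE,		/* reserve a separate default size on each node */
};

/* Size reserved on one node under the given policy. */
static unsigned long model_node_size(enum model_numa_policy policy,
				     unsigned long cma_size,
				     unsigned long pernuma_default,
				     unsigned int nr_nodes)
{
	assert(nr_nodes > 0);
	return policy == MODEL_SPLIT_TOTAL ? cma_size / nr_nodes
					   : pernuma_default;
}

/* Total reserved across all nodes under the given policy. */
static unsigned long model_total_reserved(enum model_numa_policy policy,
					  unsigned long cma_size,
					  unsigned long pernuma_default,
					  unsigned int nr_nodes)
{
	return model_node_size(policy, cma_size, pernuma_default, nr_nodes) *
	       nr_nodes;
}
```

On a four-node machine with a 64MB configured size, splitting keeps the total at 64MB (16MB per node), while a 64MB per-node default reserves 256MB in total, which is the multiplication questioned in the review.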
On Fri, May 01, 2026 at 08:51:39PM +0200, David Hildenbrand (Arm) wrote:
> On 4/28/26 10:37, Feng Tang wrote:
> > Hi David,
>
> Hi!
>
> [...]
>
> >>
> >> Okay, so on x86 it is not silent, because they don't even have a default CMA area?
> >
> > Right for default kernel configs.
> >
> > In kernel/dma/Kconfig:
> >
> > config CMA_SIZE_MBYTES
> > int "Size in Mega Bytes"
> > depends on !CMA_SIZE_SEL_PERCENTAGE
> > default 0 if X86
> > default 16
> >
> > config CMA_SIZE_PERCENTAGE
> > int "Percentage of total memory"
> > depends on !CMA_SIZE_SEL_MBYTES
> > default 0 if X86
> > default 10
> >
> >>>
> >>> One thought is to follow the current cma reserving policy for platform
> >>> with 'CONFIG_DMA_NUMA_CMA=y', that if the numa cma (either the 'numa cma'
> >>> or 'cma pernuma' method) is not explicitly configured, set it up
> >>> according to size of default 'dma_contiguous_default_area', while
> >>> skipping the numa node where the 'dma_contiguous_default_area' lies
> >>> in, this way the default behavior of platform with one NUMA node is
> >>> kept unchanged.
> >>
> >> So, the kernel is configured to have a certain CONFIG_CMA_SIZE_MBYTES size, but
> >> you go ahead and multiply that by the number of nodes? Sounds wrong.
> >
> > Yes. I thought about that, and didn't have good solution, and used this
> > given it's on a multi-numa-node machine, which may not be too bad
> > regarding memory usage.
>
> It sounds wrong given the existing config options.
Yes, it is confusing.
> >
> > Robin did concern about the memory usage for embedded/small devices in
> > v2 review, and we change to v3 to not affect them.
> >
> >>
> >> The whole proposal here looks rather hacky.
> >
> > I agree :)
> >
> >> Wouldn't a default for e.g., pernuma_size_bytes make more sense, that users can
> >> then overwrite on the cmdline?
> >
> > This sounds good to me, if no objection from others. Maybe default 64MB
> > or more. One good part is, all these setup is under protection of
> > CONFIG_DMA_NUMA_CMA.
>
> I cannot do the heavy thinking here because -EBUSY, but maybe you want a config
> option similar to CMA_SIZE_MBYTES/CMA_SIZE_PERCENTAGE that either controls that
> these sizes will be split over NUMA nodes, or another one, that sets the default
> for pernuma.
Maybe a CMA_NUMA_SIZE_MBYTES?
> [...]
>
> >>> +extern int cma_get_nid(const struct cma *cma)
> >>> +{
> >>> + return cma->nid;
> >>> +}
> >>
> >> Why do you have to store the nid instead of just looking it up from the base_pfn
> >> in here?
> >
> > My thought was 'struct cma' already have 'nid' member, and when CONFIG_NUMA=y,
> > it may be useful to save the 'nid' info instead of NUMA_NO_NODE for the default
> > cma area (cmdline like cma=XXG@YYG could make it on different node)
>
> Ah, yeah. It's a bit nasty that we have to handle the default area like that.
>
> Another sign that we probably shouldn't deal with the default area :)
Yep, in v2 I didn't touch the default area, but Robin had a concern
that the v2 approach would blindly add an extra per-numa cma area on
the node which already has the default area, which would hurt
small/embedded devices with a limited amount of memory. Adding
the nid check is an attempt to keep the behavior of one-node devices
unchanged.
> >
> >>
> >> Also, what is the expectation when the ranges would span different NIDs? (is
> >> that possible?)
> >
> > Per my understanding, it won't. There is a cma_validate_zones() to prevent it
> > from crossing zones.
>
> It's a bit confusing, because it ignores other nids.
I might have missed your point. Do you mean one cma area could have
multiple ranges? IIUC, the default cma area can have only one range,
which is covered by this check, while hugetlb_cma could have multiple
ranges.
Thanks,
Feng
On 5/6/26 17:46, Feng Tang wrote:
> On Fri, May 01, 2026 at 08:51:39PM +0200, David Hildenbrand (Arm) wrote:
>> On 4/28/26 10:37, Feng Tang wrote:
>>> Hi David,
>>
>> Hi!
>>
>> [...]
>>
>>>
>>> Right for default kernel configs.
>>>
>>> In kernel/dma/Kconfig:
>>>
>>> config CMA_SIZE_MBYTES
>>> int "Size in Mega Bytes"
>>> depends on !CMA_SIZE_SEL_PERCENTAGE
>>> default 0 if X86
>>> default 16
>>>
>>> config CMA_SIZE_PERCENTAGE
>>> int "Percentage of total memory"
>>> depends on !CMA_SIZE_SEL_MBYTES
>>> default 0 if X86
>>> default 10
>>>
>>>
>>> Yes. I thought about that, and didn't have good solution, and used this
>>> given it's on a multi-numa-node machine, which may not be too bad
>>> regarding memory usage.
>>
>> It sounds wrong given the existing config options.
>
> Yes, it is confusing.
>
>>>
>>> Robin did concern about the memory usage for embedded/small devices in
>>> v2 review, and we change to v3 to not affect them.
>>>
>>>
>>> I agree :)
>>>
>>>
>>> This sounds good to me, if no objection from others. Maybe default 64MB
>>> or more. One good part is, all these setup is under protection of
>>> CONFIG_DMA_NUMA_CMA.
>>
>> I cannot do the heavy thinking here because -EBUSY, but maybe you want a config
>> option similar to CMA_SIZE_MBYTES/CMA_SIZE_PERCENTAGE that either controls that
>> these sizes will be split over NUMA nodes, or another one, that sets the default
>> for pernuma.
>
> Maybe a CMA_NUMA_SIZE_MBYTES?
Maybe, I'm hoping some CMA DMA people have the capacity to provide input.
>
>> [...]
>>
>>>
>>> My thought was 'struct cma' already have 'nid' member, and when CONFIG_NUMA=y,
>>> it may be useful to save the 'nid' info instead of NUMA_NO_NODE for the default
>>> cma area (cmdline like cma=XXG@YYG could make it on different node)
>>
>> Ah, yeah. It's a bit nasty that we have to handle the default area like that.
>>
>> Another sign that we probably shouldn't deal with the default area :)
>
> Yep, in v2 I didn't touch the default area, while Robin had a concern
> that the v2 approach will bindly add an extra per-numa cma area for
> the node which already has the default area, which will hurt those
> small/embedded devices which has limited number of memory. Adding
> the nid check is trying to keep the behavior of one node device
> unchanged.
>
>>>
>>>
>>> Per my understanding, it won't. There is a cma_validate_zones() to prevent it
>>> from crossing zones.
>>
>> It's a bit confusing, because it ignores other nids.
>
> I might have missed your point. Do you mean one cma are could have
> multiple ranges?
I don't know, it's confusing :)
--
Cheers,
David
On 2026-05-08 12:46 pm, David Hildenbrand (Arm) wrote:
> On 5/6/26 17:46, Feng Tang wrote:
>> On Fri, May 01, 2026 at 08:51:39PM +0200, David Hildenbrand (Arm) wrote:
>>> On 4/28/26 10:37, Feng Tang wrote:
>>>> Hi David,
>>>
>>> Hi!
>>>
>>> [...]
>>>
>>>>
>>>> Right for default kernel configs.
>>>>
>>>> In kernel/dma/Kconfig:
>>>>
>>>> config CMA_SIZE_MBYTES
>>>> int "Size in Mega Bytes"
>>>> depends on !CMA_SIZE_SEL_PERCENTAGE
>>>> default 0 if X86
>>>> default 16
>>>>
>>>> config CMA_SIZE_PERCENTAGE
>>>> int "Percentage of total memory"
>>>> depends on !CMA_SIZE_SEL_MBYTES
>>>> default 0 if X86
>>>> default 10
>>>>
>>>>
>>>> Yes. I thought about that, and didn't have good solution, and used this
>>>> given it's on a multi-numa-node machine, which may not be too bad
>>>> regarding memory usage.
>>>
>>> It sounds wrong given the existing config options.
>>
>> Yes, it is confusing.
>>
>>>>
>>>> Robin did concern about the memory usage for embedded/small devices in
>>>> v2 review, and we change to v3 to not affect them.
>>>>
>>>>
>>>> I agree :)
>>>>
>>>>
>>>> This sounds good to me, if no objection from others. Maybe default 64MB
>>>> or more. One good part is, all these setup is under protection of
>>>> CONFIG_DMA_NUMA_CMA.
>>>
>>> I cannot do the heavy thinking here because -EBUSY, but maybe you want a config
>>> option similar to CMA_SIZE_MBYTES/CMA_SIZE_PERCENTAGE that either controls that
>>> these sizes will be split over NUMA nodes, or another one, that sets the default
>>> for pernuma.
>>
>> Maybe a CMA_NUMA_SIZE_MBYTES?
>
> Maybe, I'm hoping some CMA DMA people have the capacity to provide input.
But really that _is_ pretty much the idea here - we're effecting a
kernel-level default "pernuma" value, which just happens to also be
CMA_SIZE_*. But in the process we also need to tweak the "pernuma"
behaviour itself to work as a default, since quietly forcing the current
opt-in behaviour on single node systems could only hurt them - multiple
default CMA areas on the same node offers no performance benefit, while
reducing non-movable allocation capacity which could well be detrimental.
Indeed I am rather assuming that actual NUMA systems should have enough
memory that this isn't a big deal, but I don't believe that's
particularly unreasonable. End users should still be able to override
with "numa_cma=0:0" if they don't want it, the only potential gap is if
distros want to ship kernels with DMA_NUMA_CMA enabled for command-line
opt-in but _without_ this new default behaviour. For that we could
perhaps add something like:
config CMA_SIZE_PERNUMA
	bool "Default CMA area per NUMA node"
	depends on DMA_NUMA_CMA
	default y
	help
	  On systems with more than one NUMA node, the selected CMA area size
	  will be also allocated on each additional node, so that most devices
	  may have benefit from better DMA locality without an explicit
	  command-line opt-in.
Thanks,
Robin.
>
>>
>>> [...]
>>>
>>>>
>>>> My thought was 'struct cma' already have 'nid' member, and when CONFIG_NUMA=y,
>>>> it may be useful to save the 'nid' info instead of NUMA_NO_NODE for the default
>>>> cma area (cmdline like cma=XXG@YYG could make it on different node)
>>>
>>> Ah, yeah. It's a bit nasty that we have to handle the default area like that.
>>>
>>> Another sign that we probably shouldn't deal with the default area :)
>>
>> Yep, in v2 I didn't touch the default area, while Robin had a concern
>> that the v2 approach will bindly add an extra per-numa cma area for
>> the node which already has the default area, which will hurt those
>> small/embedded devices which has limited number of memory. Adding
>> the nid check is trying to keep the behavior of one node device
>> unchanged.
>>
>>>>
>>>>
>>>> Per my understanding, it won't. There is a cma_validate_zones() to prevent it
>>>> from crossing zones.
>>>
>>> It's a bit confusing, because it ignores other nids.
>>
>> I might have missed your point. Do you mean one cma are could have
>> multiple ranges?
>
> I don't know, it's confusing :)
>
On 5/8/26 14:58, Robin Murphy wrote:
> On 2026-05-08 12:46 pm, David Hildenbrand (Arm) wrote:
>> On 5/6/26 17:46, Feng Tang wrote:
>>>
>>> Yes, it is confusing.
>>>
>>>
>>> Maybe a CMA_NUMA_SIZE_MBYTES?
>>
>> Maybe, I'm hoping some CMA DMA people have the capacity to provide input.
>
> But really that _is_ pretty much the idea here
Cool, I guess that's the right direction then, thanks.
--
Cheers,
David
On Tue, Apr 28, 2026 at 04:37:08PM +0800, Feng Tang wrote:
> > > include/linux/cma.h | 1 +
> > > kernel/dma/contiguous.c | 14 ++++++++++++--
> > > mm/cma.c | 11 ++++++++++-
> > > 3 files changed, 23 insertions(+), 3 deletions(-)
> [...]
> > > +extern int cma_get_nid(const struct cma *cma)
> > > +{
> > > + return cma->nid;
> > > +}
> >
> > Why do you have to store the nid instead of just looking it up from the base_pfn
> > in here?
>
> My thought was 'struct cma' already have 'nid' member, and when CONFIG_NUMA=y,
> it may be useful to save the 'nid' info instead of NUMA_NO_NODE for the default
> cma area (cmdline like cma=XXG@YYG could make it on different node)
One interesting thing is about the API to get the nid. I initially worked on
a 6.19 base and used 'pfn_to_nid()', which worked well, but I got a kernel
panic when rebasing to 7.1.
It turned out that the latest kernel changed the order so that the default cma
is reserved before 'struct page' is initialized (only tested on arm64), and
pfn_to_page() will return NULL at reserving time, so I changed to
early_pfn_to_nid(). Storing the 'nid' saves the logic of choosing the
right API in cma_get_nid().
Thanks,
Feng
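The ordering point above is the argument for caching the node ID when the area is declared: the getter then never has to know which lookup is safe at the time it is called. A userspace sketch of that idea follows. Every name here (model_mem_range, model_early_pfn_to_nid(), model_cma_declare(), MODEL_NO_NODE) is a hypothetical stand-in, the range table only imitates an early memory map that is available before 'struct page' exists, and the CONFIG_NUMA check from the patch is left out.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NO_NODE (-1)	/* stand-in for NUMA_NO_NODE */

/* Hypothetical early memory map entry: [start_pfn, end_pfn) is on nid. */
struct model_mem_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
	int nid;
};

struct model_cma {
	unsigned long base_pfn;
	int nid;
};

/* Models an early lookup that needs only the memory map. */
static int model_early_pfn_to_nid(const struct model_mem_range *map,
				  size_t nr_ranges, unsigned long pfn)
{
	for (size_t i = 0; i < nr_ranges; i++)
		if (pfn >= map[i].start_pfn && pfn < map[i].end_pfn)
			return map[i].nid;
	return MODEL_NO_NODE;
}

/* Resolve the node once, at declaration time, and cache it. */
static void model_cma_declare(struct model_cma *cma, unsigned long base_pfn,
			      int requested_nid,
			      const struct model_mem_range *map,
			      size_t nr_ranges)
{
	assert(cma != NULL);
	cma->base_pfn = base_pfn;
	if (requested_nid != MODEL_NO_NODE)
		cma->nid = requested_nid;
	else
		cma->nid = model_early_pfn_to_nid(map, nr_ranges, base_pfn);
}

/* The getter only reads the cached value, whenever it is called. */
static int model_cma_get_nid(const struct model_cma *cma)
{
	assert(cma != NULL);
	return cma->nid;
}
```

The design trade-off is the one raised in the review: looking the node up from the base PFN on every call avoids stored state, but ties the helper to whichever lookup is valid at that point of boot.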