This series adds multi-page clearing for hugepages. It is a rework
of [1] which took a detour through PREEMPT_LAZY [2].
Why multi-page clearing?: multi-page clearing improves upon the
current page-at-a-time approach by providing the processor with a
hint as to the real region size. A processor could use this hint to,
for instance, elide cacheline allocation when clearing a large
region.
This optimization in particular is done by REP; STOS on AMD Zen
where regions larger than L3-size use non-temporal stores.
This results in significantly better performance.
We also see performance improvement for cases where this optimization is
unavailable (pg-sz=2MB on AMD, and pg-sz=2MB|1GB on Intel) because
REP; STOS is typically microcoded which can now be amortized over
larger regions and the hint allows the hardware prefetcher to do a
better job.
Milan (EPYC 7J13, boost=0, preempt=full|lazy):

              mm/folio_zero_user     x86/folio_zero_user     change
              (GB/s +- stddev)       (GB/s +- stddev)

 pg-sz=1GB    16.51 +- 0.54%         42.80 +- 3.48%          +159.2%
 pg-sz=2MB    11.89 +- 0.78%         16.12 +- 0.12%          + 35.5%

Icelakex (Platinum 8358, no_turbo=1, preempt=full|lazy):

              mm/folio_zero_user     x86/folio_zero_user     change
              (GB/s +- stddev)       (GB/s +- stddev)

 pg-sz=1GB    8.01 +- 0.24%          11.26 +- 0.48%          + 40.57%
 pg-sz=2MB    7.95 +- 0.30%          10.90 +- 0.26%          + 37.10%
Interaction with preemption: as discussed in [3], zeroing large
regions with string instructions doesn't work well with cooperative
preemption models which need regular invocations of cond_resched(). So,
this optimization is limited to only preemptible models (full, lazy).
This is done by overriding __folio_zero_user() -- which does the usual
page-at-a-time zeroing -- by an architecture optimized version but
only when running under preemptible models.
As such this ties an architecture specific optimization too closely
to preemption. Should be easy enough to change but seemed like the
simplest approach.
Comments appreciated!
Also at:
github.com/terminus/linux clear-pages-preempt.v1
[1] https://lore.kernel.org/lkml/20230830184958.2333078-1-ankur.a.arora@oracle.com/
[2] https://lore.kernel.org/lkml/87cyyfxd4k.ffs@tglx/
[3] https://lore.kernel.org/lkml/CAHk-=wj9En-BC4t7J9xFZOws5ShwaR9yor7FxHZr8CTVyEP_+Q@mail.gmail.com/
Ankur Arora (4):
x86/clear_page: extend clear_page*() for multi-page clearing
x86/clear_page: add clear_pages()
huge_page: allow arch override for folio_zero_user()
x86/folio_zero_user: multi-page clearing
arch/x86/include/asm/page_32.h | 6 ++++
arch/x86/include/asm/page_64.h | 27 +++++++++------
arch/x86/lib/clear_page_64.S | 52 +++++++++++++++++++++--------
arch/x86/mm/Makefile | 1 +
arch/x86/mm/memory.c | 60 ++++++++++++++++++++++++++++++++++
include/linux/mm.h | 1 +
mm/memory.c | 38 ++++++++++++++++++---
7 files changed, 156 insertions(+), 29 deletions(-)
create mode 100644 arch/x86/mm/memory.c
--
2.31.1
On 4/14/2025 9:16 AM, Ankur Arora wrote:
> This series adds multi-page clearing for hugepages. It is a rework
> of [1] which took a detour through PREEMPT_LAZY [2].
>
> Why multi-page clearing?: multi-page clearing improves upon the
> current page-at-a-time approach by providing the processor with a
> hint as to the real region size. A processor could use this hint to,
> for instance, elide cacheline allocation when clearing a large
> region.
>
> This optimization in particular is done by REP; STOS on AMD Zen
> where regions larger than L3-size use non-temporal stores.
>
> This results in significantly better performance.
>
> We also see performance improvement for cases where this optimization is
> unavailable (pg-sz=2MB on AMD, and pg-sz=2MB|1GB on Intel) because
> REP; STOS is typically microcoded which can now be amortized over
> larger regions and the hint allows the hardware prefetcher to do a
> better job.
>
> Milan (EPYC 7J13, boost=0, preempt=full|lazy):
>
>               mm/folio_zero_user     x86/folio_zero_user     change
>               (GB/s +- stddev)       (GB/s +- stddev)
>
>  pg-sz=1GB    16.51 +- 0.54%         42.80 +- 3.48%          +159.2%
>  pg-sz=2MB    11.89 +- 0.78%         16.12 +- 0.12%          + 35.5%
>
> Icelakex (Platinum 8358, no_turbo=1, preempt=full|lazy):
>
>               mm/folio_zero_user     x86/folio_zero_user     change
>               (GB/s +- stddev)       (GB/s +- stddev)
>
>  pg-sz=1GB    8.01 +- 0.24%          11.26 +- 0.48%          + 40.57%
>  pg-sz=2MB    7.95 +- 0.30%          10.90 +- 0.26%          + 37.10%
>
[...]
Hello Ankur,
Thank you for the patches. Was able to test briefly w/ lazy preempt
mode.
(I do understand that, there could be lot of churn based on Ingo,
Mateusz and others' comments)
But here it goes:
SUT: AMD EPYC 9B24 (Genoa) preempt=lazy
metric = time taken in sec (lower is better). total SIZE=64GB
                 mm/folio_zero_user     x86/folio_zero_user     change (%)
 pg-sz=1GB       2.47044  +- 0.38%      1.060877 +- 0.07%       57.06
 pg-sz=2MB       5.098403 +- 0.01%      2.52015  +- 0.36%       50.57
More details (1G example run):
base kernel = 6.14 (preempt = lazy)
mm/folio_zero_user
 Performance counter stats for 'numactl -m 0 -N 0 map_hugetlb_1G' (10 runs):

          2,476.47 msec task-clock                #    1.002 CPUs utilized              ( +-  0.39% )
                 5      context-switches          #    2.025 /sec                       ( +- 29.70% )
                 2      cpu-migrations            #    0.810 /sec                       ( +- 21.15% )
               202      page-faults               #   81.806 /sec                       ( +-  0.18% )
     7,348,664,233      cycles                    #    2.976 GHz                        ( +-  0.38% )  (38.39%)
       878,805,326      stalled-cycles-frontend   #   11.99% frontend cycles idle       ( +-  0.74% )  (38.43%)
       339,023,729      instructions              #    0.05  insn per cycle
                                                  #    2.53  stalled cycles per insn    ( +-  0.08% )  (38.47%)
        88,579,915      branches                  #   35.873 M/sec                      ( +-  0.06% )  (38.51%)
        17,369,776      branch-misses             #   19.55% of all branches            ( +-  0.04% )  (38.55%)
     2,261,339,695      L1-dcache-loads           #  915.795 M/sec                      ( +-  0.06% )  (38.56%)
     1,073,880,164      L1-dcache-load-misses     #   47.48% of all L1-dcache accesses  ( +-  0.05% )  (38.56%)
       511,231,988      L1-icache-loads           #  207.038 M/sec                      ( +-  0.25% )  (38.52%)
           128,533      L1-icache-load-misses     #    0.02% of all L1-icache accesses  ( +-  0.40% )  (38.48%)
            38,134      dTLB-loads                #   15.443 K/sec                      ( +-  4.22% )  (38.44%)
            33,992      dTLB-load-misses          #  114.39% of all dTLB cache accesses ( +-  9.42% )  (38.40%)
               156      iTLB-loads                #   63.177 /sec                       ( +- 13.34% )  (38.36%)
               156      iTLB-load-misses          #  102.50% of all iTLB cache accesses ( +- 25.98% )  (38.36%)

           2.47044 +- 0.00949 seconds time elapsed  ( +-  0.38% )
x86/folio_zero_user
          1,056.72 msec task-clock                #    0.996 CPUs utilized              ( +-  0.07% )
                10      context-switches          #    9.436 /sec                       ( +-  3.59% )
                 3      cpu-migrations            #    2.831 /sec                       ( +- 11.33% )
               200      page-faults               #  188.718 /sec                       ( +-  0.15% )
     3,146,571,264      cycles                    #    2.969 GHz                        ( +-  0.07% )  (38.35%)
        17,226,261      stalled-cycles-frontend   #    0.55% frontend cycles idle       ( +-  4.12% )  (38.44%)
        14,130,553      instructions              #    0.00  insn per cycle
                                                  #    1.39  stalled cycles per insn    ( +-  1.59% )  (38.53%)
         3,578,614      branches                  #    3.377 M/sec                      ( +-  1.54% )  (38.62%)
           415,807      branch-misses             #   12.45% of all branches            ( +-  1.17% )  (38.62%)
        22,208,699      L1-dcache-loads           #   20.956 M/sec                      ( +-  5.27% )  (38.60%)
         7,312,684      L1-dcache-load-misses     #   27.79% of all L1-dcache accesses  ( +-  8.46% )  (38.51%)
         4,032,315      L1-icache-loads           #    3.805 M/sec                      ( +-  1.29% )  (38.48%)
            15,094      L1-icache-load-misses     #    0.38% of all L1-icache accesses  ( +-  1.14% )  (38.39%)
            14,365      dTLB-loads                #   13.555 K/sec                      ( +-  7.23% )  (38.38%)
             9,477      dTLB-load-misses          #   65.36% of all dTLB cache accesses ( +- 12.05% )  (38.38%)
                18      iTLB-loads                #   16.985 /sec                       ( +- 34.84% )  (38.38%)
                67      iTLB-load-misses          #  158.39% of all iTLB cache accesses ( +- 48.32% )  (38.32%)

          1.060877 +- 0.000766 seconds time elapsed  ( +-  0.07% )
Thanks and Regards
- Raghu
Raghavendra K T <raghavendra.kt@amd.com> writes:
> On 4/14/2025 9:16 AM, Ankur Arora wrote:
>> This series adds multi-page clearing for hugepages. It is a rework
>> of [1] which took a detour through PREEMPT_LAZY [2].
>> Why multi-page clearing?: multi-page clearing improves upon the
>> current page-at-a-time approach by providing the processor with a
>> hint as to the real region size. A processor could use this hint to,
>> for instance, elide cacheline allocation when clearing a large
>> region.
>> This optimization in particular is done by REP; STOS on AMD Zen
>> where regions larger than L3-size use non-temporal stores.
>> This results in significantly better performance.
>> We also see performance improvement for cases where this optimization is
>> unavailable (pg-sz=2MB on AMD, and pg-sz=2MB|1GB on Intel) because
>> REP; STOS is typically microcoded which can now be amortized over
>> larger regions and the hint allows the hardware prefetcher to do a
>> better job.
>> Milan (EPYC 7J13, boost=0, preempt=full|lazy):
>>
>>               mm/folio_zero_user     x86/folio_zero_user     change
>>               (GB/s +- stddev)       (GB/s +- stddev)
>>
>>  pg-sz=1GB    16.51 +- 0.54%         42.80 +- 3.48%          +159.2%
>>  pg-sz=2MB    11.89 +- 0.78%         16.12 +- 0.12%          + 35.5%
>>
>> Icelakex (Platinum 8358, no_turbo=1, preempt=full|lazy):
>>
>>               mm/folio_zero_user     x86/folio_zero_user     change
>>               (GB/s +- stddev)       (GB/s +- stddev)
>>
>>  pg-sz=1GB    8.01 +- 0.24%          11.26 +- 0.48%          + 40.57%
>>  pg-sz=2MB    7.95 +- 0.30%          10.90 +- 0.26%          + 37.10%
>>
> [...]
>
> Hello Ankur,
>
> Thank you for the patches. Was able to test briefly w/ lazy preempt
> mode.
Thanks for testing.
> (I do understand that, there could be lot of churn based on Ingo,
> Mateusz and others' comments)
> But here it goes:
>
> SUT: AMD EPYC 9B24 (Genoa) preempt=lazy
>
> metric = time taken in sec (lower is better). total SIZE=64GB
> mm/folio_zero_user x86/folio_zero_user change
> pg-sz=1GB 2.47044 +- 0.38% 1.060877 +- 0.07% 57.06
> pg-sz=2MB 5.098403 +- 0.01% 2.52015 +- 0.36% 50.57
Just to translate it into the same units I was using above:
mm/folio_zero_user x86/folio_zero_user
pg-sz=1GB 25.91 GBps +- 0.38% 60.37 GBps +- 0.07%
pg-sz=2MB 12.57 GBps +- 0.01% 25.39 GBps +- 0.36%
That's a decent improvement over Milan. Btw, are you using boost=1?
Also, any idea why the huge delta in the mm/folio_zero_user 2MB, 1GB
cases? Both of these are doing 4k page at a time, so the huge delta
is a little head scratching.
There's a gap on Milan as well but it is much smaller.
Thanks
Ankur
> More details (1G example run):
>
> [...]
>
> Thanks and Regards
> - Raghu
--
ankur
On 4/23/2025 12:52 AM, Ankur Arora wrote:
>
> Raghavendra K T <raghavendra.kt@amd.com> writes:
[...]
>> SUT: AMD EPYC 9B24 (Genoa) preempt=lazy
>>
>> metric = time taken in sec (lower is better). total SIZE=64GB
>>                 mm/folio_zero_user     x86/folio_zero_user     change
>> pg-sz=1GB       2.47044  +- 0.38%      1.060877 +- 0.07%       57.06
>> pg-sz=2MB       5.098403 +- 0.01%      2.52015  +- 0.36%       50.57
>
> Just to translate it into the same units I was using above:
>
>                 mm/folio_zero_user     x86/folio_zero_user
> pg-sz=1GB       25.91 GBps +- 0.38%    60.37 GBps +- 0.07%
> pg-sz=2MB       12.57 GBps +- 0.01%    25.39 GBps +- 0.36%
>
> That's a decent improvement over Milan. Btw, are you using boost=1?

Yes, boost=1.

> Also, any idea why the huge delta in the mm/folio_zero_user 2MB, 1GB
> cases? Both of these are doing 4k page at a time, so the huge delta
> is a little head scratching.
>
> There's a gap on Milan as well but it is much smaller.

Need to think/analyze further, but from perf stat I see a glaring
difference in:

                      2M          1G
   pagefaults         32,906      202
   iTLB-load-misses   44,490      156

- Raghu
On 4/23/2025 1:42 PM, Raghavendra K T wrote:
[...]
>> Also, any idea why the huge delta in the mm/folio_zero_user 2MB, 1GB
>> cases? Both of these are doing 4k page at a time, so the huge delta
>> is a little head scratching.
>>
>> There's a gap on Milan as well but it is much smaller.

Sorry, forgot to add a perhaps even more valid/important data point:

> Need to think/analyze further, but from perf stat I see a glaring
> difference in:
>
>                      2M          1G
>   pagefaults         32,906      202
>   iTLB-load-misses   44,490      156

    dTLB-loads         1,000,169   38,134
    dTLB-load-misses   402,641     33,992

- Raghu
On 13 Apr 2025, at 23:46, Ankur Arora wrote:

> This series adds multi-page clearing for hugepages. It is a rework
> of [1] which took a detour through PREEMPT_LAZY [2].
>
> Why multi-page clearing?: multi-page clearing improves upon the
> current page-at-a-time approach by providing the processor with a
> hint as to the real region size. A processor could use this hint to,
> for instance, elide cacheline allocation when clearing a large
> region.
>
> This optimization in particular is done by REP; STOS on AMD Zen
> where regions larger than L3-size use non-temporal stores.
>
> This results in significantly better performance.

Do you have init_on_alloc=1 in your kernel?

With that, pages coming from the buddy allocator are zeroed in
post_alloc_hook() by kernel_init_pages(), which is a for loop of
clear_highpage_kasan_tagged(), a wrapper around clear_page(). And
folio_zero_user() is not used.

At least Debian, Fedora, and Ubuntu by default have
CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y, which means init_on_alloc=1.

Maybe kernel_init_pages() should get your optimization as well,
unless you only target hugetlb pages.

Best Regards,
Yan, Zi
Zi Yan <ziy@nvidia.com> writes:

> On 13 Apr 2025, at 23:46, Ankur Arora wrote:
>
[...]
> Do you have init_on_alloc=1 in your kernel?
> With that, pages coming from buddy allocator are zeroed
> in post_alloc_hook() by kernel_init_pages(), which is a for loop
> of clear_highpage_kasan_tagged(), a wrap of clear_page().
> And folio_zero_user() is not used.
>
> At least Debian, Fedora, and Ubuntu by default have
> CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y, which means init_on_alloc=1.
>
> Maybe kernel_init_pages() should get your optimization as well,
> unless you only target hugetlb pages.

Thanks for the suggestion. I do plan to look for other places where we
could be zeroing contiguous regions.

Often the problem is that even if the underlying region is contiguous,
it isn't so under CONFIG_HIGHMEM. For instance,
clear_highpage_kasan_tagged() does a kmap/kunmap_local_page() around
the clearing. This breaks the contiguous region into multiple 4K pages
even when CONFIG_HIGHMEM is not defined.

I wonder if we need a clear_highpages() kind of abstraction which lets
HIGHMEM and non-HIGHMEM go their separate ways.

--
ankur
* Ankur Arora <ankur.a.arora@oracle.com> wrote:

> We also see performance improvement for cases where this optimization is
> unavailable (pg-sz=2MB on AMD, and pg-sz=2MB|1GB on Intel) because
> REP; STOS is typically microcoded which can now be amortized over
> larger regions and the hint allows the hardware prefetcher to do a
> better job.
>
> Milan (EPYC 7J13, boost=0, preempt=full|lazy):
>
>               mm/folio_zero_user     x86/folio_zero_user     change
>               (GB/s +- stddev)       (GB/s +- stddev)
>
>  pg-sz=1GB    16.51 +- 0.54%         42.80 +- 3.48%          +159.2%
>  pg-sz=2MB    11.89 +- 0.78%         16.12 +- 0.12%          + 35.5%
>
> Icelakex (Platinum 8358, no_turbo=1, preempt=full|lazy):
>
>               mm/folio_zero_user     x86/folio_zero_user     change
>               (GB/s +- stddev)       (GB/s +- stddev)
>
>  pg-sz=1GB    8.01 +- 0.24%          11.26 +- 0.48%          + 40.57%
>  pg-sz=2MB    7.95 +- 0.30%          10.90 +- 0.26%          + 37.10%

How was this measured? Could you integrate this measurement as a new
tools/perf/bench/ subcommand so that people can try it on different
systems, etc.? There's already a 'perf bench mem' subcommand space
where this feature could be added.

Thanks,
Ingo
Ingo Molnar <mingo@kernel.org> writes:

> * Ankur Arora <ankur.a.arora@oracle.com> wrote:
>
[...]
> How was this measured? Could you integrate this measurement as a new
> tools/perf/bench/ subcommand so that people can try it on different
> systems, etc.? There's already a 'perf bench mem' subcommand space
> where this feature could be added.

This was a standalone trivial mmap workload similar to what qemu does
when creating a VM; really, any hugetlb mmap().

x86-64-stosq (lib/memset_64.S::__memset) should have the same
performance characteristics, but it uses malloc() for allocation, and
for this workload we want to control the allocation path as well.

Let me see if it makes sense to extend perf bench mem memset to
optionally allocate via mmap(MAP_HUGETLB), or to add a new workload
under perf bench mem which does that.

Thanks for the review!

--
ankur
* Ankur Arora <ankur.a.arora@oracle.com> wrote:

> Ankur Arora (4):
>   x86/clear_page: extend clear_page*() for multi-page clearing
>   x86/clear_page: add clear_pages()
>   huge_page: allow arch override for folio_zero_user()
>   x86/folio_zero_user: multi-page clearing

These are not how x86 commit titles should look. Please take a look at
the titles of previous commits to the x86 files you are modifying and
follow that style. (Capitalization, use of verbs, etc.)

Thanks,
Ingo
Ingo Molnar <mingo@kernel.org> writes:

> * Ankur Arora <ankur.a.arora@oracle.com> wrote:
>
>> Ankur Arora (4):
>>   x86/clear_page: extend clear_page*() for multi-page clearing
>>   x86/clear_page: add clear_pages()
>>   huge_page: allow arch override for folio_zero_user()
>>   x86/folio_zero_user: multi-page clearing
>
> These are not how x86 commit titles should look like. Please take a
> look at the titles of previous commits to the x86 files you are
> modifying and follow that style. (Capitalization, use of verbs, etc.)

Ack. Will fix.

--
ankur