When unmapping VMA pages, pages will be gathered in batch and released by
tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
which calls lru_add_drain() to drain cached pages in folio_batch before
releasing gathered pages. Thus, it is redundant to call lru_add_drain()
before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.

Remove lru_add_drain() prior to gathering and unmapping pages in
exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.

Note that the page unmapping process in oom_killer (e.g., in
__oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
redundant lru_add_drain(). So, this commit makes the code more consistent.

Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
---
mm/mmap.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/mm/mmap.c b/mm/mmap.c
index 1971bfffcc03..da0308eef435 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2330,7 +2330,10 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
 	struct mmu_gather tlb;
 	unsigned long mt_start = mas->index;
 
+	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
+#ifdef CONFIG_MMU_GATHER_NO_GATHER
 	lru_add_drain();
+#endif
 	tlb_gather_mmu(&tlb, mm);
 	update_hiwater_rss(mm);
 	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
@@ -3300,7 +3303,10 @@ void exit_mmap(struct mm_struct *mm)
 		return;
 	}
 
+	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
+#ifdef CONFIG_MMU_GATHER_NO_GATHER
 	lru_add_drain();
+#endif
 	flush_cache_mm(mm);
 	tlb_gather_mmu_fullmm(&tlb, mm);
 	/* update_hiwater_rss(mm) here? but nobody should be looking */
--
2.42.1
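For context, the release path the commit message relies on looks roughly
like this on the v6.7-rc5 base (a simplified paraphrase of mm/swap_state.c,
not verbatim kernel source; the intermediate call chain is abbreviated):

/*
 * With CONFIG_MMU_GATHER_NO_GATHER unset, unmapped pages are batched and
 * only freed from tlb_finish_mmu(), roughly:
 *
 *   tlb_finish_mmu() -> tlb_flush_mmu() -> tlb_batch_pages_flush()
 *     -> free_pages_and_swap_cache()
 */
void free_pages_and_swap_cache(struct encoded_page **pages, int nr)
{
        int i;

        lru_add_drain();                /* drains the per-CPU folio batches */
        for (i = 0; i < nr; i++)
                free_swap_cache(encoded_page_ptr(pages[i]));
        release_pages(pages, nr);       /* frees the gathered pages */
}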
Hello,
kernel test robot noticed "kernel_BUG_at_mm/page_alloc.c" on:
commit: 5e3b1fa97b04faab8760b62ddab2cd2d62110400 ("[PATCH v2] mm: remove redundant lru_add_drain() prior to unmapping pages")
url: https://github.com/intel-lab-lkp/linux/commits/Jianfeng-Wang/mm-remove-redundant-lru_add_drain-prior-to-unmapping-pages/20231215-062828
base: v6.7-rc5
patch link: https://lore.kernel.org/all/20231214222717.50277-1-jianfeng.w.wang@oracle.com/
patch subject: [PATCH v2] mm: remove redundant lru_add_drain() prior to unmapping pages
in testcase: trinity
version: trinity-i386-abe9de86-1_20230429
with following parameters:
runtime: 600s
test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/
compiler: clang-16
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
(please refer to attached dmesg/kmsg for entire log/backtrace)
+--------------------------------------------------------------------------------------+----------+------------+
| | v6.7-rc5 | 5e3b1fa97b |
+--------------------------------------------------------------------------------------+----------+------------+
| kernel_BUG_at_mm/page_alloc.c | 0 | 10 |
| invalid_opcode:#[##] | 0 | 10 |
| EIP:free_unref_page_prepare | 0 | 10 |
+--------------------------------------------------------------------------------------+----------+------------+
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202312221154.85f3c4a3-oliver.sang@intel.com
[ 127.901286][ T2652] ------------[ cut here ]------------
[ 127.901579][ T2652] kernel BUG at mm/page_alloc.c:1084!
[ 127.901870][ T2652] invalid opcode: 0000 [#1] PREEMPT SMP
[ 127.902168][ T2652] CPU: 1 PID: 2652 Comm: trinity-c3 Tainted: G W N 6.7.0-rc5-00001-g5e3b1fa97b04 #1 e678fecaf3798641ec130c7b4b31f90b64a8d53d
[ 127.902921][ T2652] EIP: free_unref_page_prepare (mm/page_alloc.c:1084)
[ 127.903243][ T2652] Code: 8b 55 e4 b9 07 00 00 00 e8 a3 de ff ff 89 47 10 b0 01 83 c4 10 5e 5f 5b 5d 31 c9 31 d2 c3 89 f8 ba 4c 7c 68 c2 e8 86 76 fd ff <0f> 0b 68 c8 21 b2 c2 e8 8a f4 1f 00 0f 0b 0f a3 05 1c e4 c5 c2 0f
All code
========
0: 8b 55 e4 mov -0x1c(%rbp),%edx
3: b9 07 00 00 00 mov $0x7,%ecx
8: e8 a3 de ff ff call 0xffffffffffffdeb0
d: 89 47 10 mov %eax,0x10(%rdi)
10: b0 01 mov $0x1,%al
12: 83 c4 10 add $0x10,%esp
15: 5e pop %rsi
16: 5f pop %rdi
17: 5b pop %rbx
18: 5d pop %rbp
19: 31 c9 xor %ecx,%ecx
1b: 31 d2 xor %edx,%edx
1d: c3 ret
1e: 89 f8 mov %edi,%eax
20: ba 4c 7c 68 c2 mov $0xc2687c4c,%edx
25: e8 86 76 fd ff call 0xfffffffffffd76b0
2a:* 0f 0b ud2 <-- trapping instruction
2c: 68 c8 21 b2 c2 push $0xffffffffc2b221c8
31: e8 8a f4 1f 00 call 0x1ff4c0
36: 0f 0b ud2
38: 0f a3 05 1c e4 c5 c2 bt %eax,-0x3d3a1be4(%rip) # 0xffffffffc2c5e45b
3f: 0f .byte 0xf
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 68 c8 21 b2 c2 push $0xffffffffc2b221c8
7: e8 8a f4 1f 00 call 0x1ff496
c: 0f 0b ud2
e: 0f a3 05 1c e4 c5 c2 bt %eax,-0x3d3a1be4(%rip) # 0xffffffffc2c5e431
15: 0f .byte 0xf
[ 127.904296][ T2652] EAX: 00000000 EBX: e869ad40 ECX: 00000000 EDX: 00000000
[ 127.904681][ T2652] ESI: 00000001 EDI: e869ad40 EBP: ed779b3b ESP: ed779b1f
[ 127.905066][ T2652] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010246
[ 127.905480][ T2652] CR0: 80050033 CR2: b69e1000 CR3: 2cd56000 CR4: 00040690
[ 127.905867][ T2652] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 127.906253][ T2652] DR6: fffe0ff0 DR7: 00000400
[ 127.906508][ T2652] Call Trace:
[ 127.906711][ T2652] ? fault_in_iov_iter_readable (lib/iov_iter.c:109)
[ 127.907028][ T2652] ? generic_perform_write (mm/filemap.c:?)
[ 127.907339][ T2652] ? shmem_file_write_iter (mm/shmem.c:?)
[ 127.907637][ T2652] ? do_iter_write (fs/read_write.c:736)
[ 127.907899][ T2652] ? vfs_writev (fs/read_write.c:933)
[ 127.908144][ T2652] ? do_writev (fs/read_write.c:976)
[ 127.908376][ T2652] ? __ia32_sys_writev (fs/read_write.c:1049 fs/read_write.c:1046 fs/read_write.c:1046)
[ 127.908644][ T2652] ? __do_fast_syscall_32 (arch/x86/entry/common.c:165)
[ 127.908927][ T2652] ? irqentry_exit_to_user_mode (kernel/entry/common.c:312)
[ 127.909235][ T2652] ? irqentry_exit (kernel/entry/common.c:445)
[ 127.909485][ T2652] ? do_fast_syscall_32 (arch/x86/entry/common.c:346)
[ 127.909757][ T2652] ? do_SYSENTER_32 (arch/x86/entry/common.c:384)
[ 127.910012][ T2652] ? entry_SYSENTER_32 (arch/x86/entry/entry_32.S:840)
[ 127.910282][ T2652] Modules linked in:
[ 127.910512][ T2652] ---[ end trace 0000000000000000 ]---
[ 127.910806][ T2652] EIP: free_unref_page_prepare (mm/page_alloc.c:1084)
[ 127.911140][ T2652] Code: 8b 55 e4 b9 07 00 00 00 e8 a3 de ff ff 89 47 10 b0 01 83 c4 10 5e 5f 5b 5d 31 c9 31 d2 c3 89 f8 ba 4c 7c 68 c2 e8 86 76 fd ff <0f> 0b 68 c8 21 b2 c2 e8 8a f4 1f 00 0f 0b 0f a3 05 1c e4 c5 c2 0f
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231222/202312221154.85f3c4a3-oliver.sang@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On Thu, Dec 14, 2023 at 02:27:17PM -0800, Jianfeng Wang wrote:
> When unmapping VMA pages, pages will be gathered in batch and released by
> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
> which calls lru_add_drain() to drain cached pages in folio_batch before
> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>
> Remove lru_add_drain() prior to gathering and unmapping pages in
> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>
> Note that the page unmapping process in oom_killer (e.g., in
> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
> redundant lru_add_drain(). So, this commit makes the code more consistent.

Shouldn't we put this in __tlb_gather_mmu() which already has the
CONFIG_MMU_GATHER_NO_GATHER ifdefs? That would presumably help with, eg
zap_page_range_single() too.

> Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
> ---
>  mm/mmap.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 1971bfffcc03..da0308eef435 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2330,7 +2330,10 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
>  	struct mmu_gather tlb;
>  	unsigned long mt_start = mas->index;
>
> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>  	lru_add_drain();
> +#endif
>  	tlb_gather_mmu(&tlb, mm);
>  	update_hiwater_rss(mm);
>  	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
> @@ -3300,7 +3303,10 @@ void exit_mmap(struct mm_struct *mm)
>  		return;
>  	}
>
> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>  	lru_add_drain();
> +#endif
>  	flush_cache_mm(mm);
>  	tlb_gather_mmu_fullmm(&tlb, mm);
>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
> --
> 2.42.1
>
>
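A rough sketch of what that suggestion could look like (hypothetical, not
actual mm/mmu_gather.c code; the real __tlb_gather_mmu() does more setup
than shown here):

static void __tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
                             bool fullmm)
{
        tlb->mm = mm;
        tlb->fullmm = fullmm;

#ifdef CONFIG_MMU_GATHER_NO_GATHER
        /*
         * No page batching: pages are freed one by one as they are
         * unmapped, so drain the per-CPU LRU caches up front.
         */
        lru_add_drain();
#else
        /*
         * Pages are gathered and only freed from tlb_finish_mmu(), whose
         * free_pages_and_swap_cache() drains right before release_pages(),
         * so no early drain is needed here.
         */
        /* existing batch initialisation (tlb->local, tlb->active, ...) */
#endif
        /* remaining initialisation unchanged */
}

Doing it here would cover unmap_region(), exit_mmap() and
zap_page_range_single() in one place instead of patching each caller.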
On 12/14/23 3:00 PM, Matthew Wilcox wrote:
> On Thu, Dec 14, 2023 at 02:27:17PM -0800, Jianfeng Wang wrote:
>> When unmapping VMA pages, pages will be gathered in batch and released by
>> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
>> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
>> which calls lru_add_drain() to drain cached pages in folio_batch before
>> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
>> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>
>> Remove lru_add_drain() prior to gathering and unmapping pages in
>> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>
>> Note that the page unmapping process in oom_killer (e.g., in
>> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
>> redundant lru_add_drain(). So, this commit makes the code more consistent.
>
> Shouldn't we put this in __tlb_gather_mmu() which already has the
> CONFIG_MMU_GATHER_NO_GATHER ifdefs? That would presumably help with, eg
> zap_page_range_single() too.
>

After looking at different use cases of tlb_gather_mmu(), I feel it is
questionable to move lru_add_drain() into __tlb_gather_mmu(). There are
two use cases of tlb_gather_mmu(): one for unmapping and releasing pages
(e.g., the two cases in mmap.c); the other one is to update page table
entries and flush TLB without releasing pages (e.g., together with
mprotect_fixup()). For the latter use case, it is reasonable to not call
lru_add_drain() prior to or within tlb_gather_mmu().

Of course, we may update tlb_gather_mmu()'s API to take this into account.
For example, we can have tlb_gather_mmu_for_release() for the first case
and tlb_gather_mmu() for the latter. I'd like to have your opinion on this.

Thanks!

>> Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
>> ---
>>  mm/mmap.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 1971bfffcc03..da0308eef435 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -2330,7 +2330,10 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
>>  	struct mmu_gather tlb;
>>  	unsigned long mt_start = mas->index;
>>
>> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>  	lru_add_drain();
>> +#endif
>>  	tlb_gather_mmu(&tlb, mm);
>>  	update_hiwater_rss(mm);
>>  	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
>> @@ -3300,7 +3303,10 @@ void exit_mmap(struct mm_struct *mm)
>>  		return;
>>  	}
>>
>> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>  	lru_add_drain();
>> +#endif
>>  	flush_cache_mm(mm);
>>  	tlb_gather_mmu_fullmm(&tlb, mm);
>>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
>> --
>> 2.42.1
>>
>>
On Fri, Dec 15, 2023 at 12:48:10AM -0800, Jianfeng Wang wrote:
> On 12/14/23 3:00 PM, Matthew Wilcox wrote:
> > Shouldn't we put this in __tlb_gather_mmu() which already has the
> > CONFIG_MMU_GATHER_NO_GATHER ifdefs? That would presumably help with, eg
> > zap_page_range_single() too.
>
> After looking at different use cases of tlb_gather_mmu(), I feel it is
> questionable to move lru_add_drain() into __tlb_gather_mmu(). There are
> two use cases of tlb_gather_mmu(): one for unmapping and releasing pages
> (e.g., the two cases in mmap.c); the other one is to update page table
> entries and flush TLB without releasing pages (e.g., together with
> mprotect_fixup()). For the latter use case, it is reasonable to not call
> lru_add_drain() prior to or within tlb_gather_mmu().
>
> Of course, we may update tlb_gather_mmu()'s API to take this into account.
> For example, we can have tlb_gather_mmu_for_release() for the first case
> and tlb_gather_mmu() for the latter. I'd like to have your opinion on this.

Yes, I like this idea. You're right that there's no need to drain the
lru lists for the other use case, so it makes sense to have two APIs.
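One possible shape for the split agreed on above (purely illustrative; no
tlb_gather_mmu_for_release() helper exists in the tree, the name is just
the one floated in this thread):

/*
 * For callers that unmap and release pages (unmap_region(), exit_mmap(),
 * zap_page_range_single(), ...); plain tlb_gather_mmu() stays as-is for
 * users such as mprotect_fixup() that only rewrite PTEs and flush the TLB.
 */
static inline void tlb_gather_mmu_for_release(struct mmu_gather *tlb,
                                              struct mm_struct *mm)
{
#ifdef CONFIG_MMU_GATHER_NO_GATHER
        /* No batching: pages are freed as they are unmapped, drain now. */
        lru_add_drain();
#endif
        /*
         * With batching, tlb_finish_mmu() -> free_pages_and_swap_cache()
         * drains right before release_pages(), so skip the early drain.
         */
        tlb_gather_mmu(tlb, mm);
}

unmap_region() would then call tlb_gather_mmu_for_release(&tlb, mm) instead
of the open-coded lru_add_drain() + tlb_gather_mmu() pair (exit_mmap() would
need an equivalent fullmm variant).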
On 12/14/23 3:00 PM, Matthew Wilcox wrote:
> On Thu, Dec 14, 2023 at 02:27:17PM -0800, Jianfeng Wang wrote:
>> When unmapping VMA pages, pages will be gathered in batch and released by
>> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
>> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
>> which calls lru_add_drain() to drain cached pages in folio_batch before
>> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
>> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>
>> Remove lru_add_drain() prior to gathering and unmapping pages in
>> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>
>> Note that the page unmapping process in oom_killer (e.g., in
>> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
>> redundant lru_add_drain(). So, this commit makes the code more consistent.
>
> Shouldn't we put this in __tlb_gather_mmu() which already has the
> CONFIG_MMU_GATHER_NO_GATHER ifdefs? That would presumably help with, eg
> zap_page_range_single() too.
>

Thanks. It makes sense to me.
This commit is motivated by a workload that uses mmap/unmap heavily.
While the mmu_gather feature is also used by hugetlb, madvise, mprotect,
etc., I prefer to have another standalone commit (following this one)
that moves lru_add_drain() to __tlb_gather_mmu() to unify these cases so
that no redundant lru_add_drain() calls are made when using mmu_gather.

>> Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
>> ---
>>  mm/mmap.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 1971bfffcc03..da0308eef435 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -2330,7 +2330,10 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
>>  	struct mmu_gather tlb;
>>  	unsigned long mt_start = mas->index;
>>
>> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>  	lru_add_drain();
>> +#endif
>>  	tlb_gather_mmu(&tlb, mm);
>>  	update_hiwater_rss(mm);
>>  	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
>> @@ -3300,7 +3303,10 @@ void exit_mmap(struct mm_struct *mm)
>>  		return;
>>  	}
>>
>> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>  	lru_add_drain();
>> +#endif
>>  	flush_cache_mm(mm);
>>  	tlb_gather_mmu_fullmm(&tlb, mm);
>>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
>> --
>> 2.42.1
>>
>>
On Thu, Dec 14, 2023 at 03:59:00PM -0800, Jianfeng Wang wrote:
> On 12/14/23 3:00 PM, Matthew Wilcox wrote:
> > On Thu, Dec 14, 2023 at 02:27:17PM -0800, Jianfeng Wang wrote:
> >> When unmapping VMA pages, pages will be gathered in batch and released by
> >> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
> >> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
> >> which calls lru_add_drain() to drain cached pages in folio_batch before
> >> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
> >> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
> >>
> >> Remove lru_add_drain() prior to gathering and unmapping pages in
> >> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
> >>
> >> Note that the page unmapping process in oom_killer (e.g., in
> >> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
> >> redundant lru_add_drain(). So, this commit makes the code more consistent.
> >
> > Shouldn't we put this in __tlb_gather_mmu() which already has the
> > CONFIG_MMU_GATHER_NO_GATHER ifdefs? That would presumably help with, eg
> > zap_page_range_single() too.
> >
>
> Thanks. It makes sense to me.
> This commit is motivated by a workload that uses mmap/unmap heavily.
> While the mmu_gather feature is also used by hugetlb, madvise, mprotect,
> etc., I prefer to have another standalone commit (following this one)
> that moves lru_add_drain() to __tlb_gather_mmu() to unify these cases so
> that no redundant lru_add_drain() calls are made when using mmu_gather.

That's not normally the approach we take.
On 12/14/23 7:06 PM, Matthew Wilcox wrote:
> On Thu, Dec 14, 2023 at 03:59:00PM -0800, Jianfeng Wang wrote:
>> On 12/14/23 3:00 PM, Matthew Wilcox wrote:
>>> On Thu, Dec 14, 2023 at 02:27:17PM -0800, Jianfeng Wang wrote:
>>>> When unmapping VMA pages, pages will be gathered in batch and released by
>>>> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
>>>> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
>>>> which calls lru_add_drain() to drain cached pages in folio_batch before
>>>> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
>>>> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>>>
>>>> Remove lru_add_drain() prior to gathering and unmapping pages in
>>>> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>>>
>>>> Note that the page unmapping process in oom_killer (e.g., in
>>>> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
>>>> redundant lru_add_drain(). So, this commit makes the code more consistent.
>>>
>>> Shouldn't we put this in __tlb_gather_mmu() which already has the
>>> CONFIG_MMU_GATHER_NO_GATHER ifdefs? That would presumably help with, eg
>>> zap_page_range_single() too.
>>>
>>
>> Thanks. It makes sense to me.
>> This commit is motivated by a workload that uses mmap/unmap heavily.
>> While the mmu_gather feature is also used by hugetlb, madvise, mprotect,
>> etc., I prefer to have another standalone commit (following this one)
>> that moves lru_add_drain() to __tlb_gather_mmu() to unify these cases so
>> that no redundant lru_add_drain() calls are made when using mmu_gather.
>
> That's not normally the approach we take.

Okay, understood. Thanks for pointing it out.
Let me send a new patch that aligns with your suggestion.