When unmapping VMA pages, pages will be gathered in batch and released by
tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
which calls lru_add_drain() to drain cached pages in folio_batch before
releasing gathered pages. Thus, it is redundant to call lru_add_drain()
before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.

Remove lru_add_drain() prior to gathering and unmapping pages in
exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.

Note that the page unmapping process in oom_killer (e.g., in
__oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
redundant lru_add_drain(). So, this commit makes the code more consistent.

Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
---
mm/mmap.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/mm/mmap.c b/mm/mmap.c
index 1971bfffcc03..da0308eef435 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2330,7 +2330,10 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
 	struct mmu_gather tlb;
 	unsigned long mt_start = mas->index;
 
+	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
+#ifdef CONFIG_MMU_GATHER_NO_GATHER
 	lru_add_drain();
+#endif
 	tlb_gather_mmu(&tlb, mm);
 	update_hiwater_rss(mm);
 	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
@@ -3300,7 +3303,10 @@ void exit_mmap(struct mm_struct *mm)
 		return;
 	}
 
+	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
+#ifdef CONFIG_MMU_GATHER_NO_GATHER
 	lru_add_drain();
+#endif
 	flush_cache_mm(mm);
 	tlb_gather_mmu_fullmm(&tlb, mm);
 	/* update_hiwater_rss(mm) here? but nobody should be looking */
--
2.42.1
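For context, the release path the commit message relies on looks roughly
like this on the v6.7-rc5 base (a simplified paraphrase of mm/swap_state.c,
not verbatim kernel source; the intermediate call chain is abbreviated):

/*
 * With CONFIG_MMU_GATHER_NO_GATHER unset, unmapped pages are batched and
 * only freed from tlb_finish_mmu(), roughly:
 *
 *   tlb_finish_mmu() -> tlb_flush_mmu() -> tlb_batch_pages_flush()
 *     -> free_pages_and_swap_cache()
 */
void free_pages_and_swap_cache(struct encoded_page **pages, int nr)
{
        int i;

        lru_add_drain();                /* drains the per-CPU folio batches */
        for (i = 0; i < nr; i++)
                free_swap_cache(encoded_page_ptr(pages[i]));
        release_pages(pages, nr);       /* frees the gathered pages */
}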
Hello,
kernel test robot noticed "kernel_BUG_at_mm/page_alloc.c" on:
commit: 5e3b1fa97b04faab8760b62ddab2cd2d62110400 ("[PATCH v2] mm: remove redundant lru_add_drain() prior to unmapping pages")
url: https://github.com/intel-lab-lkp/linux/commits/Jianfeng-Wang/mm-remove-redundant-lru_add_drain-prior-to-unmapping-pages/20231215-062828
base: v6.7-rc5
patch link: https://lore.kernel.org/all/20231214222717.50277-1-jianfeng.w.wang@oracle.com/
patch subject: [PATCH v2] mm: remove redundant lru_add_drain() prior to unmapping pages
in testcase: trinity
version: trinity-i386-abe9de86-1_20230429
with following parameters:
runtime: 600s
test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/
compiler: clang-16
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
(please refer to attached dmesg/kmsg for entire log/backtrace)
+--------------------------------------------------------------------------------------+----------+------------+
| | v6.7-rc5 | 5e3b1fa97b |
+--------------------------------------------------------------------------------------+----------+------------+
| kernel_BUG_at_mm/page_alloc.c | 0 | 10 |
| invalid_opcode:#[##] | 0 | 10 |
| EIP:free_unref_page_prepare | 0 | 10 |
+--------------------------------------------------------------------------------------+----------+------------+
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202312221154.85f3c4a3-oliver.sang@intel.com
[ 127.901286][ T2652] ------------[ cut here ]------------
[ 127.901579][ T2652] kernel BUG at mm/page_alloc.c:1084!
[ 127.901870][ T2652] invalid opcode: 0000 [#1] PREEMPT SMP
[ 127.902168][ T2652] CPU: 1 PID: 2652 Comm: trinity-c3 Tainted: G W N 6.7.0-rc5-00001-g5e3b1fa97b04 #1 e678fecaf3798641ec130c7b4b31f90b64a8d53d
[ 127.902921][ T2652] EIP: free_unref_page_prepare (mm/page_alloc.c:1084)
[ 127.903243][ T2652] Code: 8b 55 e4 b9 07 00 00 00 e8 a3 de ff ff 89 47 10 b0 01 83 c4 10 5e 5f 5b 5d 31 c9 31 d2 c3 89 f8 ba 4c 7c 68 c2 e8 86 76 fd ff <0f> 0b 68 c8 21 b2 c2 e8 8a f4 1f 00 0f 0b 0f a3 05 1c e4 c5 c2 0f
All code
========
0: 8b 55 e4 mov -0x1c(%rbp),%edx
3: b9 07 00 00 00 mov $0x7,%ecx
8: e8 a3 de ff ff call 0xffffffffffffdeb0
d: 89 47 10 mov %eax,0x10(%rdi)
10: b0 01 mov $0x1,%al
12: 83 c4 10 add $0x10,%esp
15: 5e pop %rsi
16: 5f pop %rdi
17: 5b pop %rbx
18: 5d pop %rbp
19: 31 c9 xor %ecx,%ecx
1b: 31 d2 xor %edx,%edx
1d: c3 ret
1e: 89 f8 mov %edi,%eax
20: ba 4c 7c 68 c2 mov $0xc2687c4c,%edx
25: e8 86 76 fd ff call 0xfffffffffffd76b0
2a:* 0f 0b ud2 <-- trapping instruction
2c: 68 c8 21 b2 c2 push $0xffffffffc2b221c8
31: e8 8a f4 1f 00 call 0x1ff4c0
36: 0f 0b ud2
38: 0f a3 05 1c e4 c5 c2 bt %eax,-0x3d3a1be4(%rip) # 0xffffffffc2c5e45b
3f: 0f .byte 0xf
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 68 c8 21 b2 c2 push $0xffffffffc2b221c8
7: e8 8a f4 1f 00 call 0x1ff496
c: 0f 0b ud2
e: 0f a3 05 1c e4 c5 c2 bt %eax,-0x3d3a1be4(%rip) # 0xffffffffc2c5e431
15: 0f .byte 0xf
[ 127.904296][ T2652] EAX: 00000000 EBX: e869ad40 ECX: 00000000 EDX: 00000000
[ 127.904681][ T2652] ESI: 00000001 EDI: e869ad40 EBP: ed779b3b ESP: ed779b1f
[ 127.905066][ T2652] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010246
[ 127.905480][ T2652] CR0: 80050033 CR2: b69e1000 CR3: 2cd56000 CR4: 00040690
[ 127.905867][ T2652] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 127.906253][ T2652] DR6: fffe0ff0 DR7: 00000400
[ 127.906508][ T2652] Call Trace:
[ 127.906711][ T2652] ? fault_in_iov_iter_readable (lib/iov_iter.c:109)
[ 127.907028][ T2652] ? generic_perform_write (mm/filemap.c:?)
[ 127.907339][ T2652] ? shmem_file_write_iter (mm/shmem.c:?)
[ 127.907637][ T2652] ? do_iter_write (fs/read_write.c:736)
[ 127.907899][ T2652] ? vfs_writev (fs/read_write.c:933)
[ 127.908144][ T2652] ? do_writev (fs/read_write.c:976)
[ 127.908376][ T2652] ? __ia32_sys_writev (fs/read_write.c:1049 fs/read_write.c:1046 fs/read_write.c:1046)
[ 127.908644][ T2652] ? __do_fast_syscall_32 (arch/x86/entry/common.c:165)
[ 127.908927][ T2652] ? irqentry_exit_to_user_mode (kernel/entry/common.c:312)
[ 127.909235][ T2652] ? irqentry_exit (kernel/entry/common.c:445)
[ 127.909485][ T2652] ? do_fast_syscall_32 (arch/x86/entry/common.c:346)
[ 127.909757][ T2652] ? do_SYSENTER_32 (arch/x86/entry/common.c:384)
[ 127.910012][ T2652] ? entry_SYSENTER_32 (arch/x86/entry/entry_32.S:840)
[ 127.910282][ T2652] Modules linked in:
[ 127.910512][ T2652] ---[ end trace 0000000000000000 ]---
[ 127.910806][ T2652] EIP: free_unref_page_prepare (mm/page_alloc.c:1084)
[ 127.911140][ T2652] Code: 8b 55 e4 b9 07 00 00 00 e8 a3 de ff ff 89 47 10 b0 01 83 c4 10 5e 5f 5b 5d 31 c9 31 d2 c3 89 f8 ba 4c 7c 68 c2 e8 86 76 fd ff <0f> 0b 68 c8 21 b2 c2 e8 8a f4 1f 00 0f 0b 0f a3 05 1c e4 c5 c2 0f
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231222/202312221154.85f3c4a3-oliver.sang@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On Thu, Dec 14, 2023 at 02:27:17PM -0800, Jianfeng Wang wrote:
> When unmapping VMA pages, pages will be gathered in batch and released by
> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
> which calls lru_add_drain() to drain cached pages in folio_batch before
> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>
> Remove lru_add_drain() prior to gathering and unmapping pages in
> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>
> Note that the page unmapping process in oom_killer (e.g., in
> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
> redundant lru_add_drain(). So, this commit makes the code more consistent.

Shouldn't we put this in __tlb_gather_mmu() which already has the
CONFIG_MMU_GATHER_NO_GATHER ifdefs? That would presumably help with, eg
zap_page_range_single() too.

> Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
> ---
>  mm/mmap.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 1971bfffcc03..da0308eef435 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2330,7 +2330,10 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
>  	struct mmu_gather tlb;
>  	unsigned long mt_start = mas->index;
>
> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>  	lru_add_drain();
> +#endif
>  	tlb_gather_mmu(&tlb, mm);
>  	update_hiwater_rss(mm);
>  	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
> @@ -3300,7 +3303,10 @@ void exit_mmap(struct mm_struct *mm)
>  		return;
>  	}
>
> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>  	lru_add_drain();
> +#endif
>  	flush_cache_mm(mm);
>  	tlb_gather_mmu_fullmm(&tlb, mm);
>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
> --
> 2.42.1
>
>
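A rough sketch of what that suggestion could look like (hypothetical, not
actual mm/mmu_gather.c code; the real __tlb_gather_mmu() does more setup
than shown here):

static void __tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
                             bool fullmm)
{
        tlb->mm = mm;
        tlb->fullmm = fullmm;

#ifdef CONFIG_MMU_GATHER_NO_GATHER
        /*
         * No page batching: pages are freed one by one as they are
         * unmapped, so drain the per-CPU LRU caches up front.
         */
        lru_add_drain();
#else
        /*
         * Pages are gathered and only freed from tlb_finish_mmu(), whose
         * free_pages_and_swap_cache() drains right before release_pages(),
         * so no early drain is needed here.
         */
        /* existing batch initialisation (tlb->local, tlb->active, ...) */
#endif
        /* remaining initialisation unchanged */
}

Doing it here would cover unmap_region(), exit_mmap() and
zap_page_range_single() in one place instead of patching each caller.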
On 12/14/23 3:00 PM, Matthew Wilcox wrote:
> On Thu, Dec 14, 2023 at 02:27:17PM -0800, Jianfeng Wang wrote:
>> When unmapping VMA pages, pages will be gathered in batch and released by
>> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
>> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
>> which calls lru_add_drain() to drain cached pages in folio_batch before
>> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
>> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>
>> Remove lru_add_drain() prior to gathering and unmapping pages in
>> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>
>> Note that the page unmapping process in oom_killer (e.g., in
>> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
>> redundant lru_add_drain(). So, this commit makes the code more consistent.
>
> Shouldn't we put this in __tlb_gather_mmu() which already has the
> CONFIG_MMU_GATHER_NO_GATHER ifdefs? That would presumably help with, eg
> zap_page_range_single() too.
>

After looking at different use cases of tlb_gather_mmu(), I feel it is
questionable to move lru_add_drain() into __tlb_gather_mmu(). There are
two use cases of tlb_gather_mmu(): one for unmapping and releasing pages
(e.g., the two cases in mmap.c); the other one is to update page table
entries and flush TLB without releasing pages (e.g., together with
mprotect_fixup()). For the latter use case, it is reasonable to not call
lru_add_drain() prior to or within tlb_gather_mmu().

Of course, we may update tlb_gather_mmu()'s API to take this into account.
For example, we can have tlb_gather_mmu_for_release() for the first case
and tlb_gather_mmu() for the latter. I'd like to have your opinion on this.

Thanks!

>> Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
>> ---
>>  mm/mmap.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 1971bfffcc03..da0308eef435 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -2330,7 +2330,10 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
>>  	struct mmu_gather tlb;
>>  	unsigned long mt_start = mas->index;
>>
>> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>  	lru_add_drain();
>> +#endif
>>  	tlb_gather_mmu(&tlb, mm);
>>  	update_hiwater_rss(mm);
>>  	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
>> @@ -3300,7 +3303,10 @@ void exit_mmap(struct mm_struct *mm)
>>  		return;
>>  	}
>>
>> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>  	lru_add_drain();
>> +#endif
>>  	flush_cache_mm(mm);
>>  	tlb_gather_mmu_fullmm(&tlb, mm);
>>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
>> --
>> 2.42.1
>>
>>
On Fri, Dec 15, 2023 at 12:48:10AM -0800, Jianfeng Wang wrote:
> On 12/14/23 3:00 PM, Matthew Wilcox wrote:
> > Shouldn't we put this in __tlb_gather_mmu() which already has the
> > CONFIG_MMU_GATHER_NO_GATHER ifdefs? That would presumably help with, eg
> > zap_page_range_single() too.
>
> After looking at different use cases of tlb_gather_mmu(), I feel it is
> questionable to move lru_add_drain() into __tlb_gather_mmu(). There are
> two use cases of tlb_gather_mmu(): one for unmapping and releasing pages
> (e.g., the two cases in mmap.c); the other one is to update page table
> entries and flush TLB without releasing pages (e.g., together with
> mprotect_fixup()). For the latter use case, it is reasonable to not call
> lru_add_drain() prior to or within tlb_gather_mmu().
>
> Of course, we may update tlb_gather_mmu()'s API to take this into account.
> For example, we can have tlb_gather_mmu_for_release() for the first case
> and tlb_gather_mmu() for the latter. I'd like to have your opinion on this.

Yes, I like this idea. You're right that there's no need to drain the
lru lists for the other use case, so it makes sense to have two APIs.
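One possible shape for the split agreed on above (purely illustrative; no
tlb_gather_mmu_for_release() helper exists in the tree, the name is just
the one floated in this thread):

/*
 * For callers that unmap and release pages (unmap_region(), exit_mmap(),
 * zap_page_range_single(), ...); plain tlb_gather_mmu() stays as-is for
 * users such as mprotect_fixup() that only rewrite PTEs and flush the TLB.
 */
static inline void tlb_gather_mmu_for_release(struct mmu_gather *tlb,
                                              struct mm_struct *mm)
{
#ifdef CONFIG_MMU_GATHER_NO_GATHER
        /* No batching: pages are freed as they are unmapped, drain now. */
        lru_add_drain();
#endif
        /*
         * With batching, tlb_finish_mmu() -> free_pages_and_swap_cache()
         * drains right before release_pages(), so skip the early drain.
         */
        tlb_gather_mmu(tlb, mm);
}

unmap_region() would then call tlb_gather_mmu_for_release(&tlb, mm) instead
of the open-coded lru_add_drain() + tlb_gather_mmu() pair (exit_mmap() would
need an equivalent fullmm variant).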
On 12/14/23 3:00 PM, Matthew Wilcox wrote:
> On Thu, Dec 14, 2023 at 02:27:17PM -0800, Jianfeng Wang wrote:
>> When unmapping VMA pages, pages will be gathered in batch and released by
>> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
>> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
>> which calls lru_add_drain() to drain cached pages in folio_batch before
>> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
>> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>
>> Remove lru_add_drain() prior to gathering and unmapping pages in
>> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>
>> Note that the page unmapping process in oom_killer (e.g., in
>> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
>> redundant lru_add_drain(). So, this commit makes the code more consistent.
>
> Shouldn't we put this in __tlb_gather_mmu() which already has the
> CONFIG_MMU_GATHER_NO_GATHER ifdefs? That would presumably help with, eg
> zap_page_range_single() too.
>

Thanks. It makes sense to me.
This commit is motivated by a workload that uses mmap/unmap heavily.
While the mmu_gather feature is also used by hugetlb, madvise, mprotect,
etc., I prefer to have another standalone commit (following this one)
that moves lru_add_drain() to __tlb_gather_mmu() to unify these cases so
that no redundant lru_add_drain() calls are made when using mmu_gather.

>> Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
>> ---
>>  mm/mmap.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 1971bfffcc03..da0308eef435 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -2330,7 +2330,10 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
>>  	struct mmu_gather tlb;
>>  	unsigned long mt_start = mas->index;
>>
>> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>  	lru_add_drain();
>> +#endif
>>  	tlb_gather_mmu(&tlb, mm);
>>  	update_hiwater_rss(mm);
>>  	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
>> @@ -3300,7 +3303,10 @@ void exit_mmap(struct mm_struct *mm)
>>  		return;
>>  	}
>>
>> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>  	lru_add_drain();
>> +#endif
>>  	flush_cache_mm(mm);
>>  	tlb_gather_mmu_fullmm(&tlb, mm);
>>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
>> --
>> 2.42.1
>>
>>
On Thu, Dec 14, 2023 at 03:59:00PM -0800, Jianfeng Wang wrote:
> On 12/14/23 3:00 PM, Matthew Wilcox wrote:
> > On Thu, Dec 14, 2023 at 02:27:17PM -0800, Jianfeng Wang wrote:
> >> When unmapping VMA pages, pages will be gathered in batch and released by
> >> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
> >> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
> >> which calls lru_add_drain() to drain cached pages in folio_batch before
> >> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
> >> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
> >>
> >> Remove lru_add_drain() prior to gathering and unmapping pages in
> >> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
> >>
> >> Note that the page unmapping process in oom_killer (e.g., in
> >> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
> >> redundant lru_add_drain(). So, this commit makes the code more consistent.
> >
> > Shouldn't we put this in __tlb_gather_mmu() which already has the
> > CONFIG_MMU_GATHER_NO_GATHER ifdefs? That would presumably help with, eg
> > zap_page_range_single() too.
> >
>
> Thanks. It makes sense to me.
> This commit is motivated by a workload that uses mmap/unmap heavily.
> While the mmu_gather feature is also used by hugetlb, madvise, mprotect,
> etc., I prefer to have another standalone commit (following this one)
> that moves lru_add_drain() to __tlb_gather_mmu() to unify these cases so
> that no redundant lru_add_drain() calls are made when using mmu_gather.

That's not normally the approach we take.
On 12/14/23 7:06 PM, Matthew Wilcox wrote:
> On Thu, Dec 14, 2023 at 03:59:00PM -0800, Jianfeng Wang wrote:
>> On 12/14/23 3:00 PM, Matthew Wilcox wrote:
>>> On Thu, Dec 14, 2023 at 02:27:17PM -0800, Jianfeng Wang wrote:
>>>> When unmapping VMA pages, pages will be gathered in batch and released by
>>>> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
>>>> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
>>>> which calls lru_add_drain() to drain cached pages in folio_batch before
>>>> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
>>>> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>>>
>>>> Remove lru_add_drain() prior to gathering and unmapping pages in
>>>> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>>>
>>>> Note that the page unmapping process in oom_killer (e.g., in
>>>> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
>>>> redundant lru_add_drain(). So, this commit makes the code more consistent.
>>>
>>> Shouldn't we put this in __tlb_gather_mmu() which already has the
>>> CONFIG_MMU_GATHER_NO_GATHER ifdefs? That would presumably help with, eg
>>> zap_page_range_single() too.
>>>
>>
>> Thanks. It makes sense to me.
>> This commit is motivated by a workload that uses mmap/unmap heavily.
>> While the mmu_gather feature is also used by hugetlb, madvise, mprotect,
>> etc., I prefer to have another standalone commit (following this one)
>> that moves lru_add_drain() to __tlb_gather_mmu() to unify these cases so
>> that no redundant lru_add_drain() calls are made when using mmu_gather.
>
> That's not normally the approach we take.

Okay, understood. Thanks for pointing it out.
Let me send a new patch that aligns with your suggestion.