When unmapping VMA pages, pages will be gathered in batch and released by
tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
which calls lru_add_drain() to drain cached pages in folio_batch before
releasing gathered pages. Thus, it is redundant to call lru_add_drain()
before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.

Remove lru_add_drain() prior to gathering and unmapping pages in
exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.

Note that the page unmapping process in oom_killer (e.g., in
__oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
redundant lru_add_drain(). So, this commit makes the code more consistent.
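
For reference, the release path that makes the early drain redundant looks
roughly like this (a paraphrase of free_pages_and_swap_cache() in
mm/swap_state.c; the exact signature varies across kernel versions):

	/*
	 * Reached from tlb_finish_mmu() via the mmu_gather batch flush
	 * when CONFIG_MMU_GATHER_NO_GATHER is not set.
	 */
	void free_pages_and_swap_cache(struct page **pages, int nr)
	{
		int i;

		lru_add_drain();	/* drains the per-CPU folio batches */
		for (i = 0; i < nr; i++)
			free_swap_cache(pages[i]);
		release_pages(pages, nr);
	}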
Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
---
mm/mmap.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/mm/mmap.c b/mm/mmap.c
index 1971bfffcc03..0451285dee4f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2330,7 +2330,9 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
 	struct mmu_gather tlb;
 	unsigned long mt_start = mas->index;
 
+#ifdef CONFIG_MMU_GATHER_NO_GATHER
 	lru_add_drain();
+#endif
 	tlb_gather_mmu(&tlb, mm);
 	update_hiwater_rss(mm);
 	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
@@ -3300,7 +3302,9 @@ void exit_mmap(struct mm_struct *mm)
 		return;
 	}
 
+#ifdef CONFIG_MMU_GATHER_NO_GATHER
 	lru_add_drain();
+#endif
 	flush_cache_mm(mm);
 	tlb_gather_mmu_fullmm(&tlb, mm);
 	/* update_hiwater_rss(mm) here? but nobody should be looking */
--
2.42.1
On Tue, 2023-12-12 at 23:28 -0800, Jianfeng Wang wrote:
> When unmapping VMA pages, pages will be gathered in batch and released by
> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
> which calls lru_add_drain() to drain cached pages in folio_batch before
> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>
> Remove lru_add_drain() prior to gathering and unmapping pages in
> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>
> Note that the page unmapping process in oom_killer (e.g., in
> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
> redundant lru_add_drain(). So, this commit makes the code more consistent.
>
> Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
> ---
>  mm/mmap.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 1971bfffcc03..0451285dee4f 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2330,7 +2330,9 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
>  	struct mmu_gather tlb;
>  	unsigned long mt_start = mas->index;
>
> +#ifdef CONFIG_MMU_GATHER_NO_GATHER

In your comment you say skip lru_add_drain() when CONFIG_MMU_GATHER_NO_GATHER
is *not* set. So shouldn't this be

#ifndef CONFIG_MMU_GATHER_NO_GATHER ?

>  	lru_add_drain();
> +#endif
>  	tlb_gather_mmu(&tlb, mm);
>  	update_hiwater_rss(mm);
>  	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
> @@ -3300,7 +3302,9 @@ void exit_mmap(struct mm_struct *mm)
>  		return;
>  	}
>
> +#ifdef CONFIG_MMU_GATHER_NO_GATHER

same question as above.

>  	lru_add_drain();
> +#endif
>  	flush_cache_mm(mm);
>  	tlb_gather_mmu_fullmm(&tlb, mm);
>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
On 12/13/23 2:57 PM, Tim Chen wrote:
> On Tue, 2023-12-12 at 23:28 -0800, Jianfeng Wang wrote:
>> When unmapping VMA pages, pages will be gathered in batch and released by
>> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
>> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
>> which calls lru_add_drain() to drain cached pages in folio_batch before
>> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
>> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>
>> Remove lru_add_drain() prior to gathering and unmapping pages in
>> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>
>> Note that the page unmapping process in oom_killer (e.g., in
>> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
>> redundant lru_add_drain(). So, this commit makes the code more consistent.
>>
>> Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
>> ---
>>  mm/mmap.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 1971bfffcc03..0451285dee4f 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -2330,7 +2330,9 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
>>  	struct mmu_gather tlb;
>>  	unsigned long mt_start = mas->index;
>>
>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>
> In your comment you say skip lru_add_drain() when CONFIG_MMU_GATHER_NO_GATHER
> is *not* set. So shouldn't this be
>
> #ifndef CONFIG_MMU_GATHER_NO_GATHER ?
>
Hi Tim,

The mmu_gather feature is used to gather pages produced by unmap_vmas() and
release them in batch in tlb_finish_mmu(). The feature is *on* if
CONFIG_MMU_GATHER_NO_GATHER is *not* set. Note that: tlb_finish_mmu() will call
free_pages_and_swap_cache()/lru_add_drain() only when the feature is on.

Yes, this commit aims to skip lru_add_drain() when CONFIG_MMU_GATHER_NO_GATHER
is *not* set (i.e. when the mmu_gather feature is on) because it is redundant.

If CONFIG_MMU_GATHER_NO_GATHER is set, pages will be released in unmap_vmas().
tlb_finish_mmu() will not call lru_add_drain(). So, it is still necessary to
keep the lru_add_drain() call to clear cached pages before unmap_vmas(), as
folio_batches hold a reference count for pages in them.

The same applies to the other case.

Thanks,
- Jianfeng

>>  	lru_add_drain();
>> +#endif
>>  	tlb_gather_mmu(&tlb, mm);
>>  	update_hiwater_rss(mm);
>>  	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
>> @@ -3300,7 +3302,9 @@ void exit_mmap(struct mm_struct *mm)
>>  		return;
>>  	}
>>
>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>
> same question as above.
>
>>  	lru_add_drain();
>> +#endif
>>  	flush_cache_mm(mm);
>>  	tlb_gather_mmu_fullmm(&tlb, mm);
>>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
>
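
To make the two build configurations concrete, the split Jianfeng describes
looks roughly like the sketch below (a simplified paraphrase of
tlb_flush_mmu_free() in mm/mmu_gather.c, not the verbatim kernel source):

	static void tlb_flush_mmu_free(struct mmu_gather *tlb)
	{
		/* ... page-table freeing elided ... */
	#ifndef CONFIG_MMU_GATHER_NO_GATHER
		/*
		 * mmu_gather is on: release the pages gathered by
		 * unmap_vmas() in batch. This path reaches
		 * free_pages_and_swap_cache(), which itself starts with
		 * lru_add_drain(), so draining before unmap_vmas() is
		 * redundant.
		 */
		tlb_batch_pages_flush(tlb);
	#endif
		/*
		 * With CONFIG_MMU_GATHER_NO_GATHER set, pages were freed
		 * directly in unmap_vmas(), so the caller's earlier
		 * lru_add_drain() is still needed: per-CPU folio batches
		 * hold a reference on the pages they cache.
		 */
	}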
On Wed, 2023-12-13 at 17:03 -0800, Jianfeng Wang wrote:
> On 12/13/23 2:57 PM, Tim Chen wrote:
> > On Tue, 2023-12-12 at 23:28 -0800, Jianfeng Wang wrote:
> > > When unmapping VMA pages, pages will be gathered in batch and released by
> > > tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
> > > tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
> > > which calls lru_add_drain() to drain cached pages in folio_batch before
> > > releasing gathered pages. Thus, it is redundant to call lru_add_drain()
> > > before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
> > >
> > > Remove lru_add_drain() prior to gathering and unmapping pages in
> > > exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
> > >
> > > Note that the page unmapping process in oom_killer (e.g., in
> > > __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
> > > redundant lru_add_drain(). So, this commit makes the code more consistent.
> > >
> > > Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
> > > ---
> > >  mm/mmap.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index 1971bfffcc03..0451285dee4f 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -2330,7 +2330,9 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
> > >  	struct mmu_gather tlb;
> > >  	unsigned long mt_start = mas->index;
> > >
> > > +#ifdef CONFIG_MMU_GATHER_NO_GATHER
> >
> > In your comment you say skip lru_add_drain() when CONFIG_MMU_GATHER_NO_GATHER
> > is *not* set. So shouldn't this be
> >
> > #ifndef CONFIG_MMU_GATHER_NO_GATHER ?
> >
> Hi Tim,
>
> The mmu_gather feature is used to gather pages produced by unmap_vmas() and
> release them in batch in tlb_finish_mmu(). The feature is *on* if
> CONFIG_MMU_GATHER_NO_GATHER is *not* set. Note that: tlb_finish_mmu() will call
> free_pages_and_swap_cache()/lru_add_drain() only when the feature is on.

Thanks for the explanation.

Looking at the code, lru_add_drain() is executed for #ifndef CONFIG_MMU_GATHER_NO_GATHER
in tlb_finish_mmu(). So the logic of your patch is fine.

The #ifndef CONFIG_MMU_GATHER_NO_GATHER means
mmu_gather feature is on. The double negative throws me off in my first read
of your commit log.

Suggest that you add a comment in code to make it easier for
future code maintenance:

/* defer lru_add_drain() to tlb_finish_mmu() for ifndef CONFIG_MMU_GATHER_NO_GATHER */

Is your change of skipping the extra lru_add_drain() motivated by some performance reason
in a workload? Wonder whether it is worth adding an extra ifdef in the code.

Tim

>
> Yes, this commit aims to skip lru_add_drain() when CONFIG_MMU_GATHER_NO_GATHER
> is *not* set (i.e. when the mmu_gather feature is on) because it is redundant.
>
> If CONFIG_MMU_GATHER_NO_GATHER is set, pages will be released in unmap_vmas().
> tlb_finish_mmu() will not call lru_add_drain(). So, it is still necessary to
> keep the lru_add_drain() call to clear cached pages before unmap_vmas(), as
> folio_batches hold a reference count for pages in them.
>
> The same applies to the other case.
>
> Thanks,
> - Jianfeng
>
> > >  	lru_add_drain();
> > > +#endif
> > >  	tlb_gather_mmu(&tlb, mm);
> > >  	update_hiwater_rss(mm);
> > >  	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
> > > @@ -3300,7 +3302,9 @@ void exit_mmap(struct mm_struct *mm)
> > >  		return;
> > >  	}
> > >
> > > +#ifdef CONFIG_MMU_GATHER_NO_GATHER
> >
> > same question as above.
> >
> > >  	lru_add_drain();
> > > +#endif
> > >  	flush_cache_mm(mm);
> > >  	tlb_gather_mmu_fullmm(&tlb, mm);
> > >  	/* update_hiwater_rss(mm) here? but nobody should be looking */
> >
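
Folding Tim's suggested comment into the patch, the unmap_region() hunk could
read something like this (an illustrative sketch only; the exit_mmap() hunk
would gain the same comment):

+	/*
+	 * Defer lru_add_drain() to tlb_finish_mmu() when mmu_gather is
+	 * active, i.e. ifndef CONFIG_MMU_GATHER_NO_GATHER.
+	 */
+#ifdef CONFIG_MMU_GATHER_NO_GATHER
 	lru_add_drain();
+#endif
 	tlb_gather_mmu(&tlb, mm);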
On 12/14/23 9:57 AM, Tim Chen wrote:
> On Wed, 2023-12-13 at 17:03 -0800, Jianfeng Wang wrote:
>> On 12/13/23 2:57 PM, Tim Chen wrote:
>>> On Tue, 2023-12-12 at 23:28 -0800, Jianfeng Wang wrote:
>>>> When unmapping VMA pages, pages will be gathered in batch and released by
>>>> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
>>>> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
>>>> which calls lru_add_drain() to drain cached pages in folio_batch before
>>>> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
>>>> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>>>
>>>> Remove lru_add_drain() prior to gathering and unmapping pages in
>>>> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>>>
>>>> Note that the page unmapping process in oom_killer (e.g., in
>>>> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
>>>> redundant lru_add_drain(). So, this commit makes the code more consistent.
>>>>
>>>> Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
>>>> ---
>>>>  mm/mmap.c | 4 ++++
>>>>  1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>>> index 1971bfffcc03..0451285dee4f 100644
>>>> --- a/mm/mmap.c
>>>> +++ b/mm/mmap.c
>>>> @@ -2330,7 +2330,9 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
>>>>  	struct mmu_gather tlb;
>>>>  	unsigned long mt_start = mas->index;
>>>>
>>>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>>
>>> In your comment you say skip lru_add_drain() when CONFIG_MMU_GATHER_NO_GATHER
>>> is *not* set. So shouldn't this be
>>>
>>> #ifndef CONFIG_MMU_GATHER_NO_GATHER ?
>>>
>> Hi Tim,
>>
>> The mmu_gather feature is used to gather pages produced by unmap_vmas() and
>> release them in batch in tlb_finish_mmu(). The feature is *on* if
>> CONFIG_MMU_GATHER_NO_GATHER is *not* set. Note that: tlb_finish_mmu() will call
>> free_pages_and_swap_cache()/lru_add_drain() only when the feature is on.
>
> Thanks for the explanation.
>
> Looking at the code, lru_add_drain() is executed for #ifndef CONFIG_MMU_GATHER_NO_GATHER
> in tlb_finish_mmu(). So the logic of your patch is fine.
>
> The #ifndef CONFIG_MMU_GATHER_NO_GATHER means
> mmu_gather feature is on. The double negative throws me off in my first read
> of your commit log.
>
> Suggest that you add a comment in code to make it easier for
> future code maintenance:
>
> /* defer lru_add_drain() to tlb_finish_mmu() for ifndef CONFIG_MMU_GATHER_NO_GATHER */
>
> Is your change of skipping the extra lru_add_drain() motivated by some performance reason
> in a workload? Wonder whether it is worth adding an extra ifdef in the code.
>
> Tim
>
Okay, great suggestion.

We observe heavy contention on the LRU lock, introduced by lru_add_drain() and
release_pages(), for a prod workload, and we're trying to reduce the level of
contention. lru_add_drain() is a complex function that first takes a local CPU
lock and iterates through *all* folio_batches to see if there are pages to be
moved to and between LRU lists. At that point, any page in these folio_batches
will trigger acquiring the per-LRU spin lock and increase the level of lock
contention. Applying the change can avoid calling lru_add_drain()
unnecessarily, which is a source of lock contention.

Together with the comment line suggested by you, I believe this also increases
code readability to clarify the mmu_gather feature.

- Jianfeng

>>
>> Yes, this commit aims to skip lru_add_drain() when CONFIG_MMU_GATHER_NO_GATHER
>> is *not* set (i.e. when the mmu_gather feature is on) because it is redundant.
>>
>> If CONFIG_MMU_GATHER_NO_GATHER is set, pages will be released in unmap_vmas().
>> tlb_finish_mmu() will not call lru_add_drain(). So, it is still necessary to
>> keep the lru_add_drain() call to clear cached pages before unmap_vmas(), as
>> folio_batches hold a reference count for pages in them.
>>
>> The same applies to the other case.
>>
>> Thanks,
>> - Jianfeng
>>
>>>>  	lru_add_drain();
>>>> +#endif
>>>>  	tlb_gather_mmu(&tlb, mm);
>>>>  	update_hiwater_rss(mm);
>>>>  	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
>>>> @@ -3300,7 +3302,9 @@ void exit_mmap(struct mm_struct *mm)
>>>>  		return;
>>>>  	}
>>>>
>>>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>>
>>> same question as above.
>>>
>>>>  	lru_add_drain();
>>>> +#endif
>>>>  	flush_cache_mm(mm);
>>>>  	tlb_gather_mmu_fullmm(&tlb, mm);
>>>>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
>>>
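
For context on the contention described above, lru_add_drain() is roughly the
following (a paraphrase of mm/swap.c in recent kernels; helper and field names
may differ by version):

	void lru_add_drain(void)
	{
		/* per-CPU lock serializing this CPU's folio batches */
		local_lock(&cpu_fbatches.lock);
		/*
		 * Walks every per-CPU folio_batch; each non-empty batch is
		 * flushed to the LRU lists under the per-LRU spinlock,
		 * which is the contended lock in the workload above.
		 */
		lru_add_drain_cpu(smp_processor_id());
		local_unlock(&cpu_fbatches.lock);
		mlock_drain_local();
	}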