From nobody Wed Apr 15 00:02:44 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B05BC04A68 for ; Thu, 28 Jul 2022 20:45:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232532AbiG1Upc (ORCPT ); Thu, 28 Jul 2022 16:45:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45458 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232467AbiG1UpW (ORCPT ); Thu, 28 Jul 2022 16:45:22 -0400 Received: from mail-pf1-x435.google.com (mail-pf1-x435.google.com [IPv6:2607:f8b0:4864:20::435]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E5AC84E60F for ; Thu, 28 Jul 2022 13:45:20 -0700 (PDT) Received: by mail-pf1-x435.google.com with SMTP id w205so2897492pfc.8 for ; Thu, 28 Jul 2022 13:45:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; bh=ycW0mhERWN+p0KVMgOpcoUTig1KmOvE2fu+KZQruV5Q=; b=enzG2ODUUsfYcRd6l9GSOTO9h/lNV8Ef4uIhHU6yji7Llr1yXRk3yu8Jvp3eGFtyvk GuacXnsG0xrlW8P8MKMl3E3zGDpphKxKDGWUicOyhyzhjvNKOJ5O/+6eGpDxn0PicZ1V TRa/lGa/fMEZsrkrEAVncXaTZo1oYW9zat+vY1xOm2f+7x+0Oxd/wuCtuHT+++nhi5fN P3aqzJfa/8dpzuEeqaT1QfwnZSq1teXtWEgL6iLWtf1zsU83oj0jmVJPCWHBGvdXSPiz oUzYdURjUrByQA5sleCoOy4v8itR0JknzX0G/p8AaYur1CAJ5lz1Jzn1s2MjJIRXXode FoXA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:reply-to:mime-version:content-transfer-encoding; bh=ycW0mhERWN+p0KVMgOpcoUTig1KmOvE2fu+KZQruV5Q=; b=zMcDmiinDih1XZ9ZvyC4QutovYG1FMgX7sj2pZqFXd5H18R3lAcvhYsu8zd0YlSRsJ ng0NWprxjoveeJD1eT8gd7cOY6YNm3wZjY//+Nsp6+WKRG+JJUUX2bUs4HK1o8Uhh5aR 
ThuqYWmxdnQT/a9qvmdNTnBju8aiVQjFPSlJ7jbvaB/z367ZFdpqpoqrqZSCQsfP4oOA sdpgCpQbiKoUPknVgjH9etAhD9706YvckN6JVru+IndV6pTKX81DKNRxz+k01TGsXGQM CgZmxq/fNrMd+tz4LGT9rJJhNirE2IQC7C2d7g0PQ+Bn5oVHaffQs7y1TTyct3+iVJR9 XyAA== X-Gm-Message-State: AJIora9SINgIXQA1bzNUd5pHXFlZXtP3A8jQfqUw4NndUTCN9mLMTVXV Yb1bdyADIQ8+LEcglfnuzAU= X-Google-Smtp-Source: AGRyM1tYmO8wnuamcnLViWW3Uo2TdiQiGqhQMiU8SvJkvB0urbOMvzbIUQjzsRT1aMImvlLKOkCoyQ== X-Received: by 2002:a63:b50b:0:b0:412:b42c:6940 with SMTP id y11-20020a63b50b000000b00412b42c6940mr406365pge.460.1659041120407; Thu, 28 Jul 2022 13:45:20 -0700 (PDT) Received: from KASONG-MB0.tencent.com ([114.254.3.190]) by smtp.gmail.com with ESMTPSA id 21-20020a170902c11500b0016c40f8cb58sm1787304pli.81.2022.07.28.13.45.18 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 28 Jul 2022 13:45:19 -0700 (PDT) From: Kairui Song To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, Andrew Morton , Kairui Song Subject: [RFC PATCH 1/7] mm: remove the per-task RSS counter cache Date: Fri, 29 Jul 2022 04:45:05 +0800 Message-Id: <20220728204511.56348-2-ryncsn@gmail.com> X-Mailer: git-send-email 2.35.2 In-Reply-To: <20220728204511.56348-1-ryncsn@gmail.com> References: <20220728204511.56348-1-ryncsn@gmail.com> Reply-To: Kairui Song MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Kairui Song The RSS counter cache was introduced in commit 34e55232e59f ("mm: avoid false sharing of mm_counter") to ease contention on the RSS counters of an mm_struct. There are several problems with it, and the 64-event threshold might not be an optimal value. It makes the RSS value inaccurate: in the worst case, the RSS value is not accounted until 64 pages are allocated. With common tools like `top`, the RSS value reported by the kernel can be off by hundreds of MBs. 
And since 4 counters share the same event threshold, in the worst case, each counter will do a global sync every 16 events, which still raises some contention. Remove this cache for now, and prepare for a different approach. Some helper macros are kept since they will come in handy later. Signed-off-by: Kairui Song --- Documentation/filesystems/proc.rst | 7 ----- fs/exec.c | 2 -- include/linux/mm.h | 20 +----------- include/linux/mm_types_task.h | 9 ------ include/linux/sched.h | 3 -- kernel/exit.c | 5 --- kernel/fork.c | 4 --- kernel/kthread.c | 1 - mm/madvise.c | 7 ++--- mm/memory.c | 49 ------------------------------ 10 files changed, 3 insertions(+), 104 deletions(-) diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems= /proc.rst index 1bc91fb8c321..04a0a18da262 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -224,13 +224,6 @@ memory usage. Its seven fields are explained in Table = 1-3. The stat file contains detailed information about the process itself. Its fields are explained in Table 1-4. =20 -(for SMP CONFIG users) - -For making accounting scalable, RSS related information are handled in an -asynchronous manner and the value may not be very precise. To see a precise -snapshot of a moment, you can see /proc//smaps file and scan page tab= le. -It's slow but very precise. - .. 
table:: Table 1-2: Contents of the status files (as of 4.19) =20 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D diff --git a/fs/exec.c b/fs/exec.c index 778123259e42..3c787ca8c68e 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -988,8 +988,6 @@ static int exec_mmap(struct mm_struct *mm) tsk =3D current; old_mm =3D current->mm; exec_mm_release(tsk, old_mm); - if (old_mm) - sync_mm_rss(old_mm); =20 ret =3D down_write_killable(&tsk->signal->exec_update_lock); if (ret) diff --git a/include/linux/mm.h b/include/linux/mm.h index cf3d0d673f6b..6346f7e77dc7 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1998,17 +1998,7 @@ static inline bool get_user_page_fast_only(unsigned = long addr, */ static inline unsigned long get_mm_counter(struct mm_struct *mm, int membe= r) { - long val =3D atomic_long_read(&mm->rss_stat.count[member]); - -#ifdef SPLIT_RSS_COUNTING - /* - * counter is updated in asynchronous manner and may go to minus. - * But it's never be expected number for users. 
- */ - if (val < 0) - val =3D 0; -#endif - return (unsigned long)val; + return atomic_long_read(&mm->rss_stat.count[member]); } =20 void mm_trace_rss_stat(struct mm_struct *mm, int member, long count); @@ -2094,14 +2084,6 @@ static inline void setmax_mm_hiwater_rss(unsigned lo= ng *maxrss, *maxrss =3D hiwater_rss; } =20 -#if defined(SPLIT_RSS_COUNTING) -void sync_mm_rss(struct mm_struct *mm); -#else -static inline void sync_mm_rss(struct mm_struct *mm) -{ -} -#endif - #ifndef CONFIG_ARCH_HAS_PTE_SPECIAL static inline int pte_special(pte_t pte) { diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h index c1bc6731125c..a00327c663db 100644 --- a/include/linux/mm_types_task.h +++ b/include/linux/mm_types_task.h @@ -48,15 +48,6 @@ enum { NR_MM_COUNTERS }; =20 -#if USE_SPLIT_PTE_PTLOCKS && defined(CONFIG_MMU) -#define SPLIT_RSS_COUNTING -/* per-thread cached information, */ -struct task_rss_stat { - int events; /* for synchronization threshold */ - int count[NR_MM_COUNTERS]; -}; -#endif /* USE_SPLIT_PTE_PTLOCKS */ - struct mm_rss_stat { atomic_long_t count[NR_MM_COUNTERS]; }; diff --git a/include/linux/sched.h b/include/linux/sched.h index c46f3a63b758..11d3e1a95302 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -861,9 +861,6 @@ struct task_struct { /* Per-thread vma caching: */ struct vmacache vmacache; =20 -#ifdef SPLIT_RSS_COUNTING - struct task_rss_stat rss_stat; -#endif int exit_state; int exit_code; int exit_signal; diff --git a/kernel/exit.c b/kernel/exit.c index 64c938ce36fe..8c55cda5136f 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -482,7 +482,6 @@ static void exit_mm(void) exit_mm_release(current, mm); if (!mm) return; - sync_mm_rss(mm); mmap_read_lock(mm); mmgrab(mm); BUG_ON(mm !=3D current->active_mm); @@ -749,10 +748,6 @@ void __noreturn do_exit(long code) =20 io_uring_files_cancel(); exit_signals(tsk); /* sets PF_EXITING */ - - /* sync mm's RSS info before statistics gathering */ - if (tsk->mm) - 
sync_mm_rss(tsk->mm); acct_update_integrals(tsk); group_dead =3D atomic_dec_and_test(&tsk->signal->live); if (group_dead) { diff --git a/kernel/fork.c b/kernel/fork.c index 9d44f2d46c69..c090ebd55063 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2145,10 +2145,6 @@ static __latent_entropy struct task_struct *copy_pro= cess( p->io_uring =3D NULL; #endif =20 -#if defined(SPLIT_RSS_COUNTING) - memset(&p->rss_stat, 0, sizeof(p->rss_stat)); -#endif - p->default_timer_slack_ns =3D current->timer_slack_ns; =20 #ifdef CONFIG_PSI diff --git a/kernel/kthread.c b/kernel/kthread.c index 3c677918d8f2..6bfbab4e2103 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -1463,7 +1463,6 @@ void kthread_unuse_mm(struct mm_struct *mm) * clearing tsk->mm. */ smp_mb__after_spinlock(); - sync_mm_rss(mm); local_irq_disable(); tsk->mm =3D NULL; membarrier_update_current_mm(NULL); diff --git a/mm/madvise.c b/mm/madvise.c index 0316bbc6441b..48cb9e5f92d2 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -711,12 +711,9 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned= long addr, mark_page_lazyfree(page); } out: - if (nr_swap) { - if (current->mm =3D=3D mm) - sync_mm_rss(mm); - + if (nr_swap) add_mm_counter(mm, MM_SWAPENTS, nr_swap); - } + arch_leave_lazy_mmu_mode(); pte_unmap_unlock(orig_pte, ptl); cond_resched(); diff --git a/mm/memory.c b/mm/memory.c index 4cf7d4b6c950..6bf7826e666b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -176,53 +176,9 @@ void mm_trace_rss_stat(struct mm_struct *mm, int membe= r, long count) trace_rss_stat(mm, member, count); } =20 -#if defined(SPLIT_RSS_COUNTING) - -void sync_mm_rss(struct mm_struct *mm) -{ - int i; - - for (i =3D 0; i < NR_MM_COUNTERS; i++) { - if (current->rss_stat.count[i]) { - add_mm_counter(mm, i, current->rss_stat.count[i]); - current->rss_stat.count[i] =3D 0; - } - } - current->rss_stat.events =3D 0; -} - -static void add_mm_counter_fast(struct mm_struct *mm, int member, int val) -{ - struct task_struct *task =3D current; - - if 
(likely(task->mm =3D=3D mm)) - task->rss_stat.count[member] +=3D val; - else - add_mm_counter(mm, member, val); -} -#define inc_mm_counter_fast(mm, member) add_mm_counter_fast(mm, member, 1) -#define dec_mm_counter_fast(mm, member) add_mm_counter_fast(mm, member, -1) - -/* sync counter once per 64 page faults */ -#define TASK_RSS_EVENTS_THRESH (64) -static void check_sync_rss_stat(struct task_struct *task) -{ - if (unlikely(task !=3D current)) - return; - if (unlikely(task->rss_stat.events++ > TASK_RSS_EVENTS_THRESH)) - sync_mm_rss(task->mm); -} -#else /* SPLIT_RSS_COUNTING */ - #define inc_mm_counter_fast(mm, member) inc_mm_counter(mm, member) #define dec_mm_counter_fast(mm, member) dec_mm_counter(mm, member) =20 -static void check_sync_rss_stat(struct task_struct *task) -{ -} - -#endif /* SPLIT_RSS_COUNTING */ - /* * Note: this doesn't free the actual pages themselves. That * has been handled earlier when unmapping all the memory regions. @@ -502,8 +458,6 @@ static inline void add_mm_rss_vec(struct mm_struct *mm,= int *rss) { int i; =20 - if (current->mm =3D=3D mm) - sync_mm_rss(mm); for (i =3D 0; i < NR_MM_COUNTERS; i++) if (rss[i]) add_mm_counter(mm, i, rss[i]); @@ -5120,9 +5074,6 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma= , unsigned long address, count_vm_event(PGFAULT); count_memcg_event_mm(vma->vm_mm, PGFAULT); =20 - /* do counter updates before entering really critical section. 
*/ - check_sync_rss_stat(current); - if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE, flags & FAULT_FLAG_INSTRUCTION, flags & FAULT_FLAG_REMOTE)) --=20 2.35.2 From nobody Wed Apr 15 00:02:44 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 75C98C19F2B for ; Thu, 28 Jul 2022 20:45:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232817AbiG1Upg (ORCPT ); Thu, 28 Jul 2022 16:45:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45744 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232506AbiG1Up0 (ORCPT ); Thu, 28 Jul 2022 16:45:26 -0400 Received: from mail-pj1-x102a.google.com (mail-pj1-x102a.google.com [IPv6:2607:f8b0:4864:20::102a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 757B552E7D for ; Thu, 28 Jul 2022 13:45:23 -0700 (PDT) Received: by mail-pj1-x102a.google.com with SMTP id ha11so3058024pjb.2 for ; Thu, 28 Jul 2022 13:45:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; bh=fEJL5O/8Mtexvm+J+r1cHRwPR897J/MIFhWHBeDpdlE=; b=LJO5mQsP4juN5xvNbTYM56FRLCmFDkgErNYmMEF/d4UxE93cf83tbCdmy5z9y3tTVw zvtIXcmCccQhzroNCgyJKAXd9xJJQsAEqwOqLtqm9LlRTlYodX54TFGQXZVrPzhKisjf ChA407NCENtAqKPBqnbCxq/18lGmRNyh0MQK7F5s22aUwo2e0C6+KqbziljzayQo10gM TF/fBk6+QiGIeXVyuO+eA5L3yFx/xmDbeudT6Ro6V7LI7awXDV/TljUZmi1Q/LkVlJUZ h/FvZe9wQP0L7CXI7zUqjl+XOisurJZgww1fpxRVLGX4AZDvUy5D45vYb6oeAUKy0Orr varA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:reply-to:mime-version:content-transfer-encoding; 
bh=fEJL5O/8Mtexvm+J+r1cHRwPR897J/MIFhWHBeDpdlE=; b=4Zt6OzofHlT7hfwYxFxiGgSFsv+uUcmIpBctF2ITPxqtCR71ab+senNPZUioL72sw4 T7XXsu7uNrWe1WgHoaMyao51zbqP/8mQdgw1Rwsp/QoISTB4NALATXFPTRaTyjxMQUVX 9VtMM9P07EgYAxAo/iC5Cb6Zz8dAUseye3Q2z6ySyAn2+votP8nA8q1o3PT+y16UMcCy sfGT92RAARkr8BXf62y9kQRMfyPThJlRB7G4HugoqUXSpLU/ft1LSPpNlXCG2VZmnzKT TUhHS8T6l6PkDOcxfyyUzMJhYep7n6eLYm/W0MmZL4iWb8Md5KSJj5OTwUsGmehav4R3 9Z+w== X-Gm-Message-State: ACgBeo3cJeJBqPyz4dUT1ARTrXExgOpwPfVukFzBngjxcVYEtix2pRSI /T3Gm0KZquPk4rrPNlGIlac= X-Google-Smtp-Source: AA6agR7gaJDHwuNty0e2Q772Que215+AfrAqz2a4Nol+uHZaE2IVzWbEJW8m+7P6ryUuMNqvlr67dA== X-Received: by 2002:a17:90b:4c03:b0:1f2:b977:c64e with SMTP id na3-20020a17090b4c0300b001f2b977c64emr473596pjb.211.1659041122887; Thu, 28 Jul 2022 13:45:22 -0700 (PDT) Received: from KASONG-MB0.tencent.com ([114.254.3.190]) by smtp.gmail.com with ESMTPSA id 21-20020a170902c11500b0016c40f8cb58sm1787304pli.81.2022.07.28.13.45.20 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 28 Jul 2022 13:45:22 -0700 (PDT) From: Kairui Song To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, Andrew Morton , Kairui Song Subject: [RFC PATCH 2/7] mm: move check_mm to memory.c Date: Fri, 29 Jul 2022 04:45:06 +0800 Message-Id: <20220728204511.56348-3-ryncsn@gmail.com> X-Mailer: git-send-email 2.35.2 In-Reply-To: <20220728204511.56348-1-ryncsn@gmail.com> References: <20220728204511.56348-1-ryncsn@gmail.com> Reply-To: Kairui Song MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Kairui Song No functional change. This makes it possible to do extra mm operations on mm exit, in preparation for the following commits. 
Signed-off-by: Kairui Song --- include/linux/mm.h | 3 +++ kernel/fork.c | 33 --------------------------------- mm/memory.c | 32 ++++++++++++++++++++++++++++++++ 3 files changed, 35 insertions(+), 33 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 6346f7e77dc7..81ad91621078 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1993,6 +1993,9 @@ static inline bool get_user_page_fast_only(unsigned l= ong addr, { return get_user_pages_fast_only(addr, 1, gup_flags, pagep) =3D=3D 1; } + +void check_mm(struct mm_struct *mm); + /* * per-process(per-mm_struct) statistics. */ diff --git a/kernel/fork.c b/kernel/fork.c index c090ebd55063..86a239772208 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -128,15 +128,6 @@ int nr_threads; /* The idle threads do not count.. */ =20 static int max_threads; /* tunable limit on nr_threads */ =20 -#define NAMED_ARRAY_INDEX(x) [x] =3D __stringify(x) - -static const char * const resident_page_types[] =3D { - NAMED_ARRAY_INDEX(MM_FILEPAGES), - NAMED_ARRAY_INDEX(MM_ANONPAGES), - NAMED_ARRAY_INDEX(MM_SWAPENTS), - NAMED_ARRAY_INDEX(MM_SHMEMPAGES), -}; - DEFINE_PER_CPU(unsigned long, process_counts) =3D 0; =20 __cacheline_aligned DEFINE_RWLOCK(tasklist_lock); /* outer */ @@ -748,30 +739,6 @@ static int dup_mmap(struct mm_struct *mm, struct mm_st= ruct *oldmm) #define mm_free_pgd(mm) #endif /* CONFIG_MMU */ =20 -static void check_mm(struct mm_struct *mm) -{ - int i; - - BUILD_BUG_ON_MSG(ARRAY_SIZE(resident_page_types) !=3D NR_MM_COUNTERS, - "Please make sure 'struct resident_page_types[]' is updated as well"); - - for (i =3D 0; i < NR_MM_COUNTERS; i++) { - long x =3D atomic_long_read(&mm->rss_stat.count[i]); - - if (unlikely(x)) - pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld\n", - mm, resident_page_types[i], x); - } - - if (mm_pgtables_bytes(mm)) - pr_alert("BUG: non-zero pgtables_bytes on freeing mm: %ld\n", - mm_pgtables_bytes(mm)); - -#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && 
!USE_SPLIT_PMD_PTLOCKS - VM_BUG_ON_MM(mm->pmd_huge_pte, mm); -#endif -} - #define allocate_mm() (kmem_cache_alloc(mm_cachep, GFP_KERNEL)) #define free_mm(mm) (kmem_cache_free(mm_cachep, (mm))) =20 diff --git a/mm/memory.c b/mm/memory.c index 6bf7826e666b..c0597214f9b3 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -179,6 +179,38 @@ void mm_trace_rss_stat(struct mm_struct *mm, int membe= r, long count) #define inc_mm_counter_fast(mm, member) inc_mm_counter(mm, member) #define dec_mm_counter_fast(mm, member) dec_mm_counter(mm, member) =20 +#define NAMED_ARRAY_INDEX(x) [x] =3D __stringify(x) +static const char * const resident_page_types[] =3D { + NAMED_ARRAY_INDEX(MM_FILEPAGES), + NAMED_ARRAY_INDEX(MM_ANONPAGES), + NAMED_ARRAY_INDEX(MM_SWAPENTS), + NAMED_ARRAY_INDEX(MM_SHMEMPAGES), +}; + +void check_mm(struct mm_struct *mm) +{ + int i; + + BUILD_BUG_ON_MSG(ARRAY_SIZE(resident_page_types) !=3D NR_MM_COUNTERS, + "Please make sure 'struct resident_page_types[]' is updated as well"); + + for (i =3D 0; i < NR_MM_COUNTERS; i++) { + long x =3D atomic_long_read(&mm->rss_stat.count[i]); + + if (unlikely(x)) + pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld\n", + mm, resident_page_types[i], x); + } + + if (mm_pgtables_bytes(mm)) + pr_alert("BUG: non-zero pgtables_bytes on freeing mm: %ld\n", + mm_pgtables_bytes(mm)); + +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS + VM_BUG_ON_MM(mm->pmd_huge_pte, mm); +#endif +} + /* * Note: this doesn't free the actual pages themselves. That * has been handled earlier when unmapping all the memory regions. 
--=20 2.35.2 From nobody Wed Apr 15 00:02:44 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A7FAEC04A68 for ; Thu, 28 Jul 2022 20:45:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232946AbiG1Upk (ORCPT ); Thu, 28 Jul 2022 16:45:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46132 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232644AbiG1Upb (ORCPT ); Thu, 28 Jul 2022 16:45:31 -0400 Received: from mail-pj1-x102c.google.com (mail-pj1-x102c.google.com [IPv6:2607:f8b0:4864:20::102c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9D3285A3EF for ; Thu, 28 Jul 2022 13:45:25 -0700 (PDT) Received: by mail-pj1-x102c.google.com with SMTP id o5-20020a17090a3d4500b001ef76490983so3349851pjf.2 for ; Thu, 28 Jul 2022 13:45:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; bh=3VreCZyHHh+WzAldTMIrv+f0+ZRGj5qK/tBAl8Rh9YU=; b=FwRN+Dz/EkdantL+2MuehBxRu34vE2XcB9JJwAKV1+CF9A8W4YNymfzAbHnrbk+L2U fR4wNZ+kuA+iAYL+rFBDO2riXI3h0eYhrjVGXQyktY2XHPh3TfoRUdsV8h5ekA0eKkdv z7kpAxEMJG2bbBCVPlhHWuM7BvjkVok8aSEVxBSDukYAzf+WgLda4xfSw/OT3A291gdg ou48O7q9cfHkv/RQWombHOHxI6eODfqLFNqZtaJQOzuI2qjxcEstXuCibnuZX36hWZMm e5EfiwYzC/IamfAWRITnYup9JSvjEVdahnDfoxZD7ktMOsUQNCdQme/sfeOx+rupiVkJ +jPQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:reply-to:mime-version:content-transfer-encoding; bh=3VreCZyHHh+WzAldTMIrv+f0+ZRGj5qK/tBAl8Rh9YU=; b=dq7OP2YboSivZbH8/Xm25yQaNqrXwOtFHzI+IAmwEWuEHe4uOV2V0wkPays+UWN+yI 
SUz4fVNqVm3OIWlduoa29anG7kviwQmjDeIc1JYDEIZ4KlrxJsocpqx1dd9oJ4CY5lpX A4kGezypBSotMultXXp7iP1scKLO+NjvAza4JIrW1tj4dFkOsTpa59Be48wSkkGXgKtt nWlBRANscBuKQbKkxEWoso4Q5dxB2vX6J/giW7XcOwO70GWGaVNutIaZNcJEXI2w22ms BWAgwOFxouNxp2B6Dx4fRnPoGN31a8KVgah1kQwMWIBRc+zsvOQt70FJRWaKxCztMJf8 pxvw== X-Gm-Message-State: ACgBeo0/8eod8C672L9C1qkHGzGtBqq9HtjLx/Jx/lUI9igtkxrjSaf7 UuabUYtqyomceO88X8iI54jLKsgBhQk3Nhk3uFQ= X-Google-Smtp-Source: AA6agR40gp1BRqPQFl5cX7Tmvlv/+x70OfYLiw3CrWsQmCd4n0R6k3976B5hXnfHCoAHyxuIlaWKiw== X-Received: by 2002:a17:90b:b03:b0:1f3:6fb:bd20 with SMTP id bf3-20020a17090b0b0300b001f306fbbd20mr1215431pjb.38.1659041125076; Thu, 28 Jul 2022 13:45:25 -0700 (PDT) Received: from KASONG-MB0.tencent.com ([114.254.3.190]) by smtp.gmail.com with ESMTPSA id 21-20020a170902c11500b0016c40f8cb58sm1787304pli.81.2022.07.28.13.45.23 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 28 Jul 2022 13:45:24 -0700 (PDT) From: Kairui Song To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, Andrew Morton , Kairui Song Subject: [RFC PATCH 3/7] mm/headers: change emun order of MM_COUNTERS Date: Fri, 29 Jul 2022 04:45:07 +0800 Message-Id: <20220728204511.56348-4-ryncsn@gmail.com> X-Mailer: git-send-email 2.35.2 In-Reply-To: <20220728204511.56348-1-ryncsn@gmail.com> References: <20220728204511.56348-1-ryncsn@gmail.com> Reply-To: Kairui Song MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Kairui Song get_mm_rss reads MM_FILEPAGES, MM_ANONPAGES and MM_SHMEMPAGES. Make them contiguous in the enum so they are easier to read in a loop in the following commits. 
Signed-off-by: Kairui Song --- include/linux/mm_types_task.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h index a00327c663db..14182ded3fda 100644 --- a/include/linux/mm_types_task.h +++ b/include/linux/mm_types_task.h @@ -43,8 +43,8 @@ struct vmacache { enum { MM_FILEPAGES, /* Resident file mapping pages */ MM_ANONPAGES, /* Resident anonymous pages */ - MM_SWAPENTS, /* Anonymous swap entries */ MM_SHMEMPAGES, /* Resident shared memory pages */ + MM_SWAPENTS, /* Anonymous swap entries */ NR_MM_COUNTERS }; =20 --=20 2.35.2 From nobody Wed Apr 15 00:02:44 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 127CBC04A68 for ; Thu, 28 Jul 2022 20:45:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232659AbiG1Upn (ORCPT ); Thu, 28 Jul 2022 16:45:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46096 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232686AbiG1Upc (ORCPT ); Thu, 28 Jul 2022 16:45:32 -0400 Received: from mail-pf1-x42b.google.com (mail-pf1-x42b.google.com [IPv6:2607:f8b0:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 043C1655A5 for ; Thu, 28 Jul 2022 13:45:28 -0700 (PDT) Received: by mail-pf1-x42b.google.com with SMTP id c3so2881204pfb.13 for ; Thu, 28 Jul 2022 13:45:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; bh=SGkgqy0QLVnzdid2tkzB0s+fC8K9ka6NmIFVVWD2xYk=; b=D9EKylMTSsghnjGXBfCgCbvVNT/Rb65B+zsOKJ1byGzpWm2z65aE1EXzzx/Scfhj1j y2tQeGMLlyvbAhkbfJKOXECF9lyzXm3NXCnFUVCnh3x/7JHguG9iSpiwcz3RI/SShalX 
lyU1nhy4uxZWrFwXZsD9kwF3TJ53txtAi8BkMTqVc9ZDh5/ahzqsPBKF6eywhir6Gn8J Z9z/xMTXBCiJscT764TuHzpDskjTwlhGIQqaP+/hDN6SLFhnLb7tvulx1GNuJbjfn8Ow tjZmi2oPiHV7UnpjBSfZB9reP9y//41DnRFPCWIG2YA466fl411OrgzxKV+/SCNaHwHn 55dw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:reply-to:mime-version:content-transfer-encoding; bh=SGkgqy0QLVnzdid2tkzB0s+fC8K9ka6NmIFVVWD2xYk=; b=jsd+JBOVXmlwf0z6ROgMZXNATsTGKbMcDnvB3HhfPm5i8TVZ06+6JCOqJvVv7hTJA7 zYAyZncMAAWk4IgHIBrDsCw5pwK04QRb4mTbT3zV2hE8CaLmsCzPh23/TvqeouDmZ75V z7pkleYfv3KYwL2d2XitKuWU5ISdk5jfHSa0d7ocrIxI4O1lpWc8MzLhfNo1BlCNBbfp 7QoCP4C/0W278SjZIdX3+KlsNgBnt93vjtyvDSDoD76f3cAtB3lYFqli+eH82U9lb2+1 jZlJrc4xDuUkCzbQjFd0JJeEpHZ3H7tW3d13Yj84CV+KMc8aBmBjnJAEOiNgesQAnYe7 J+bw== X-Gm-Message-State: AJIora8i1e8ITt+HvlJx7yTOK8l4C5gC6YkT+tH32/2JksbPAnuQ58+A rtFGJ3RMkjtWKd61QR1prHmlUU5EKga/XG9oxrg= X-Google-Smtp-Source: AGRyM1skRuRihGKUIV+KmimC7TjXwxVcHbd9Pp0zhfBbeZxDqTaXXoGkFBmLZFsjmEUj/eJ601FKUg== X-Received: by 2002:a63:5f56:0:b0:412:9907:ec0d with SMTP id t83-20020a635f56000000b004129907ec0dmr443467pgb.18.1659041127458; Thu, 28 Jul 2022 13:45:27 -0700 (PDT) Received: from KASONG-MB0.tencent.com ([114.254.3.190]) by smtp.gmail.com with ESMTPSA id 21-20020a170902c11500b0016c40f8cb58sm1787304pli.81.2022.07.28.13.45.25 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 28 Jul 2022 13:45:26 -0700 (PDT) From: Kairui Song To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, Andrew Morton , Kairui Song Subject: [RFC PATCH 4/7] mm: introduce a generic per-CPU RSS cache Date: Fri, 29 Jul 2022 04:45:08 +0800 Message-Id: <20220728204511.56348-5-ryncsn@gmail.com> X-Mailer: git-send-email 2.35.2 In-Reply-To: <20220728204511.56348-1-ryncsn@gmail.com> References: <20220728204511.56348-1-ryncsn@gmail.com> Reply-To: Kairui Song MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Kairui Song The RSS cache used to be a per-task cache, batched into 64 events for each atomic sync. The problem is that 64 events is too small to reduce contention, and too large for accurate RSS accounting. This per-CPU RSS cache assumes that one mm_struct tends to stay on the same CPU. If the mm_struct being accounted matches the current active_mm, the RSS accounting is kept CPU local until the mm_struct is switched out, and an atomic update is done only upon switch out. The fast path of CPU local RSS accounting is extremely lightweight: it only disables preemption and then increases a CPU local counter. One major effect is that RSS reading is now much more accurate than before, but also slower. It needs to iterate over all possible CPUs, find those that have cached the RSS of the mm_struct, and collect the uncommitted caches. With a lockless reader design, this never blocks the RSS accounting fast path, which ensures good updater performance. Considering that RSS updating is much more common than reading, this should improve performance overall. This CPU iteration can be avoided by using a CPU mask to mark the CPUs that cached the mm_struct and only reading from these CPUs. It can leverage the existing mm_cpumask used for TLB shootdown, but this has to be done arch by arch in later commits. This commit provides a baseline version that works on all architectures, but with a performance drop for RSS syncing upon read/invalidation. 
Signed-off-by: Kairui Song --- include/linux/mm.h | 15 +-- include/linux/mm_types_task.h | 38 +++++++ kernel/fork.c | 2 +- kernel/sched/core.c | 3 + mm/memory.c | 201 ++++++++++++++++++++++++++++++++-- 5 files changed, 236 insertions(+), 23 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 81ad91621078..47b8552b1b04 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1994,15 +1994,13 @@ static inline bool get_user_page_fast_only(unsigned= long addr, return get_user_pages_fast_only(addr, 1, gup_flags, pagep) =3D=3D 1; } =20 -void check_mm(struct mm_struct *mm); +void check_discard_mm(struct mm_struct *mm); =20 /* * per-process(per-mm_struct) statistics. */ -static inline unsigned long get_mm_counter(struct mm_struct *mm, int membe= r) -{ - return atomic_long_read(&mm->rss_stat.count[member]); -} +unsigned long get_mm_counter(struct mm_struct *mm, int member); +unsigned long get_mm_rss(struct mm_struct *mm); =20 void mm_trace_rss_stat(struct mm_struct *mm, int member, long count); =20 @@ -2042,13 +2040,6 @@ static inline int mm_counter(struct page *page) return mm_counter_file(page); } =20 -static inline unsigned long get_mm_rss(struct mm_struct *mm) -{ - return get_mm_counter(mm, MM_FILEPAGES) + - get_mm_counter(mm, MM_ANONPAGES) + - get_mm_counter(mm, MM_SHMEMPAGES); -} - static inline unsigned long get_mm_hiwater_rss(struct mm_struct *mm) { return max(mm->hiwater_rss, get_mm_rss(mm)); diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h index 14182ded3fda..d5d3fbece174 100644 --- a/include/linux/mm_types_task.h +++ b/include/linux/mm_types_task.h @@ -12,6 +12,7 @@ #include #include #include +#include =20 #include =20 @@ -52,6 +53,43 @@ struct mm_rss_stat { atomic_long_t count[NR_MM_COUNTERS]; }; =20 +struct mm_rss_cache { + /* + * CPU local only variables, hot path for RSS caching. Readonly for other= CPUs. 
+ */ + unsigned long in_use; + long count[NR_MM_COUNTERS]; + + /* Avoid false sharing when other CPUs collect RSS counter */ + struct mm_struct *mm ____cacheline_aligned; + /* Avoid ABA problem and RSS being accounted for wrong mm */ + unsigned long sync_count; +}; + +/* lowest bit of *mm is never used, so use it as a syncing flag */ +#define RSS_CACHE_MM_SYNCING_MASK 1UL + +/* mark the mm as being synced on that cache */ +static __always_inline struct mm_struct *__pcp_rss_mm_mark(struct mm_struc= t *mm) +{ + unsigned long val =3D (unsigned long)mm; + + val |=3D RSS_CACHE_MM_SYNCING_MASK; + + return (struct mm_struct *) val; +} + +static __always_inline struct mm_struct *__pcp_rss_mm_unmark(struct mm_str= uct *mm) +{ + unsigned long val =3D (unsigned long)mm; + + val &=3D ~RSS_CACHE_MM_SYNCING_MASK; + + return (struct mm_struct *) val; +} + +void switch_pcp_rss_cache_no_irq(struct mm_struct *next_mm); + struct page_frag { struct page *page; #if (BITS_PER_LONG > 32) || (PAGE_SIZE >=3D 65536) diff --git a/kernel/fork.c b/kernel/fork.c index 86a239772208..c2f5f6eef6a6 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -755,9 +755,9 @@ void __mmdrop(struct mm_struct *mm) mm_free_pgd(mm); destroy_context(mm); mmu_notifier_subscriptions_destroy(mm); - check_mm(mm); put_user_ns(mm->user_ns); mm_pasid_drop(mm); + check_discard_mm(mm); free_mm(mm); } EXPORT_SYMBOL_GPL(__mmdrop); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index da0bf6fe9ecd..11df67bb52ee 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5142,6 +5142,9 @@ context_switch(struct rq *rq, struct task_struct *pre= v, =20 prepare_lock_switch(rq, next, rf); =20 + /* Cache new active_mm */ + switch_pcp_rss_cache_no_irq(next->active_mm); + /* Here we just switch the register state and the stack. 
*/ switch_to(prev, next, prev); barrier(); diff --git a/mm/memory.c b/mm/memory.c index c0597214f9b3..f00f302143b6 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -176,8 +176,143 @@ void mm_trace_rss_stat(struct mm_struct *mm, int memb= er, long count) trace_rss_stat(mm, member, count); } =20 -#define inc_mm_counter_fast(mm, member) inc_mm_counter(mm, member) -#define dec_mm_counter_fast(mm, member) dec_mm_counter(mm, member) +static DEFINE_PER_CPU_SHARED_ALIGNED(struct mm_rss_cache, cpu_rss_cache); + +/* + * get_mm_counter and get_mm_rss try to read the RSS cache of each + * CPU that cached target mm. If the cache is flushed while being read, + * skip it. May lead to rare and little bit of accuracy loss, but flushed + * cache will surely be accounted in the next read. + */ +unsigned long get_mm_counter(struct mm_struct *mm, int member) +{ + int cpu; + long ret, update, sync_count; + + ret =3D atomic_long_read(&mm->rss_stat.count[member]); + for_each_possible_cpu(cpu) { + if (READ_ONCE(per_cpu(cpu_rss_cache.mm, cpu)) !=3D mm) + continue; + sync_count =3D READ_ONCE(per_cpu(cpu_rss_cache.sync_count, cpu)); + /* see smp_mb in switch_pcp_rss_cache_no_irq */ + smp_rmb(); + + update =3D READ_ONCE(per_cpu(cpu_rss_cache.count[member], cpu)); + + /* same as above */ + smp_rmb(); + if (READ_ONCE(per_cpu(cpu_rss_cache.sync_count, cpu)) =3D=3D sync_count = && + READ_ONCE(per_cpu(cpu_rss_cache.mm, cpu)) =3D=3D mm) + ret +=3D update; + } + + if (ret < 0) + ret =3D 0; + + return ret; +} + +/* see comment for get_mm_counter */ +unsigned long get_mm_rss(struct mm_struct *mm) +{ + int cpu; + long ret, update, sync_count; + + ret =3D atomic_long_read(&mm->rss_stat.count[MM_FILEPAGES]), + + atomic_long_read(&mm->rss_stat.count[MM_ANONPAGES]), + + atomic_long_read(&mm->rss_stat.count[MM_SHMEMPAGES]); + + for_each_possible_cpu(cpu) { + if (READ_ONCE(per_cpu(cpu_rss_cache.mm, cpu)) !=3D mm) + continue; + sync_count =3D READ_ONCE(per_cpu(cpu_rss_cache.sync_count, cpu)); + /* see smp_mb in 
switch_pcp_rss_cache_no_irq */ + smp_rmb(); + + /* Reads MM_FILEPAGES, MM_ANONPAGES, MM_SHMEMPAGES */ + for (int i =3D MM_FILEPAGES; i < MM_SWAPENTS; i++) + update +=3D READ_ONCE(per_cpu(cpu_rss_cache.count[i], cpu)); + + /* same as above */ + smp_rmb(); + if (READ_ONCE(per_cpu(cpu_rss_cache.sync_count, cpu)) =3D=3D sync_count = && + READ_ONCE(per_cpu(cpu_rss_cache.mm, cpu)) =3D=3D mm) + ret +=3D update; + } + + if (ret < 0) + ret =3D 0; + + return ret; +} + +/* flush the rss cache of current CPU with IRQ disabled, and switch to new= mm */ +void switch_pcp_rss_cache_no_irq(struct mm_struct *next_mm) +{ + long count; + struct mm_struct *cpu_mm; + + cpu_mm =3D this_cpu_read(cpu_rss_cache.mm); + if (cpu_mm =3D=3D next_mm) + return; + + /* + * `in_use` counter is hold with preempt disabled, if non-zero, this woul= d be a + * interrupt switching the mm, just ignore it. + */ + if (this_cpu_read(cpu_rss_cache.in_use)) + return; + + if (cpu_mm =3D=3D NULL) + goto commit_done; + + /* Race with check_discard_rss_cache */ + if (cpu_mm !=3D cmpxchg(this_cpu_ptr(&cpu_rss_cache.mm), cpu_mm, + __pcp_rss_mm_mark(cpu_mm))) + goto commit_done; + + for (int i =3D 0; i < NR_MM_COUNTERS; i++) { + count =3D this_cpu_read(cpu_rss_cache.count[i]); + if (count) + add_mm_counter(cpu_mm, i, count); + } + +commit_done: + for (int i =3D 0; i < NR_MM_COUNTERS; i++) + this_cpu_write(cpu_rss_cache.count[i], 0); + + /* + * For remote reading in get_mm_{rss,counter}, + * ensure new mm and sync counter have zero'ed counters + */ + smp_wmb(); + this_cpu_write(cpu_rss_cache.mm, next_mm); + this_cpu_inc(cpu_rss_cache.sync_count); +} + +static void add_mm_counter_fast(struct mm_struct *mm, int member, int val) +{ + /* + * Disable preempt so task is pinned, and the mm is pinned on this CPU + * since caller must be holding a reference. 
+ */ + preempt_disable(); + this_cpu_inc(cpu_rss_cache.in_use); + + if (likely(mm =3D=3D this_cpu_read(cpu_rss_cache.mm))) { + this_cpu_add(cpu_rss_cache.count[member], val); + this_cpu_dec(cpu_rss_cache.in_use); + /* Avoid the resched checking oveahead for fast path */ + preempt_enable_no_resched(); + } else { + this_cpu_dec(cpu_rss_cache.in_use); + preempt_enable_no_resched(); + add_mm_counter(mm, member, val); + } +} + +#define inc_mm_counter_fast(mm, member) add_mm_counter_fast(mm, member, 1) +#define dec_mm_counter_fast(mm, member) add_mm_counter_fast(mm, member, -1) =20 #define NAMED_ARRAY_INDEX(x) [x] =3D __stringify(x) static const char * const resident_page_types[] =3D { @@ -187,20 +322,64 @@ static const char * const resident_page_types[] =3D { NAMED_ARRAY_INDEX(MM_SHMEMPAGES), }; =20 -void check_mm(struct mm_struct *mm) +static void check_discard_rss_cache(struct mm_struct *mm) { - int i; + int cpu; + long cached_count[NR_MM_COUNTERS] =3D { 0 }; + struct mm_struct *cpu_mm; =20 - BUILD_BUG_ON_MSG(ARRAY_SIZE(resident_page_types) !=3D NR_MM_COUNTERS, - "Please make sure 'struct resident_page_types[]' is updated as well"); + /* Invalidate the RSS cache on every CPU */ + for_each_possible_cpu(cpu) { + cpu_mm =3D READ_ONCE(per_cpu(cpu_rss_cache.mm, cpu)); + if (__pcp_rss_mm_unmark(cpu_mm) !=3D mm) + continue; + + /* + * If not being flusehd, try read-in the counter and mark it NULL, + * once cache's mm is set NULL, counter are considered invalided + */ + if (cpu_mm !=3D __pcp_rss_mm_mark(cpu_mm)) { + long count[NR_MM_COUNTERS]; =20 - for (i =3D 0; i < NR_MM_COUNTERS; i++) { - long x =3D atomic_long_read(&mm->rss_stat.count[i]); + for (int i =3D 0; i < NR_MM_COUNTERS; i++) + count[i] =3D READ_ONCE(per_cpu(cpu_rss_cache.count[i], cpu)); =20 - if (unlikely(x)) + /* + * If successfully set to NULL, the owner CPU is not flushing it, count= ers + * are uncommiteed and untouched during this period, since a dying mm w= on't + * be accouted anymore + */ + cpu_mm =3D 
cmpxchg(&per_cpu(cpu_rss_cache.mm, cpu), mm, NULL); + if (cpu_mm =3D=3D mm) { + for (int i =3D 0; i < NR_MM_COUNTERS; i++) + cached_count[i] +=3D count[i]; + continue; + } + } + + /* It's being flushed, just busy wait as the critial section is really s= hort */ + do { + cpu_relax(); + cpu_mm =3D READ_ONCE(per_cpu(cpu_rss_cache.mm, cpu)); + } while (cpu_mm =3D=3D __pcp_rss_mm_mark(mm)); + } + + for (int i =3D 0; i < NR_MM_COUNTERS; i++) { + long val =3D atomic_long_read(&mm->rss_stat.count[i]); + + val +=3D cached_count[i]; + + if (unlikely(val)) { pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld\n", - mm, resident_page_types[i], x); + mm, resident_page_types[i], val); + } } +} + +void check_discard_mm(struct mm_struct *mm) +{ + BUILD_BUG_ON_MSG(ARRAY_SIZE(resident_page_types) !=3D NR_MM_COUNTERS, + "Please make sure 'struct resident_page_types[]' is updated as well"); =20 if (mm_pgtables_bytes(mm)) pr_alert("BUG: non-zero pgtables_bytes on freeing mm: %ld\n", @@ -209,6 +388,8 @@ void check_mm(struct mm_struct *mm) #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS VM_BUG_ON_MM(mm->pmd_huge_pte, mm); #endif + + check_discard_rss_cache(mm); } =20 /* --=20 2.35.2 From nobody Wed Apr 15 00:02:44 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A45E2C04A68 for ; Thu, 28 Jul 2022 20:45:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233147AbiG1Upq (ORCPT ); Thu, 28 Jul 2022 16:45:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46104 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232760AbiG1Upe (ORCPT ); Thu, 28 Jul 2022 16:45:34 -0400 Received: from mail-pg1-x532.google.com (mail-pg1-x532.google.com [IPv6:2607:f8b0:4864:20::532]) by lindbergh.monkeyblade.net 
(Postfix) with ESMTPS id 39BCD6B248 for ; Thu, 28 Jul 2022 13:45:30 -0700 (PDT) Received: by mail-pg1-x532.google.com with SMTP id 72so2469930pge.0 for ; Thu, 28 Jul 2022 13:45:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; bh=PxrkjbDitHfD1d97Y8UhQKa79eOpeGRQUks9O/4b63c=; b=n16KoPlvBzd95MUn2w14Fg7+fI/TRVKK4xnSvA1Ck0fepbOeZl/M1zr2wzFKrwzRoq hgiUJNCFAhbYqljsgcuOFgUMTqg2+GGDVdxORVxkRtRM5ukuktVVziFfmfOiI1dDDILQ n308kWjnEH/SwP73po2piDXPGtObpbKY0zQLIu0bNzlHTs+TIt4545L7oDmfDKOmsb0C XTM7XFMODx5xPDR18Ao7XwDiOQ51/ooNyCvekRdjy0WNutaBFsNfvk+XVar9aHhoib09 FfYr3yRD0X8UgeNpPGScXUD9sMNcjHqLVvRQ4o3MhqABKodGl6F3wZKKcJ8vytC4mWjU LGDA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:reply-to:mime-version:content-transfer-encoding; bh=PxrkjbDitHfD1d97Y8UhQKa79eOpeGRQUks9O/4b63c=; b=rQjgAk1+S7EKmt1/Df3muvdtzuOTBr4b65z350Sn7VIoDwRYfTr43bcDO1AfRb85wY px5i8moZ8BgTdc/c47Q1A6a0lxZq1k7cqZbHVrOkTw1bqIUOsJDsDytyec2OirIfZvc7 kBrYj5YFo2RK8hIlx1uzqY6j8ZrriUz5g//bGXuMq5quwqSOIC2lAmyodUEKHN5hG6aU zvQUDwe+rMVbIoY4LBEuJjXzrpgair+C9hRzhmA5JMR6jODGCuUKbLbML7eYFUO1RBtC B0aU8L4pr2pU1QzXcDlXs3gFGkK4oTmz0ARwOBMTdSUolFUD4HnN1j1cI84hBQshg+iz eokw== X-Gm-Message-State: AJIora+/I3lpTPsdkF2i7NhSStqtlZ/iJ8qsCwAcEaAbBDiRnITx5Es8 45p2YJs2N4iPQ+i7IKIjbZw= X-Google-Smtp-Source: AGRyM1tXOVFNxoBFYbFBHG8eOFa0LJHrKzg+SUCMnYmaLeI2rBEw1bFu/iUgd6WGY5hnPF0K96m9AQ== X-Received: by 2002:a63:4004:0:b0:41b:64ff:7fe2 with SMTP id n4-20020a634004000000b0041b64ff7fe2mr419118pga.172.1659041129567; Thu, 28 Jul 2022 13:45:29 -0700 (PDT) Received: from KASONG-MB0.tencent.com ([114.254.3.190]) by smtp.gmail.com with ESMTPSA id 21-20020a170902c11500b0016c40f8cb58sm1787304pli.81.2022.07.28.13.45.27 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); 
Thu, 28 Jul 2022 13:45:29 -0700 (PDT) From: Kairui Song To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, Andrew Morton , Kairui Song Subject: [RFC PATCH 5/7] mm: try use fast path for pmd setting as well Date: Fri, 29 Jul 2022 04:45:09 +0800 Message-Id: <20220728204511.56348-6-ryncsn@gmail.com> X-Mailer: git-send-email 2.35.2 In-Reply-To: <20220728204511.56348-1-ryncsn@gmail.com> References: <20220728204511.56348-1-ryncsn@gmail.com> Reply-To: Kairui Song MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Kairui Song Use the per-CPU RSS cache helper as much as possible. Signed-off-by: Kairui Song --- mm/memory.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/memory.c b/mm/memory.c index f00f302143b6..09d7d193da51 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4419,7 +4419,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct pa= ge *page) if (write) entry =3D maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); =20 - add_mm_counter(vma->vm_mm, mm_counter_file(page), HPAGE_PMD_NR); + add_mm_counter_fast(vma->vm_mm, mm_counter_file(page), HPAGE_PMD_NR); page_add_file_rmap(page, vma, true); =20 /* --=20 2.35.2 From nobody Wed Apr 15 00:02:44 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DE74CC04A68 for ; Thu, 28 Jul 2022 20:45:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233225AbiG1Upt (ORCPT ); Thu, 28 Jul 2022 16:45:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46766 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232683AbiG1Upj (ORCPT ); Thu, 28 Jul 2022 16:45:39 -0400 Received: from mail-pg1-x531.google.com 
(mail-pg1-x531.google.com [IPv6:2607:f8b0:4864:20::531]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1B8196C110 for ; Thu, 28 Jul 2022 13:45:33 -0700 (PDT) Received: by mail-pg1-x531.google.com with SMTP id e132so2436638pgc.5 for ; Thu, 28 Jul 2022 13:45:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; bh=o9jWwsLlYZnS171yrNbtFKKGeaY8IhQRPNlofd9rtnc=; b=PC9ISe/Tv2ntpg8FscV4ebolxpt+AhXtqalCdJ/HM/qaDIdf1rxD2vBYVhSyfRLxvQ GHNNXjMEXEikSN1uQJEUc4Zhe57ufRlQnqw9TudVYtELg/+5ehUxd4ndJQJ5R6PRGeeT r+ufm10YR1mOAqOW40t42Qr6bFNcdDvwRbzl9OkfnTCh+iEnGxt/DT6muqSIGo96CuOf et+sFqcvlWWFb9bXbnS+xT11zv2W1sj53OfXveVBtRe+8b7qrKDj27Z/dZd3xwMT1a5G exxkBIZstIbrCsALmdrvZOuGiqwEUnFgdG1pJx0C5F7hTIdLNEc4O30Ottfvj3r5Ihoq Gx8g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:reply-to:mime-version:content-transfer-encoding; bh=o9jWwsLlYZnS171yrNbtFKKGeaY8IhQRPNlofd9rtnc=; b=MrlQwgAmALJonkpvHpWC/Jug+K7irc3YKLAyH9ulLX/VfYujHvw+4eEEqMOaaTBlgr 7FUrWGaVlDLRayHPfJatWgr3HOpSub8fJP+SeJKVS7ZctRPHArH1alAlDky2Kyg3HYbw l71/AHex4awtqIMZFtr8E5fJLJp9gLnvdxJZR5kkpr0clF7zCWaMs1h8LPgPPd+M0abE SlqjBR/Zr4uehYa2OzMW+0FdjXEAQZrMa1Rm8NHWSPttIwtW1gL/fubwxJTBWHlSTrmh MMOd6SULoUuPWF5PTKpjLHGbV9r7UAtcxaagpjL+s8DSQY5AkzMII8gY9fE7xbggmCJc aWPg== X-Gm-Message-State: AJIora8/Xpkn4+zIUCsvuo5LtcdDa2RrZwFoFaB4dH6nIiXthSFsnArV +EB61X1YWFlXgUts8Mtvwbg= X-Google-Smtp-Source: AGRyM1vsVwYmFD3XfCQrIG8QN1BCM3jR8q0DPVN7U5JQfZ3X2kYLzuo6hZlQfhMP+vyClpJD2IEHiw== X-Received: by 2002:a05:6a00:1d26:b0:52b:f8ab:6265 with SMTP id a38-20020a056a001d2600b0052bf8ab6265mr340380pfx.54.1659041131938; Thu, 28 Jul 2022 13:45:31 -0700 (PDT) Received: from KASONG-MB0.tencent.com ([114.254.3.190]) by smtp.gmail.com with ESMTPSA id 
21-20020a170902c11500b0016c40f8cb58sm1787304pli.81.2022.07.28.13.45.30 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 28 Jul 2022 13:45:31 -0700 (PDT) From: Kairui Song To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, Andrew Morton , Kairui Song Subject: [RFC PATCH 6/7] mm: introduce CONFIG_ARCH_PCP_RSS_USE_CPUMASK Date: Fri, 29 Jul 2022 04:45:10 +0800 Message-Id: <20220728204511.56348-7-ryncsn@gmail.com> X-Mailer: git-send-email 2.35.2 In-Reply-To: <20220728204511.56348-1-ryncsn@gmail.com> References: <20220728204511.56348-1-ryncsn@gmail.com> Reply-To: Kairui Song MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Kairui Song If arch-specific code can provide helpers that bind the RSS cache to mm_cpumask, the syncing code can rely on that mask instead of synchronizing with every possible CPU. This makes both counter reads and mm exit much faster. Signed-off-by: Kairui Song --- arch/Kconfig | 3 ++ kernel/sched/core.c | 3 +- mm/memory.c | 94 ++++++++++++++++++++++++++++----------------- 3 files changed, 64 insertions(+), 36 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index 71b9272acb28..8df45b6346ae 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -1403,6 +1403,9 @@ config ARCH_HAS_ELFCORE_COMPAT config ARCH_HAS_PARANOID_L1D_FLUSH bool =20 +config ARCH_PCP_RSS_USE_CPUMASK + bool + config DYNAMIC_SIGFRAME bool =20 diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 11df67bb52ee..6f7991caf24b 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5143,7 +5143,8 @@ context_switch(struct rq *rq, struct task_struct *pre= v, prepare_lock_switch(rq, next, rf); =20 /* Cache new active_mm */ - switch_pcp_rss_cache_no_irq(next->active_mm); + if (!IS_ENABLED(CONFIG_ARCH_PCP_RSS_USE_CPUMASK)) + switch_pcp_rss_cache_no_irq(next->active_mm); =20 /* Here we just switch the register state and the stack.
*/ switch_to(prev, next, prev); diff --git a/mm/memory.c b/mm/memory.c index 09d7d193da51..a819009aa3e0 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -188,9 +188,16 @@ unsigned long get_mm_counter(struct mm_struct *mm, int= member) { int cpu; long ret, update, sync_count; + const struct cpumask *mm_mask; =20 ret =3D atomic_long_read(&mm->rss_stat.count[member]); - for_each_possible_cpu(cpu) { + + if (IS_ENABLED(CONFIG_ARCH_PCP_RSS_USE_CPUMASK)) + mm_mask =3D mm_cpumask(mm); + else + mm_mask =3D cpu_possible_mask; + + for_each_cpu(cpu, mm_mask) { if (READ_ONCE(per_cpu(cpu_rss_cache.mm, cpu)) !=3D mm) continue; sync_count =3D READ_ONCE(per_cpu(cpu_rss_cache.sync_count, cpu)); @@ -217,12 +224,18 @@ unsigned long get_mm_rss(struct mm_struct *mm) { int cpu; long ret, update, sync_count; + const struct cpumask *mm_mask; =20 ret =3D atomic_long_read(&mm->rss_stat.count[MM_FILEPAGES]), + atomic_long_read(&mm->rss_stat.count[MM_ANONPAGES]), + atomic_long_read(&mm->rss_stat.count[MM_SHMEMPAGES]); =20 - for_each_possible_cpu(cpu) { + if (IS_ENABLED(CONFIG_ARCH_PCP_RSS_USE_CPUMASK)) + mm_mask =3D mm_cpumask(mm); + else + mm_mask =3D cpu_possible_mask; + + for_each_cpu(cpu, mm_mask) { if (READ_ONCE(per_cpu(cpu_rss_cache.mm, cpu)) !=3D mm) continue; sync_count =3D READ_ONCE(per_cpu(cpu_rss_cache.sync_count, cpu)); @@ -266,10 +279,13 @@ void switch_pcp_rss_cache_no_irq(struct mm_struct *ne= xt_mm) if (cpu_mm =3D=3D NULL) goto commit_done; =20 - /* Race with check_discard_rss_cache */ - if (cpu_mm !=3D cmpxchg(this_cpu_ptr(&cpu_rss_cache.mm), cpu_mm, - __pcp_rss_mm_mark(cpu_mm))) - goto commit_done; + /* Arch will take care of cache invalidation */ + if (!IS_ENABLED(CONFIG_ARCH_PCP_RSS_USE_CPUMASK)) { + /* Race with check_discard_rss_cache */ + if (cpu_mm !=3D cmpxchg(this_cpu_ptr(&cpu_rss_cache.mm), cpu_mm, + __pcp_rss_mm_mark(cpu_mm))) + goto commit_done; + } =20 for (int i =3D 0; i < NR_MM_COUNTERS; i++) { count =3D this_cpu_read(cpu_rss_cache.count[i]); @@ -328,46 +344,54 @@ 
static void check_discard_rss_cache(struct mm_struct = *mm) long cached_count[NR_MM_COUNTERS] =3D { 0 }; struct mm_struct *cpu_mm; =20 - /* Invalidate the RSS cache on every CPU */ - for_each_possible_cpu(cpu) { - cpu_mm =3D READ_ONCE(per_cpu(cpu_rss_cache.mm, cpu)); - if (__pcp_rss_mm_unmark(cpu_mm) !=3D mm) - continue; - - /* - * If not being flusehd, try read-in the counter and mark it NULL, - * once cache's mm is set NULL, counter are considered invalided - */ - if (cpu_mm !=3D __pcp_rss_mm_mark(cpu_mm)) { - long count[NR_MM_COUNTERS]; - - for (int i =3D 0; i < NR_MM_COUNTERS; i++) - count[i] =3D READ_ONCE(per_cpu(cpu_rss_cache.count[i], cpu)); + /* Arch will take care of cache invalidation */ + if (!IS_ENABLED(CONFIG_ARCH_PCP_RSS_USE_CPUMASK)) { + /* Invalidate the RSS cache on every CPU */ + for_each_possible_cpu(cpu) { + cpu_mm =3D READ_ONCE(per_cpu(cpu_rss_cache.mm, cpu)); + if (__pcp_rss_mm_unmark(cpu_mm) !=3D mm) + continue; =20 /* - * If successfully set to NULL, the owner CPU is not flushing it, count= ers - * are uncommiteed and untouched during this period, since a dying mm w= on't - * be accouted anymore + * If not being flusehd, try read-in the counter and mark it NULL, + * once cache's mm is set NULL, counter are considered invalided. */ - cpu_mm =3D cmpxchg(&per_cpu(cpu_rss_cache.mm, cpu), mm, NULL); - if (cpu_mm =3D=3D mm) { + if (cpu_mm !=3D __pcp_rss_mm_mark(cpu_mm)) { + long count[NR_MM_COUNTERS]; + for (int i =3D 0; i < NR_MM_COUNTERS; i++) - cached_count[i] +=3D count[i]; - continue; + count[i] =3D READ_ONCE(per_cpu(cpu_rss_cache.count[i], cpu)); + + /* + * If successfully set to NULL, the owner CPU is not flushing it, + * counters are uncommitted and untouched during this period, since + * a dying mm won't be accouted anymore. 
+ */ + cpu_mm =3D cmpxchg(&per_cpu(cpu_rss_cache.mm, cpu), mm, NULL); + if (cpu_mm =3D=3D mm) { + for (int i =3D 0; i < NR_MM_COUNTERS; i++) + cached_count[i] +=3D count[i]; + continue; + } } - } =20 - /* It's being flushed, just busy wait as the critial section is really s= hort */ - do { - cpu_relax(); - cpu_mm =3D READ_ONCE(per_cpu(cpu_rss_cache.mm, cpu)); - } while (cpu_mm =3D=3D __pcp_rss_mm_mark(mm)); + /* + * It's being flushed, just busy wait as the critial section + * is really short. + */ + do { + cpu_relax(); + cpu_mm =3D READ_ONCE(per_cpu(cpu_rss_cache.mm, cpu)); + } while (cpu_mm =3D=3D __pcp_rss_mm_mark(mm)); + } } =20 for (int i =3D 0; i < NR_MM_COUNTERS; i++) { long val =3D atomic_long_read(&mm->rss_stat.count[i]); =20 - val +=3D cached_count[i]; + if (!IS_ENABLED(CONFIG_ARCH_PCP_RSS_USE_CPUMASK)) { + val +=3D cached_count[i]; + } =20 if (unlikely(val)) { pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld\n", --=20 2.35.2 From nobody Wed Apr 15 00:02:44 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CD091C04A68 for ; Thu, 28 Jul 2022 20:46:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233404AbiG1UqJ (ORCPT ); Thu, 28 Jul 2022 16:46:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46868 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232986AbiG1Upl (ORCPT ); Thu, 28 Jul 2022 16:45:41 -0400 Received: from mail-pl1-x630.google.com (mail-pl1-x630.google.com [IPv6:2607:f8b0:4864:20::630]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D136D3A49A for ; Thu, 28 Jul 2022 13:45:34 -0700 (PDT) Received: by mail-pl1-x630.google.com with SMTP id t2so2814295ply.2 for ; Thu, 28 Jul 2022 13:45:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; bh=gfO26u2xpyVmEWxjONFXEHf40ylM6r3kRxMGtINMS28=; b=CqUpKZVcHUdGhidYMYjokkBi17q5SK18jd7MhWe5pwlpQrL2nQqeN19YTR0WYDAFbP CMXhllFoexuWnVPLj4OetWCIXeI9nvG6zUlojn7oUojsxyNpgy7iF7r1rfVjsYa0ze71 xnb7t4vhiEViH6GAuV16Z31aSTVXUb5Gw00YwzsqcVrbtgZvjF3zYtKzugy+OFOVYce+ XTYTpiiV5Uqu8E6hZ+/VnjofZ9wyy0blnM4F2o3or+2R2ISS15tzVuSyLZVzFLexsber PTxe/D7DN/AFX9JYEBisHhtlBriUSIZuluLk2DWot1+vXvb6yB/2/At1UPMkYKaOwcLm PJmQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:reply-to:mime-version:content-transfer-encoding; bh=gfO26u2xpyVmEWxjONFXEHf40ylM6r3kRxMGtINMS28=; b=rsbrXMRyv+3XHimQxr8lVsKB7iMOtZ4jcSmk2cfPoIlkz3U9RaN+j4DZCFv2ZalpLs 9jkAR5ZACTZFCxZm3j/Sm6ZIIHGdP0lqFzHlXY/WPcNb9f8CdRkyxsZL69hEnfpIARak xvepQcFNWoB+PlbrKCWSL1z4aZAeI7k1g0sSMV8hrCt8Ao5nz73YHVrR6jCkiWKtksXs 7BsFxW6lg+8HAdng7fiqx0Q3NQ+dv9cM/R8fUce1H+wmRxTSY6uFhJu9RQjXjKo1B3v8 TcWLwQHqqOLmW03QJCkUZlTAjaH4FUua2kAAE+b5LX/QEC9OR6yq9R0ke3tewy6nGnm4 9XEA== X-Gm-Message-State: ACgBeo3+5auZg/vkoVjZDVS6zHVNcdxoHWG5o/UqIPetTKUih+QHRaJG NAC95y+23ZgTBU8QTBDnVh0= X-Google-Smtp-Source: AA6agR65b9BvIPJJ2aIA34xLNFyBah7YuqJvpKELG7QwV25Vml1sxFR036NxwJ4Xi5LcGcZNJFfljw== X-Received: by 2002:a17:90a:17e1:b0:1f2:2ff2:6cae with SMTP id q88-20020a17090a17e100b001f22ff26caemr1175172pja.196.1659041134124; Thu, 28 Jul 2022 13:45:34 -0700 (PDT) Received: from KASONG-MB0.tencent.com ([114.254.3.190]) by smtp.gmail.com with ESMTPSA id 21-20020a170902c11500b0016c40f8cb58sm1787304pli.81.2022.07.28.13.45.32 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 28 Jul 2022 13:45:33 -0700 (PDT) From: Kairui Song To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, Andrew Morton , Kairui Song Subject: [RFC PATCH 7/7] x86_64/tlb, mm: enable cpumask optimzation for RSS cache Date: Fri, 29 Jul 
2022 04:45:11 +0800 Message-Id: <20220728204511.56348-8-ryncsn@gmail.com> X-Mailer: git-send-email 2.35.2 In-Reply-To: <20220728204511.56348-1-ryncsn@gmail.com> References: <20220728204511.56348-1-ryncsn@gmail.com> Reply-To: Kairui Song MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Kairui Song Enable CONFIG_ARCH_PCP_RSS_USE_CPUMASK for x86_64 and do the RSS cache switch in switch_mm_irqs_off. On x86_64 this is the unified routine for switching an mm, so hooking into it makes sure that any dead mm has its cache invalidated in time and that the cpumask stays in sync with the cache state. Signed-off-by: Kairui Song --- arch/x86/Kconfig | 1 + arch/x86/mm/tlb.c | 5 +++++ 2 files changed, 6 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 52a7f91527fe..15e2b29ba972 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -125,6 +125,7 @@ config X86 select ARCH_WANT_LD_ORPHAN_WARN select ARCH_WANTS_THP_SWAP if X86_64 select ARCH_HAS_PARANOID_L1D_FLUSH + select ARCH_PCP_RSS_USE_CPUMASK if X86_64 select BUILDTIME_TABLE_SORT select CLKEVT_I8253 select CLOCKSOURCE_VALIDATE_LAST_CYCLE diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index d400b6d9d246..614865f94d85 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -597,6 +597,11 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct= mm_struct *next, */ cond_mitigation(tsk); =20 + /* + * Flush the RSS cache before clearing the bitmask + */ + switch_pcp_rss_cache_no_irq(next); + /* * Stop remote flushes for the previous mm. * Skip kernel threads; we never send init_mm TLB flushing IPIs, --=20 2.35.2
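For readers following the series, the per-CPU cache protocol from patch 1/7 (the owner CPU folds its cached delta into the old mm on switch; remote readers add a CPU's cached delta only if `mm` and `sync_count` are unchanged around the read) can be sketched in user space as below. This is a rough illustration under stated assumptions, not the kernel code: every `*_sketch` name and `NR_CPUS_SKETCH` is hypothetical, a single counter stands in for the `NR_MM_COUNTERS` array, sequentially consistent C11 atomics stand in for `READ_ONCE()`/`smp_rmb()`/`smp_wmb()`, and the `in_use` guard and the syncing mark bit are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4 /* hypothetical fixed CPU count for the sketch */

/* Stand-in for mm_struct: only the shared (slow path) counter. */
struct mm_sketch {
	atomic_long count;
};

/* Stand-in for struct mm_rss_cache, reduced to one counter. */
struct rss_cache_sketch {
	_Atomic(struct mm_sketch *) mm; /* mm currently cached on this CPU */
	atomic_long count;              /* delta not yet folded into mm */
	atomic_ulong sync_count;        /* bumped on every switch */
};

static struct rss_cache_sketch caches_sketch[NR_CPUS_SKETCH];

/* Owner CPU: commit the cached delta to the old mm, then cache 'next'. */
static void cache_switch_sketch(int cpu, struct mm_sketch *next)
{
	struct rss_cache_sketch *c = &caches_sketch[cpu];
	struct mm_sketch *old = atomic_load(&c->mm);

	if (old == next)
		return;
	if (old)
		atomic_fetch_add(&old->count, atomic_load(&c->count));

	/* Zero the delta before publishing the new mm and sync count. */
	atomic_store(&c->count, 0);
	atomic_store(&c->mm, next);
	atomic_fetch_add(&c->sync_count, 1);
}

/* Fast path if 'mm' is cached on 'cpu', shared counter otherwise. */
static void cache_add_sketch(int cpu, struct mm_sketch *mm, long val)
{
	struct rss_cache_sketch *c = &caches_sketch[cpu];

	if (atomic_load(&c->mm) == mm)
		atomic_fetch_add(&c->count, val);
	else
		atomic_fetch_add(&mm->count, val);
}

/* Reader: shared value plus every cached delta that validates. */
static long counter_read_sketch(struct mm_sketch *mm)
{
	long ret = atomic_load(&mm->count);

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		struct rss_cache_sketch *c = &caches_sketch[cpu];
		unsigned long seq;
		long update;

		if (atomic_load(&c->mm) != mm)
			continue;
		seq = atomic_load(&c->sync_count);
		update = atomic_load(&c->count);

		/* Skip the delta if the cache was switched while we read it. */
		if (atomic_load(&c->sync_count) == seq &&
		    atomic_load(&c->mm) == mm)
			ret += update;
	}

	return ret < 0 ? 0 : ret;
}
```

In this sketch a delta that is skipped because of a concurrent switch is not lost: the owner has already folded it into the shared counter, so the next read picks it up, which matches the behaviour described in the comment above `get_mm_counter()`.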