From nobody Fri May 8 03:09:18 2026
From: Gang Li
To: akpm@linux-foundation.org
Cc: songmuchun@bytedance.com, hca@linux.ibm.com, gor@linux.ibm.com, agordeev@linux.ibm.com, borntraeger@linux.ibm.com, svens@linux.ibm.com, ebiederm@xmission.com, keescook@chromium.org, viro@zeniv.linux.org.uk, rostedt@goodmis.org, mingo@redhat.com, peterz@infradead.org, acme@kernel.org, mark.rutland@arm.com, alexander.shishkin@linux.intel.com, jolsa@kernel.org, namhyung@kernel.org, david@redhat.com, imbrenda@linux.ibm.com, apopple@nvidia.com, adobriyan@gmail.com, stephen.s.brennan@oracle.com, ohoono.kwon@samsung.com, haolee.swjtu@gmail.com, kaleshsingh@google.com, zhengqi.arch@bytedance.com, peterx@redhat.com, shy828301@gmail.com, surenb@google.com, ccross@google.com, vincent.whitchurch@axis.com, tglx@linutronix.de, bigeasy@linutronix.de, fenghua.yu@intel.com, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-perf-users@vger.kernel.org, Gang Li
Subject: [PATCH 1/5 v1] mm: add a new parameter `node` to `get/add/inc/dec_mm_counter`
Date: Thu, 12 May 2022 12:46:30 +0800
Message-Id:
<20220512044634.63586-2-ligang.bdlg@bytedance.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220512044634.63586-1-ligang.bdlg@bytedance.com> References: <20220512044634.63586-1-ligang.bdlg@bytedance.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Add a new parameter `node` to the `get/add/inc/dec_mm_counter` helpers so that rss can be counted per process and per node. Since pages can be migrated between nodes, `remove_migration_pte` also needs to call `add_mm_counter`. Note that `MM_SWAPENTS` entries do not belong to any node, so when `add_mm_counter` is used to modify rss_stat.count[MM_SWAPENTS], its `node` argument should be `NUMA_NO_NODE`. There is no need to modify `resident_page_types`, because `MM_NO_TYPE` is not used in `check_mm`. Signed-off-by: Gang Li --- arch/s390/mm/pgtable.c | 4 +- fs/exec.c | 2 +- fs/proc/task_mmu.c | 14 +++--- include/linux/mm.h | 14 +++--- include/linux/mm_types_task.h | 10 ++++ kernel/events/uprobes.c | 6 +-- mm/huge_memory.c | 13 ++--- mm/khugepaged.c | 4 +- mm/ksm.c | 2 +- mm/madvise.c | 2 +- mm/memory.c | 91 +++++++++++++++++++++++------------ mm/migrate.c | 2 + mm/migrate_device.c | 2 +- mm/oom_kill.c | 16 +++--- mm/rmap.c | 16 +++--- mm/swapfile.c | 4 +- mm/userfaultfd.c | 2 +- 17 files changed, 124 insertions(+), 80 deletions(-) diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c index 697df02362af..d44198c5929f 100644 --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -703,11 +703,11 @@ void ptep_unshadow_pte(struct mm_struct *mm, unsigned long saddr, pte_t *ptep) static void ptep_zap_swap_entry(struct mm_struct *mm, swp_entry_t entry) { if (!non_swap_entry(entry)) - dec_mm_counter(mm, MM_SWAPENTS); + dec_mm_counter(mm, MM_SWAPENTS, NUMA_NO_NODE); else if (is_migration_entry(entry)) { struct page *page = pfn_swap_entry_to_page(entry); - dec_mm_counter(mm, mm_counter(page)); + dec_mm_counter(mm, 
mm_counter(page), page_to_nid(page)); } free_swap_and_cache(entry); } diff --git a/fs/exec.c b/fs/exec.c index e3e55d5e0be1..6c82393b1720 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -192,7 +192,7 @@ static void acct_arg_size(struct linux_binprm *bprm, un= signed long pages) return; =20 bprm->vma_pages =3D pages; - add_mm_counter(mm, MM_ANONPAGES, diff); + add_mm_counter(mm, MM_ANONPAGES, diff, NUMA_NO_NODE); } =20 static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long = pos, diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index f46060eb91b5..5cf65327fa6d 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -33,9 +33,9 @@ void task_mem(struct seq_file *m, struct mm_struct *mm) unsigned long text, lib, swap, anon, file, shmem; unsigned long hiwater_vm, total_vm, hiwater_rss, total_rss; =20 - anon =3D get_mm_counter(mm, MM_ANONPAGES); - file =3D get_mm_counter(mm, MM_FILEPAGES); - shmem =3D get_mm_counter(mm, MM_SHMEMPAGES); + anon =3D get_mm_counter(mm, MM_ANONPAGES, NUMA_NO_NODE); + file =3D get_mm_counter(mm, MM_FILEPAGES, NUMA_NO_NODE); + shmem =3D get_mm_counter(mm, MM_SHMEMPAGES, NUMA_NO_NODE); =20 /* * Note: to minimize their overhead, mm maintains hiwater_vm and @@ -56,7 +56,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm) text =3D min(text, mm->exec_vm << PAGE_SHIFT); lib =3D (mm->exec_vm << PAGE_SHIFT) - text; =20 - swap =3D get_mm_counter(mm, MM_SWAPENTS); + swap =3D get_mm_counter(mm, MM_SWAPENTS, NUMA_NO_NODE); SEQ_PUT_DEC("VmPeak:\t", hiwater_vm); SEQ_PUT_DEC(" kB\nVmSize:\t", total_vm); SEQ_PUT_DEC(" kB\nVmLck:\t", mm->locked_vm); @@ -89,12 +89,12 @@ unsigned long task_statm(struct mm_struct *mm, unsigned long *shared, unsigned long *text, unsigned long *data, unsigned long *resident) { - *shared =3D get_mm_counter(mm, MM_FILEPAGES) + - get_mm_counter(mm, MM_SHMEMPAGES); + *shared =3D get_mm_counter(mm, MM_FILEPAGES, NUMA_NO_NODE) + + get_mm_counter(mm, MM_SHMEMPAGES, NUMA_NO_NODE); *text =3D 
(PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK)) >> PAGE_SHIFT; *data =3D mm->data_vm + mm->stack_vm; - *resident =3D *shared + get_mm_counter(mm, MM_ANONPAGES); + *resident =3D *shared + get_mm_counter(mm, MM_ANONPAGES, NUMA_NO_NODE); return mm->total_vm; } =20 diff --git a/include/linux/mm.h b/include/linux/mm.h index 9f44254af8ce..1b6c2e912ec8 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1992,7 +1992,7 @@ static inline bool get_user_page_fast_only(unsigned l= ong addr, /* * per-process(per-mm_struct) statistics. */ -static inline unsigned long get_mm_counter(struct mm_struct *mm, int membe= r) +static inline unsigned long get_mm_counter(struct mm_struct *mm, int membe= r, int node) { long val =3D atomic_long_read(&mm->rss_stat.count[member]); =20 @@ -2009,21 +2009,21 @@ static inline unsigned long get_mm_counter(struct m= m_struct *mm, int member) =20 void mm_trace_rss_stat(struct mm_struct *mm, int member, long count); =20 -static inline void add_mm_counter(struct mm_struct *mm, int member, long v= alue) +static inline void add_mm_counter(struct mm_struct *mm, int member, long v= alue, int node) { long count =3D atomic_long_add_return(value, &mm->rss_stat.count[member]); =20 mm_trace_rss_stat(mm, member, count); } =20 -static inline void inc_mm_counter(struct mm_struct *mm, int member) +static inline void inc_mm_counter(struct mm_struct *mm, int member, int no= de) { long count =3D atomic_long_inc_return(&mm->rss_stat.count[member]); =20 mm_trace_rss_stat(mm, member, count); } =20 -static inline void dec_mm_counter(struct mm_struct *mm, int member) +static inline void dec_mm_counter(struct mm_struct *mm, int member, int no= de) { long count =3D atomic_long_dec_return(&mm->rss_stat.count[member]); =20 @@ -2047,9 +2047,9 @@ static inline int mm_counter(struct page *page) =20 static inline unsigned long get_mm_rss(struct mm_struct *mm) { - return get_mm_counter(mm, MM_FILEPAGES) + - get_mm_counter(mm, MM_ANONPAGES) + - get_mm_counter(mm, 
MM_SHMEMPAGES); + return get_mm_counter(mm, MM_FILEPAGES, NUMA_NO_NODE) + + get_mm_counter(mm, MM_ANONPAGES, NUMA_NO_NODE) + + get_mm_counter(mm, MM_SHMEMPAGES, NUMA_NO_NODE); } =20 static inline unsigned long get_mm_hiwater_rss(struct mm_struct *mm) diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h index c1bc6731125c..3e7da8c7ab95 100644 --- a/include/linux/mm_types_task.h +++ b/include/linux/mm_types_task.h @@ -48,6 +48,16 @@ enum { NR_MM_COUNTERS }; =20 +/*=20 + * This macro should only be used in committing local values, like sync_mm= _rss, + * add_mm_rss_vec. It means don't count per-mm-type, only count per-node in + * mm_stat. + *=20 + * `MM_NO_TYPE` must equals to `NR_MM_COUNTERS`, since we will use it in=20 + * `TRACE_MM_PAGES`. + */ +#define MM_NO_TYPE NR_MM_COUNTERS + #if USE_SPLIT_PTE_PTLOCKS && defined(CONFIG_MMU) #define SPLIT_RSS_COUNTING /* per-thread cached information, */ diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 6418083901d4..f8cd234084fe 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -184,11 +184,11 @@ static int __replace_page(struct vm_area_struct *vma,= unsigned long addr, lru_cache_add_inactive_or_unevictable(new_page, vma); } else /* no new page, just dec_mm_counter for old_page */ - dec_mm_counter(mm, MM_ANONPAGES); + dec_mm_counter(mm, MM_ANONPAGES, page_to_nid(old_page)); =20 if (!PageAnon(old_page)) { - dec_mm_counter(mm, mm_counter_file(old_page)); - inc_mm_counter(mm, MM_ANONPAGES); + dec_mm_counter(mm, mm_counter_file(old_page), page_to_nid(old_page)); + inc_mm_counter(mm, MM_ANONPAGES, page_to_nid(new_page)); } =20 flush_cache_page(vma, addr, pte_pfn(*pvmw.pte)); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index c468fee595ff..b2c0fd668d01 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -652,7 +652,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct v= m_fault *vmf, pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable); 
set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry); update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); - add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR); + add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR, page_to_nid(page)= ); mm_inc_nr_ptes(vma->vm_mm); spin_unlock(vmf->ptl); count_vm_event(THP_FAULT_ALLOC); @@ -1064,7 +1064,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm= _struct *src_mm, pmd =3D pmd_swp_mkuffd_wp(pmd); set_pmd_at(src_mm, addr, src_pmd, pmd); } - add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); + add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR, page_to_nid(pmd_page(= *dst_pmd))); mm_inc_nr_ptes(dst_mm); pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); if (!userfaultfd_wp(dst_vma)) @@ -1114,7 +1114,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm= _struct *src_mm, =20 get_page(src_page); page_dup_rmap(src_page, true); - add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); + add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR, page_to_nid(src_page)); out_zero_page: mm_inc_nr_ptes(dst_mm); pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); @@ -1597,11 +1597,12 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_= area_struct *vma, =20 if (PageAnon(page)) { zap_deposited_table(tlb->mm, pmd); - add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR); + add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR, page_to_nid(page)); } else { if (arch_needs_pgtable_deposit()) zap_deposited_table(tlb->mm, pmd); - add_mm_counter(tlb->mm, mm_counter_file(page), -HPAGE_PMD_NR); + add_mm_counter(tlb->mm, mm_counter_file(page), -HPAGE_PMD_NR, + page_to_nid(page)); } =20 spin_unlock(ptl); @@ -1981,7 +1982,7 @@ static void __split_huge_pmd_locked(struct vm_area_st= ruct *vma, pmd_t *pmd, page_remove_rmap(page, vma, true); put_page(page); } - add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR); + add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR, page_to_nid(pag= e)); return; } =20 diff --git a/mm/khugepaged.c 
b/mm/khugepaged.c index a4e5eaf3eb01..3ceaae2c24c0 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -742,7 +742,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struc= t page *page, =20 if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) { clear_user_highpage(page, address); - add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1); + add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1, page_to_nid(page)); if (is_zero_pfn(pte_pfn(pteval))) { /* * ptl mostly unnecessary. @@ -1510,7 +1510,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, un= signed long addr) /* step 3: set proper refcount and mm_counters. */ if (count) { page_ref_sub(hpage, count); - add_mm_counter(vma->vm_mm, mm_counter_file(hpage), -count); + add_mm_counter(vma->vm_mm, mm_counter_file(hpage), -count, page_to_nid(h= page)); } =20 /* step 4: collapse pmd */ diff --git a/mm/ksm.c b/mm/ksm.c index 063a48eeb5ee..1185fa086a31 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -1161,7 +1161,7 @@ static int replace_page(struct vm_area_struct *vma, s= truct page *page, * will get wrong values in /proc, and a BUG message in dmesg * when tearing down the mm. 
*/ - dec_mm_counter(mm, MM_ANONPAGES); + dec_mm_counter(mm, MM_ANONPAGES, page_to_nid(page)); } =20 flush_cache_page(vma, addr, pte_pfn(*ptep)); diff --git a/mm/madvise.c b/mm/madvise.c index 1873616a37d2..819a1cf47d7d 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -704,7 +704,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned = long addr, if (current->mm =3D=3D mm) sync_mm_rss(mm); =20 - add_mm_counter(mm, MM_SWAPENTS, nr_swap); + add_mm_counter(mm, MM_SWAPENTS, nr_swap, NUMA_NO_NODE); } arch_leave_lazy_mmu_mode(); pte_unmap_unlock(orig_pte, ptl); diff --git a/mm/memory.c b/mm/memory.c index 76e3af9639d9..adb07fb0b483 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -158,6 +158,8 @@ EXPORT_SYMBOL(zero_pfn); =20 unsigned long highest_memmap_pfn __read_mostly; =20 +static DEFINE_PER_CPU(int, percpu_numa_rss[MAX_NUMNODES]); + /* * CONFIG_MMU architectures set up ZERO_PAGE in their paging_init() */ @@ -181,24 +183,24 @@ void sync_mm_rss(struct mm_struct *mm) =20 for (i =3D 0; i < NR_MM_COUNTERS; i++) { if (current->rss_stat.count[i]) { - add_mm_counter(mm, i, current->rss_stat.count[i]); + add_mm_counter(mm, i, current->rss_stat.count[i], NUMA_NO_NODE); current->rss_stat.count[i] =3D 0; } } current->rss_stat.events =3D 0; } =20 -static void add_mm_counter_fast(struct mm_struct *mm, int member, int val) +static void add_mm_counter_fast(struct mm_struct *mm, int member, int val,= int node) { struct task_struct *task =3D current; =20 if (likely(task->mm =3D=3D mm)) task->rss_stat.count[member] +=3D val; else - add_mm_counter(mm, member, val); + add_mm_counter(mm, member, val, node); } -#define inc_mm_counter_fast(mm, member) add_mm_counter_fast(mm, member, 1) -#define dec_mm_counter_fast(mm, member) add_mm_counter_fast(mm, member, -1) +#define inc_mm_counter_fast(mm, member, node) add_mm_counter_fast(mm, memb= er, 1, node) +#define dec_mm_counter_fast(mm, member, node) add_mm_counter_fast(mm, memb= er, -1, node) =20 /* sync counter once per 64 page faults */ 
#define TASK_RSS_EVENTS_THRESH (64) @@ -211,8 +213,8 @@ static void check_sync_rss_stat(struct task_struct *tas= k) } #else /* SPLIT_RSS_COUNTING */ =20 -#define inc_mm_counter_fast(mm, member) inc_mm_counter(mm, member) -#define dec_mm_counter_fast(mm, member) dec_mm_counter(mm, member) +#define inc_mm_counter_fast(mm, member, node) inc_mm_counter(mm, member, n= ode) +#define dec_mm_counter_fast(mm, member, node) dec_mm_counter(mm, member, n= ode) =20 static void check_sync_rss_stat(struct task_struct *task) { @@ -490,12 +492,13 @@ int __pte_alloc_kernel(pmd_t *pmd) return 0; } =20 -static inline void init_rss_vec(int *rss) +static inline void init_rss_vec(int *rss, int *numa_rss) { memset(rss, 0, sizeof(int) * NR_MM_COUNTERS); + memset(numa_rss, 0, sizeof(int) * num_possible_nodes()); } =20 -static inline void add_mm_rss_vec(struct mm_struct *mm, int *rss) +static inline void add_mm_rss_vec(struct mm_struct *mm, int *rss, int *num= a_rss) { int i; =20 @@ -503,7 +506,7 @@ static inline void add_mm_rss_vec(struct mm_struct *mm,= int *rss) sync_mm_rss(mm); for (i =3D 0; i < NR_MM_COUNTERS; i++) if (rss[i]) - add_mm_counter(mm, i, rss[i]); + add_mm_counter(mm, i, rss[i], NUMA_NO_NODE); } =20 /* @@ -771,7 +774,8 @@ try_restore_exclusive_pte(pte_t *src_pte, struct vm_are= a_struct *vma, static unsigned long copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *dst_vma, - struct vm_area_struct *src_vma, unsigned long addr, int *rss) + struct vm_area_struct *src_vma, unsigned long addr, int *rss, + int *numa_rss) { unsigned long vm_flags =3D dst_vma->vm_flags; pte_t pte =3D *src_pte; @@ -791,10 +795,12 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct = mm_struct *src_mm, spin_unlock(&mmlist_lock); } rss[MM_SWAPENTS]++; + numa_rss[page_to_nid(pte_page(*dst_pte))]++; } else if (is_migration_entry(entry)) { page =3D pfn_swap_entry_to_page(entry); =20 rss[mm_counter(page)]++; + 
numa_rss[page_to_nid(page)]++; =20 if (is_writable_migration_entry(entry) && is_cow_mapping(vm_flags)) { @@ -825,6 +831,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm= _struct *src_mm, */ get_page(page); rss[mm_counter(page)]++; + numa_rss[page_to_nid(page)]++; page_dup_rmap(page, false); =20 /* @@ -884,7 +891,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm= _struct *src_mm, static inline int copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *s= rc_vma, pte_t *dst_pte, pte_t *src_pte, unsigned long addr, int *rss, - struct page **prealloc, pte_t pte, struct page *page) + struct page **prealloc, pte_t pte, struct page *page, int *numa_rss) { struct page *new_page; =20 @@ -918,6 +925,7 @@ copy_present_page(struct vm_area_struct *dst_vma, struc= t vm_area_struct *src_vma page_add_new_anon_rmap(new_page, dst_vma, addr, false); lru_cache_add_inactive_or_unevictable(new_page, dst_vma); rss[mm_counter(new_page)]++; + rss[page_to_nid(new_page)]++; =20 /* All done, just insert the new page copy in the child */ pte =3D mk_pte(new_page, dst_vma->vm_page_prot); @@ -936,7 +944,7 @@ copy_present_page(struct vm_area_struct *dst_vma, struc= t vm_area_struct *src_vma static inline int copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *sr= c_vma, pte_t *dst_pte, pte_t *src_pte, unsigned long addr, int *rss, - struct page **prealloc) + struct page **prealloc, int *numa_rss) { struct mm_struct *src_mm =3D src_vma->vm_mm; unsigned long vm_flags =3D src_vma->vm_flags; @@ -948,13 +956,14 @@ copy_present_pte(struct vm_area_struct *dst_vma, stru= ct vm_area_struct *src_vma, int retval; =20 retval =3D copy_present_page(dst_vma, src_vma, dst_pte, src_pte, - addr, rss, prealloc, pte, page); + addr, rss, prealloc, pte, page, numa_rss); if (retval <=3D 0) return retval; =20 get_page(page); page_dup_rmap(page, false); rss[mm_counter(page)]++; + numa_rss[page_to_nid(page)]++; } =20 /* @@ -1012,12 +1021,16 @@ 
copy_pte_range(struct vm_area_struct *dst_vma, stru= ct vm_area_struct *src_vma, spinlock_t *src_ptl, *dst_ptl; int progress, ret =3D 0; int rss[NR_MM_COUNTERS]; + int *numa_rss; swp_entry_t entry =3D (swp_entry_t){0}; struct page *prealloc =3D NULL; + numa_rss =3D kcalloc(num_possible_nodes(), sizeof(int), GFP_KERNEL); + if (unlikely(!numa_rss)) + numa_rss =3D (int *)get_cpu_ptr(&percpu_numa_rss); =20 again: progress =3D 0; - init_rss_vec(rss); + init_rss_vec(rss, numa_rss); =20 dst_pte =3D pte_alloc_map_lock(dst_mm, dst_pmd, addr, &dst_ptl); if (!dst_pte) { @@ -1050,7 +1063,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct= vm_area_struct *src_vma, ret =3D copy_nonpresent_pte(dst_mm, src_mm, dst_pte, src_pte, dst_vma, src_vma, - addr, rss); + addr, rss, numa_rss); if (ret =3D=3D -EIO) { entry =3D pte_to_swp_entry(*src_pte); break; @@ -1069,7 +1082,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct= vm_area_struct *src_vma, } /* copy_present_pte() will clear `*prealloc' if consumed */ ret =3D copy_present_pte(dst_vma, src_vma, dst_pte, src_pte, - addr, rss, &prealloc); + addr, rss, &prealloc, numa_rss); /* * If we need a pre-allocated page for this pte, drop the * locks, allocate, and try again. 
@@ -1092,7 +1105,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct= vm_area_struct *src_vma, arch_leave_lazy_mmu_mode(); spin_unlock(src_ptl); pte_unmap(orig_src_pte); - add_mm_rss_vec(dst_mm, rss); + add_mm_rss_vec(dst_mm, rss, numa_rss); pte_unmap_unlock(orig_dst_pte, dst_ptl); cond_resched(); =20 @@ -1121,6 +1134,10 @@ copy_pte_range(struct vm_area_struct *dst_vma, struc= t vm_area_struct *src_vma, out: if (unlikely(prealloc)) put_page(prealloc); + if (unlikely(numa_rss =3D=3D (int *)raw_cpu_ptr(&percpu_numa_rss))) + put_cpu_ptr(numa_rss); + else + kfree(numa_rss); return ret; } =20 @@ -1344,14 +1361,18 @@ static unsigned long zap_pte_range(struct mmu_gathe= r *tlb, struct mm_struct *mm =3D tlb->mm; int force_flush =3D 0; int rss[NR_MM_COUNTERS]; + int *numa_rss; spinlock_t *ptl; pte_t *start_pte; pte_t *pte; swp_entry_t entry; + numa_rss =3D kcalloc(num_possible_nodes(), sizeof(int), GFP_KERNEL); + if (unlikely(!numa_rss)) + numa_rss =3D (int *)get_cpu_ptr(&percpu_numa_rss); =20 tlb_change_page_size(tlb, PAGE_SIZE); again: - init_rss_vec(rss); + init_rss_vec(rss, numa_rss); start_pte =3D pte_offset_map_lock(mm, pmd, addr, &ptl); pte =3D start_pte; flush_tlb_batched_pending(mm); @@ -1386,6 +1407,7 @@ static unsigned long zap_pte_range(struct mmu_gather = *tlb, mark_page_accessed(page); } rss[mm_counter(page)]--; + numa_rss[page_to_nid(page)]--; page_remove_rmap(page, vma, false); if (unlikely(page_mapcount(page) < 0)) print_bad_pte(vma, addr, ptent, page); @@ -1404,6 +1426,7 @@ static unsigned long zap_pte_range(struct mmu_gather = *tlb, if (unlikely(!should_zap_page(details, page))) continue; rss[mm_counter(page)]--; + numa_rss[page_to_nid(page)]--; if (is_device_private_entry(entry)) page_remove_rmap(page, vma, false); put_page(page); @@ -1419,6 +1442,7 @@ static unsigned long zap_pte_range(struct mmu_gather = *tlb, if (!should_zap_page(details, page)) continue; rss[mm_counter(page)]--; + numa_rss[page_to_nid(page)]--; } else if 
(is_hwpoison_entry(entry)) { if (!should_zap_cows(details)) continue; @@ -1429,7 +1453,7 @@ static unsigned long zap_pte_range(struct mmu_gather = *tlb, pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); } while (pte++, addr +=3D PAGE_SIZE, addr !=3D end); =20 - add_mm_rss_vec(mm, rss); + add_mm_rss_vec(mm, rss, numa_rss); arch_leave_lazy_mmu_mode(); =20 /* Do the actual TLB flush before dropping ptl */ @@ -1453,6 +1477,10 @@ static unsigned long zap_pte_range(struct mmu_gather= *tlb, goto again; } =20 + if (unlikely(numa_rss =3D=3D (int *)raw_cpu_ptr(&percpu_numa_rss))) + put_cpu_ptr(numa_rss); + else + kfree(numa_rss); return addr; } =20 @@ -1767,7 +1795,7 @@ static int insert_page_into_pte_locked(struct vm_area= _struct *vma, pte_t *pte, return -EBUSY; /* Ok, finally just insert the thing.. */ get_page(page); - inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page)); + inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page), page_to_nid(page)); page_add_file_rmap(page, vma, false); set_pte_at(vma->vm_mm, addr, pte, mk_pte(page, prot)); return 0; @@ -3053,11 +3081,14 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) if (old_page) { if (!PageAnon(old_page)) { dec_mm_counter_fast(mm, - mm_counter_file(old_page)); - inc_mm_counter_fast(mm, MM_ANONPAGES); + mm_counter_file(old_page), page_to_nid(old_page)); + inc_mm_counter_fast(mm, MM_ANONPAGES, page_to_nid(new_page)); + } else { + dec_mm_counter_fast(mm, MM_ANONPAGES, page_to_nid(old_page)); + inc_mm_counter_fast(mm, MM_ANONPAGES, page_to_nid(new_page)); } } else { - inc_mm_counter_fast(mm, MM_ANONPAGES); + inc_mm_counter_fast(mm, MM_ANONPAGES, page_to_nid(new_page)); } flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte)); entry =3D mk_pte(new_page, vma->vm_page_prot); @@ -3685,8 +3716,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) if (should_try_to_free_swap(page, vma, vmf->flags)) try_to_free_swap(page); =20 - inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES); - dec_mm_counter_fast(vma->vm_mm, 
MM_SWAPENTS); + inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES, page_to_nid(page)); + dec_mm_counter_fast(vma->vm_mm, MM_SWAPENTS, NUMA_NO_NODE); pte =3D mk_pte(page, vma->vm_page_prot); =20 /* @@ -3861,7 +3892,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *= vmf) return handle_userfault(vmf, VM_UFFD_MISSING); } =20 - inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES); + inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES, page_to_nid(page)); page_add_new_anon_rmap(page, vma, vmf->address, false); lru_cache_add_inactive_or_unevictable(page, vma); setpte: @@ -4002,7 +4033,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct pa= ge *page) if (write) entry =3D maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); =20 - add_mm_counter(vma->vm_mm, mm_counter_file(page), HPAGE_PMD_NR); + add_mm_counter(vma->vm_mm, mm_counter_file(page), HPAGE_PMD_NR, page_to_n= id(page)); page_add_file_rmap(page, vma, true); =20 /* @@ -4048,11 +4079,11 @@ void do_set_pte(struct vm_fault *vmf, struct page *= page, unsigned long addr) entry =3D maybe_mkwrite(pte_mkdirty(entry), vma); /* copy-on-write page */ if (write && !(vma->vm_flags & VM_SHARED)) { - inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES); + inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES, page_to_nid(page)); page_add_new_anon_rmap(page, vma, addr, false); lru_cache_add_inactive_or_unevictable(page, vma); } else { - inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page)); + inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page), page_to_nid(page)= ); page_add_file_rmap(page, vma, false); } set_pte_at(vma->vm_mm, addr, vmf->pte, entry); diff --git a/mm/migrate.c b/mm/migrate.c index 6c31ee1e1c9b..8554c7a64928 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -253,6 +253,8 @@ static bool remove_migration_pte(struct folio *folio, =20 /* No need to invalidate - it was non-present before */ update_mmu_cache(vma, pvmw.address, pvmw.pte); + add_mm_counter(vma->vm_mm, MM_ANONPAGES, -compound_nr(old), page_to_nid(= old)); + add_mm_counter(vma->vm_mm, 
MM_ANONPAGES, compound_nr(&folio->page), page= _to_nid(&folio->page)); } =20 return true; diff --git a/mm/migrate_device.c b/mm/migrate_device.c index 70c7dc05bbfc..eedd053febd8 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -609,7 +609,7 @@ static void migrate_vma_insert_page(struct migrate_vma = *migrate, if (userfaultfd_missing(vma)) goto unlock_abort; =20 - inc_mm_counter(mm, MM_ANONPAGES); + inc_mm_counter(mm, MM_ANONPAGES, page_to_nid(page)); page_add_new_anon_rmap(page, vma, addr, false); if (!is_zone_device_page(page)) lru_cache_add_inactive_or_unevictable(page, vma); diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 49d7df39b02d..757f5665ae94 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -227,7 +227,7 @@ long oom_badness(struct task_struct *p, unsigned long t= otalpages) * The baseline for the badness score is the proportion of RAM that each * task's rss, pagetable and swap space use. */ - points =3D get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) + + points =3D get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS, NUMA_NO= _NODE) + mm_pgtables_bytes(p->mm) / PAGE_SIZE; task_unlock(p); =20 @@ -403,7 +403,7 @@ static int dump_task(struct task_struct *p, void *arg) task->pid, from_kuid(&init_user_ns, task_uid(task)), task->tgid, task->mm->total_vm, get_mm_rss(task->mm), mm_pgtables_bytes(task->mm), - get_mm_counter(task->mm, MM_SWAPENTS), + get_mm_counter(task->mm, MM_SWAPENTS, NUMA_NO_NODE), task->signal->oom_score_adj, task->comm); task_unlock(task); =20 @@ -593,9 +593,9 @@ static bool oom_reap_task_mm(struct task_struct *tsk, s= truct mm_struct *mm) =20 pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss= :%lukB, shmem-rss:%lukB\n", task_pid_nr(tsk), tsk->comm, - K(get_mm_counter(mm, MM_ANONPAGES)), - K(get_mm_counter(mm, MM_FILEPAGES)), - K(get_mm_counter(mm, MM_SHMEMPAGES))); + K(get_mm_counter(mm, MM_ANONPAGES, NUMA_NO_NODE)), + K(get_mm_counter(mm, MM_FILEPAGES, NUMA_NO_NODE)), + K(get_mm_counter(mm, 
MM_SHMEMPAGES, NUMA_NO_NODE))); out_finish: trace_finish_task_reaping(tsk->pid); out_unlock: @@ -917,9 +917,9 @@ static void __oom_kill_process(struct task_struct *vict= im, const char *message) mark_oom_victim(victim); pr_err("%s: Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-r= ss:%lukB, shmem-rss:%lukB, UID:%u pgtables:%lukB oom_score_adj:%hd\n", message, task_pid_nr(victim), victim->comm, K(mm->total_vm), - K(get_mm_counter(mm, MM_ANONPAGES)), - K(get_mm_counter(mm, MM_FILEPAGES)), - K(get_mm_counter(mm, MM_SHMEMPAGES)), + K(get_mm_counter(mm, MM_ANONPAGES, NUMA_NO_NODE)), + K(get_mm_counter(mm, MM_FILEPAGES, NUMA_NO_NODE)), + K(get_mm_counter(mm, MM_SHMEMPAGES, NUMA_NO_NODE)), from_kuid(&init_user_ns, task_uid(victim)), mm_pgtables_bytes(mm) >> 10, victim->signal->oom_score_adj); task_unlock(victim); diff --git a/mm/rmap.c b/mm/rmap.c index fedb82371efe..1566689476fc 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1549,7 +1549,7 @@ static bool try_to_unmap_one(struct folio *folio, str= uct vm_area_struct *vma, pvmw.pte, pteval, vma_mmu_pagesize(vma)); } else { - dec_mm_counter(mm, mm_counter(&folio->page)); + dec_mm_counter(mm, mm_counter(&folio->page), page_to_nid(&folio->page)= ); set_pte_at(mm, address, pvmw.pte, pteval); } =20 @@ -1564,7 +1564,7 @@ static bool try_to_unmap_one(struct folio *folio, str= uct vm_area_struct *vma, * migration) will not expect userfaults on already * copied pages. 
*/ - dec_mm_counter(mm, mm_counter(&folio->page)); + dec_mm_counter(mm, mm_counter(&folio->page), page_to_nid(&folio->page)); /* We have to invalidate as we cleared the pte */ mmu_notifier_invalidate_range(mm, address, address + PAGE_SIZE); @@ -1615,7 +1615,7 @@ static bool try_to_unmap_one(struct folio *folio, str= uct vm_area_struct *vma, /* Invalidate as we cleared the pte */ mmu_notifier_invalidate_range(mm, address, address + PAGE_SIZE); - dec_mm_counter(mm, MM_ANONPAGES); + dec_mm_counter(mm, MM_ANONPAGES, page_to_nid(&folio->page)); goto discard; } =20 @@ -1648,8 +1648,8 @@ static bool try_to_unmap_one(struct folio *folio, str= uct vm_area_struct *vma, list_add(&mm->mmlist, &init_mm.mmlist); spin_unlock(&mmlist_lock); } - dec_mm_counter(mm, MM_ANONPAGES); - inc_mm_counter(mm, MM_SWAPENTS); + dec_mm_counter(mm, MM_ANONPAGES, page_to_nid(&folio->page)); + inc_mm_counter(mm, MM_SWAPENTS, NUMA_NO_NODE); swp_pte =3D swp_entry_to_pte(entry); if (pte_soft_dirty(pteval)) swp_pte =3D pte_swp_mksoft_dirty(swp_pte); @@ -1671,7 +1671,7 @@ static bool try_to_unmap_one(struct folio *folio, str= uct vm_area_struct *vma, * * See Documentation/vm/mmu_notifier.rst */ - dec_mm_counter(mm, mm_counter_file(&folio->page)); + dec_mm_counter(mm, mm_counter_file(&folio->page), page_to_nid(&folio->p= age)); } discard: /* @@ -1896,7 +1896,7 @@ static bool try_to_migrate_one(struct folio *folio, s= truct vm_area_struct *vma, pvmw.pte, pteval, vma_mmu_pagesize(vma)); } else { - dec_mm_counter(mm, mm_counter(&folio->page)); + dec_mm_counter(mm, mm_counter(&folio->page), page_to_nid(&folio->page)= ); set_pte_at(mm, address, pvmw.pte, pteval); } =20 @@ -1911,7 +1911,7 @@ static bool try_to_migrate_one(struct folio *folio, s= truct vm_area_struct *vma, * migration) will not expect userfaults on already * copied pages. 
			 */
-			dec_mm_counter(mm, mm_counter(&folio->page));
+			dec_mm_counter(mm, mm_counter(&folio->page), page_to_nid(&folio->page));
			/* We have to invalidate as we cleared the pte */
			mmu_notifier_invalidate_range(mm, address,
						      address + PAGE_SIZE);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 63c61f8b2611..098bdb58109a 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1796,8 +1796,8 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
		goto out;
	}

-	dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
-	inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
+	dec_mm_counter(vma->vm_mm, MM_SWAPENTS, NUMA_NO_NODE);
+	inc_mm_counter(vma->vm_mm, MM_ANONPAGES, page_to_nid(page));
	get_page(page);
	if (page == swapcache) {
		page_add_anon_rmap(page, vma, addr, false);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index e9bb6db002aa..0355285a3d6f 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -112,7 +112,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
	 * Must happen after rmap, as mm_counter() checks mapping (via
	 * PageAnon()), which is set by __page_set_anon_rmap().
	 */
-	inc_mm_counter(dst_mm, mm_counter(page));
+	inc_mm_counter(dst_mm, mm_counter(page), page_to_nid(page));

	set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);

-- 
2.20.1

From nobody Fri May 8 03:09:18 2026
From: Gang Li
Subject: [PATCH 2/5 v1] mm: add numa_count field for rss_stat
Date: Thu, 12 May 2022 12:46:31 +0800
Message-Id: <20220512044634.63586-3-ligang.bdlg@bytedance.com>
In-Reply-To: <20220512044634.63586-1-ligang.bdlg@bytedance.com>
References: <20220512044634.63586-1-ligang.bdlg@bytedance.com>

This patch adds a new field `numa_count` to mm_rss_stat and task_rss_stat.
`numa_count` takes `sizeof(long) * num_possible_nodes()` bytes. To reduce
memory consumption, it only holds the per-node sum of rss, which is all that
`oom_badness` needs, instead of recording each kind of rss separately.

Signed-off-by: Gang Li
---
 include/linux/mm_types_task.h |  6 +++
 kernel/fork.c                 | 70 +++++++++++++++++++++++++++++++++--
 2 files changed, 73 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h
index 3e7da8c7ab95..c1ac2a33b697 100644
--- a/include/linux/mm_types_task.h
+++ b/include/linux/mm_types_task.h
@@ -64,11 +64,17 @@ enum {
 struct task_rss_stat {
	int events;	/* for synchronization threshold */
	int count[NR_MM_COUNTERS];
+#ifdef CONFIG_NUMA
+	int *numa_count;
+#endif
 };
 #endif /* USE_SPLIT_PTE_PTLOCKS */

 struct mm_rss_stat {
	atomic_long_t count[NR_MM_COUNTERS];
+#ifdef CONFIG_NUMA
+	atomic_long_t *numa_count;
+#endif
 };

 struct page_frag {
diff --git a/kernel/fork.c b/kernel/fork.c
index 9796897560ab..e549e0b30e2b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -141,6 +141,10 @@ DEFINE_PER_CPU(unsigned long, process_counts) = 0;

 __cacheline_aligned DEFINE_RWLOCK(tasklist_lock);  /* outer */

+#if (defined SPLIT_RSS_COUNTING) && (defined CONFIG_NUMA)
+#define SPLIT_RSS_NUMA_COUNTING
+#endif
+
 #ifdef CONFIG_PROVE_RCU
 int lockdep_tasklist_lock_is_held(void)
 {
@@ -765,6 +769,16 @@ static void check_mm(struct mm_struct *mm)
				 mm, resident_page_types[i], x);
	}

+#ifdef CONFIG_NUMA
+	for (i = 0; i < num_possible_nodes(); i++) {
+		long x = atomic_long_read(&mm->rss_stat.numa_count[i]);
+
+		if (unlikely(x))
+			pr_alert("BUG: Bad rss-counter state mm:%p node:%d val:%ld\n",
+				 mm, i, x);
+	}
+#endif
+
	if (mm_pgtables_bytes(mm))
		pr_alert("BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
				mm_pgtables_bytes(mm));
@@ -777,6 +791,29 @@ static void check_mm(struct mm_struct *mm)
 #define allocate_mm()	(kmem_cache_alloc(mm_cachep, GFP_KERNEL))
 #define free_mm(mm)	(kmem_cache_free(mm_cachep, (mm)))

+#ifdef CONFIG_NUMA
+static inline void mm_free_rss_stat(struct mm_struct *mm)
+{
+	kfree(mm->rss_stat.numa_count);
+}
+
+static inline int mm_init_rss_stat(struct mm_struct *mm)
+{
+	memset(&mm->rss_stat.count, 0, sizeof(mm->rss_stat.count));
+	mm->rss_stat.numa_count = kcalloc(num_possible_nodes(), sizeof(atomic_long_t), GFP_KERNEL);
+	if (unlikely(!mm->rss_stat.numa_count))
+		return -ENOMEM;
+	return 0;
+}
+#else
+static inline void mm_free_rss_stat(struct mm_struct *mm) {}
+static inline int mm_init_rss_stat(struct mm_struct *mm)
+{
+	memset(&mm->rss_stat.count, 0, sizeof(mm->rss_stat.count));
+	return 0;
+}
+#endif
+
 /*
  * Called when the last reference to the mm
  * is dropped: either by a lazy thread or by
@@ -791,6 +828,7 @@ void __mmdrop(struct mm_struct *mm)
	destroy_context(mm);
	mmu_notifier_subscriptions_destroy(mm);
	check_mm(mm);
+	mm_free_rss_stat(mm);
	put_user_ns(mm->user_ns);
	free_mm(mm);
 }
@@ -831,12 +869,22 @@ static inline void put_signal_struct(struct signal_struct *sig)
		free_signal_struct(sig);
 }

+#ifdef SPLIT_RSS_NUMA_COUNTING
+void rss_stat_free(struct task_struct *p)
+{
+	kfree(p->rss_stat.numa_count);
+}
+#else
+void rss_stat_free(struct task_struct *p) {}
+#endif
+
 void __put_task_struct(struct task_struct *tsk)
 {
	WARN_ON(!tsk->exit_state);
	WARN_ON(refcount_read(&tsk->usage));
	WARN_ON(tsk == current);

+	rss_stat_free(tsk);
	io_uring_free(tsk);
	cgroup_free(tsk);
	task_numa_free(tsk, true);
@@ -963,6 +1011,7 @@ void set_task_stack_end_magic(struct task_struct *tsk)
 static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 {
	struct task_struct *tsk;
+	int *numa_count __maybe_unused;
	int err;

	if (node == NUMA_NO_NODE)
@@ -984,9 +1033,16 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 #endif
	account_kernel_stack(tsk, 1);

+#ifdef SPLIT_RSS_NUMA_COUNTING
+	numa_count = kcalloc(num_possible_nodes(), sizeof(int), GFP_KERNEL);
+	if (!numa_count)
+		goto free_stack;
+	tsk->rss_stat.numa_count = numa_count;
+#endif
+
	err = scs_prepare(tsk, node);
	if (err)
-		goto free_stack;
+		goto free_rss_stat;

 #ifdef CONFIG_SECCOMP
	/*
@@ -1047,6 +1103,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 #endif
	return tsk;

+free_rss_stat:
+#ifdef SPLIT_RSS_NUMA_COUNTING
+	kfree(numa_count);
+#endif
 free_stack:
	exit_task_stack_account(tsk);
	free_thread_stack(tsk);
@@ -1117,7 +1177,6 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
	mm->map_count = 0;
	mm->locked_vm = 0;
	atomic64_set(&mm->pinned_vm, 0);
-	memset(&mm->rss_stat, 0, sizeof(mm->rss_stat));
	spin_lock_init(&mm->page_table_lock);
	spin_lock_init(&mm->arg_lock);
	mm_init_cpumask(mm);
@@ -1144,6 +1203,9 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
	if (mm_alloc_pgd(mm))
		goto fail_nopgd;

+	if (mm_init_rss_stat(mm))
+		goto fail_nocontext;
+
	if (init_new_context(p, mm))
		goto fail_nocontext;

@@ -2139,7 +2201,9 @@ static __latent_entropy struct task_struct *copy_process(
	p->io_uring = NULL;
 #endif

-#if defined(SPLIT_RSS_COUNTING)
+#ifdef SPLIT_RSS_NUMA_COUNTING
+	memset(&p->rss_stat, 0, sizeof(p->rss_stat) - sizeof(p->rss_stat.numa_count));
+#else
	memset(&p->rss_stat, 0, sizeof(p->rss_stat));
 #endif

-- 
2.20.1

From nobody Fri May 8 03:09:18 2026
From: Gang Li
Subject: [PATCH 3/5 v1] mm: add numa fields for tracepoint rss_stat
Date: Thu, 12 May 2022 12:46:32 +0800
Message-Id: <20220512044634.63586-4-ligang.bdlg@bytedance.com>
In-Reply-To: <20220512044634.63586-1-ligang.bdlg@bytedance.com>
References: <20220512044634.63586-1-ligang.bdlg@bytedance.com>

Since we added `numa_count` to mm->rss_stat, the tracepoint should also be
modified. The output now looks like this:

```
sleep-660     [002]   918.524333: rss_stat: mm_id=1539334265 curr=0 type=MM_NO_TYPE type_size=0B node=2 node_size=32768B diff_size=-8192B
sleep-660     [002]   918.524333: rss_stat: mm_id=1539334265 curr=0 type=MM_FILEPAGES type_size=4096B node=-1 node_size=0B diff_size=-4096B
sleep-660     [002]   918.524333: rss_stat: mm_id=1539334265 curr=0 type=MM_NO_TYPE type_size=0B node=1 node_size=0B diff_size=-4096B
```

Signed-off-by: Gang Li
---
 include/linux/mm.h          |  9 +++++----
 include/trace/events/kmem.h | 27 ++++++++++++++++++-------
 mm/memory.c                 |  5 +++--
 3 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1b6c2e912ec8..cde5529285d6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2007,27 +2007,28 @@ static inline unsigned long get_mm_counter(struct mm_struct *mm, int member, int
	return (unsigned long)val;
 }

-void mm_trace_rss_stat(struct mm_struct *mm, int member, long count);
+void mm_trace_rss_stat(struct mm_struct *mm, int member, long member_count, int node,
+		       long numa_count, long diff_count);

 static inline void add_mm_counter(struct mm_struct *mm, int member, long value, int node)
 {
	long count = atomic_long_add_return(value, &mm->rss_stat.count[member]);

-	mm_trace_rss_stat(mm, member, count);
+	mm_trace_rss_stat(mm, member, count, NUMA_NO_NODE, 0, value);
 }

 static inline void inc_mm_counter(struct mm_struct *mm, int member, int node)
 {
	long count = atomic_long_inc_return(&mm->rss_stat.count[member]);

-	mm_trace_rss_stat(mm, member, count);
+	mm_trace_rss_stat(mm, member, count, NUMA_NO_NODE, 0, 1);
 }

 static inline void dec_mm_counter(struct mm_struct *mm, int member, int node)
 {
	long count = atomic_long_dec_return(&mm->rss_stat.count[member]);

-	mm_trace_rss_stat(mm, member, count);
+	mm_trace_rss_stat(mm, member, count, NUMA_NO_NODE, 0, -1);
 }

 /* Optimized variant when page is already known not to be PageAnon */
diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index ddc8c944f417..2f4707d94624 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -347,7 +347,8 @@ static unsigned int __maybe_unused mm_ptr_to_hash(const void *ptr)
	EM(MM_FILEPAGES)	\
	EM(MM_ANONPAGES)	\
	EM(MM_SWAPENTS)		\
-	EMe(MM_SHMEMPAGES)
+	EM(MM_SHMEMPAGES)	\
+	EMe(MM_NO_TYPE)

 #undef EM
 #undef EMe
@@ -367,29 +368,41 @@ TRACE_EVENT(rss_stat,

	TP_PROTO(struct mm_struct *mm,
		int member,
-		long count),
+		long member_count,
+		int node,
+		long node_count,
+		long diff_count),

-	TP_ARGS(mm, member, count),
+	TP_ARGS(mm, member, member_count, node, node_count, diff_count),

	TP_STRUCT__entry(
		__field(unsigned int, mm_id)
		__field(unsigned int, curr)
		__field(int, member)
-		__field(long, size)
+		__field(long, member_size)
+		__field(int, node)
+		__field(long, node_size)
+		__field(long, diff_size)
	),

	TP_fast_assign(
		__entry->mm_id = mm_ptr_to_hash(mm);
		__entry->curr = !!(current->mm == mm);
		__entry->member = member;
-		__entry->size = (count << PAGE_SHIFT);
+		__entry->member_size = (member_count << PAGE_SHIFT);
+		__entry->node = node;
+		__entry->node_size = (node_count << PAGE_SHIFT);
+		__entry->diff_size = (diff_count << PAGE_SHIFT);
	),

	TP_printk("mm_id=%u curr=%d type=%s type_size=%ldB node=%d node_size=%ldB diff_size=%ldB",
		__entry->mm_id,
		__entry->curr,
		__print_symbolic(__entry->member, TRACE_MM_PAGES),
-		__entry->size)
+		__entry->member_size,
+		__entry->node,
+		__entry->node_size,
+		__entry->diff_size)
	);
 #endif /* _TRACE_KMEM_H */

diff --git a/mm/memory.c b/mm/memory.c
index adb07fb0b483..2d3040a190f6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -170,9 +170,10 @@ static int __init init_zero_pfn(void)
 }
 early_initcall(init_zero_pfn);

-void mm_trace_rss_stat(struct mm_struct *mm, int member, long count)
+void mm_trace_rss_stat(struct mm_struct *mm, int member, long member_count, int node,
+		       long numa_count, long diff_count)
 {
-	trace_rss_stat(mm, member, count);
+	trace_rss_stat(mm, member, member_count, node, numa_count, diff_count);
 }

 #if defined(SPLIT_RSS_COUNTING)
-- 
2.20.1

From nobody Fri May 8 03:09:18 2026
From: Gang Li
Subject: [PATCH 4/5 v1] mm: enable per numa node rss_stat count
Date: Thu, 12 May 2022 12:46:33 +0800
Message-Id: <20220512044634.63586-5-ligang.bdlg@bytedance.com>
In-Reply-To: <20220512044634.63586-1-ligang.bdlg@bytedance.com>
References: <20220512044634.63586-1-ligang.bdlg@bytedance.com>

Now we have all the infrastructure ready. Modify `get/add/inc/dec_mm_counter`,
`sync_mm_rss`, `add_mm_counter_fast` and `add_mm_rss_vec` to enable per numa
node rss_stat counting.
Signed-off-by: Gang Li
Reported-by: kernel test robot
---
 include/linux/mm.h | 42 +++++++++++++++++++++++++++++++++++-------
 mm/memory.c        | 20 ++++++++++++++++++--
 2 files changed, 53 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index cde5529285d6..f0f21065b81b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1994,8 +1994,18 @@ static inline bool get_user_page_fast_only(unsigned long addr,
  */
 static inline unsigned long get_mm_counter(struct mm_struct *mm, int member, int node)
 {
-	long val = atomic_long_read(&mm->rss_stat.count[member]);
+	long val;

+	WARN_ON(node == NUMA_NO_NODE && member == MM_NO_TYPE);
+
+	if (node == NUMA_NO_NODE)
+		val = atomic_long_read(&mm->rss_stat.count[member]);
+	else
+#ifdef CONFIG_NUMA
+		val = atomic_long_read(&mm->rss_stat.numa_count[node]);
+#else
+		val = 0;
+#endif
 #ifdef SPLIT_RSS_COUNTING
	/*
	 * counter is updated in asynchronous manner and may go to minus.
@@ -2012,23 +2022,41 @@ void mm_trace_rss_stat(struct mm_struct *mm, int member, long member_count, int

 static inline void add_mm_counter(struct mm_struct *mm, int member, long value, int node)
 {
-	long count = atomic_long_add_return(value, &mm->rss_stat.count[member]);
+	long member_count = 0, numa_count = 0;

-	mm_trace_rss_stat(mm, member, count, NUMA_NO_NODE, 0, value);
+	if (member != MM_NO_TYPE)
+		member_count = atomic_long_add_return(value, &mm->rss_stat.count[member]);
+#ifdef CONFIG_NUMA
+	if (node != NUMA_NO_NODE)
+		numa_count = atomic_long_add_return(value, &mm->rss_stat.numa_count[node]);
+#endif
+	mm_trace_rss_stat(mm, member, member_count, node, numa_count, value);
 }

 static inline void inc_mm_counter(struct mm_struct *mm, int member, int node)
 {
-	long count = atomic_long_inc_return(&mm->rss_stat.count[member]);
+	long member_count = 0, numa_count = 0;

-	mm_trace_rss_stat(mm, member, count, NUMA_NO_NODE, 0, 1);
+	if (member != MM_NO_TYPE)
+		member_count = atomic_long_inc_return(&mm->rss_stat.count[member]);
+#ifdef CONFIG_NUMA
+	if (node != NUMA_NO_NODE)
+		numa_count = atomic_long_inc_return(&mm->rss_stat.numa_count[node]);
+#endif
+	mm_trace_rss_stat(mm, member, member_count, node, numa_count, 1);
 }

 static inline void dec_mm_counter(struct mm_struct *mm, int member, int node)
 {
-	long count = atomic_long_dec_return(&mm->rss_stat.count[member]);
+	long member_count = 0, numa_count = 0;

-	mm_trace_rss_stat(mm, member, count, NUMA_NO_NODE, 0, -1);
+	if (member != MM_NO_TYPE)
+		member_count = atomic_long_dec_return(&mm->rss_stat.count[member]);
+#ifdef CONFIG_NUMA
+	if (node != NUMA_NO_NODE)
+		numa_count = atomic_long_dec_return(&mm->rss_stat.numa_count[node]);
+#endif
+	mm_trace_rss_stat(mm, member, member_count, node, numa_count, -1);
 }

 /* Optimized variant when page is already known not to be PageAnon */
diff --git a/mm/memory.c b/mm/memory.c
index 2d3040a190f6..f7b67da772b2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -188,6 +188,14 @@ void sync_mm_rss(struct mm_struct *mm)
			current->rss_stat.count[i] = 0;
		}
	}
+#ifdef CONFIG_NUMA
+	for_each_node(i) {
+		if (current->rss_stat.numa_count[i]) {
+			add_mm_counter(mm, MM_NO_TYPE, current->rss_stat.numa_count[i], i);
+			current->rss_stat.numa_count[i] = 0;
+		}
+	}
+#endif
	current->rss_stat.events = 0;
 }

@@ -195,9 +203,12 @@ static void add_mm_counter_fast(struct mm_struct *mm, int member, int val, int node)
 {
	struct task_struct *task = current;

-	if (likely(task->mm == mm))
+	if (likely(task->mm == mm)) {
		task->rss_stat.count[member] += val;
-	else
+#ifdef CONFIG_NUMA
+		task->rss_stat.numa_count[node] += val;
+#endif
+	} else
		add_mm_counter(mm, member, val, node);
 }
 #define inc_mm_counter_fast(mm, member, node) add_mm_counter_fast(mm, member, 1, node)
@@ -508,6 +519,11 @@ static inline void add_mm_rss_vec(struct mm_struct *mm, int *rss, int *numa_rss)
	for (i = 0; i < NR_MM_COUNTERS; i++)
		if
 (rss[i])
			add_mm_counter(mm, i, rss[i], NUMA_NO_NODE);
+#ifdef CONFIG_NUMA
+	for_each_node(i)
+		if (numa_rss[i] != 0)
+			add_mm_counter(mm, MM_NO_TYPE, numa_rss[i], i);
+#endif
 }

 /*
-- 
2.20.1

From nobody Fri May 8 03:09:18 2026
From: Gang Li
Subject: [PATCH 5/5 v1] mm, oom: enable per numa node oom for CONSTRAINT_MEMORY_POLICY
Date: Thu, 12 May 2022 12:46:34 +0800
Message-Id: <20220512044634.63586-6-ligang.bdlg@bytedance.com>
In-Reply-To: <20220512044634.63586-1-ligang.bdlg@bytedance.com>
References: <20220512044634.63586-1-ligang.bdlg@bytedance.com>

The page allocator will only allocate pages on the nodes indicated by
`nodemask`, but the OOM killer still selects its victim by total rss
usage, so killing that victim may reclaim nothing on the nodes indicated
by `nodemask`.

This patch makes the OOM killer calculate rss only on the given node
when oc->constraint equals CONSTRAINT_MEMORY_POLICY. If `nodemask` is
assigned, the process with the highest memory consumption on that
specific node is killed.
oom_kill dmesg will look like this:
```
[ 1471.436027] Tasks state (memory values in pages):
[ 1471.438518] [  pid  ]   uid  tgid total_vm      rss (01)nrss pgtables_bytes swapents oom_score_adj name
[ 1471.554703] [   1011]     0  1011   220005     8589     1872   823296        0             0 node
[ 1471.707912] [  12399]     0 12399  1311306  1311056   262170 10534912        0             0 a.out
[ 1471.712429] [  13135]     0 13135   787018   674666   674300  5439488        0             0 a.out
[ 1471.721506] [  13295]     0 13295      597      188        0    24576        0             0 sh
[ 1471.734600] oom-kill:constraint=CONSTRAINT_MEMORY_POLICY,nodemask=1,cpuset=/,mems_allowed=0-2,global_oom,task_memcg=/user.slice/user-0.slice/session-3.scope,task=a.out,pid=13135,uid=0
[ 1471.742583] Out of memory: Killed process 13135 (a.out) total-vm:3148072kB, anon-rss:2697304kB, file-rss:1360kB, shmem-rss:0kB, UID:0 pgtables:5312kB oom_score_adj:0
[ 1471.849615] oom_reaper: reaped process 13135 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
```

Signed-off-by: Gang Li
---
 fs/proc/base.c      |  6 +++++-
 include/linux/oom.h |  2 +-
 mm/oom_kill.c       | 45 +++++++++++++++++++++++++++++++++++++--------
 3 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index c1031843cc6a..caf0f51284d0 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -552,8 +552,12 @@ static int proc_oom_score(struct seq_file *m, struct pid_namespace *ns,
 	unsigned long totalpages = totalram_pages() + total_swap_pages;
 	unsigned long points = 0;
 	long badness;
+	struct oom_control oc = {
+		.totalpages = totalpages,
+		.gfp_mask = 0,
+	};
 
-	badness = oom_badness(task, totalpages);
+	badness = oom_badness(task, &oc);
 	/*
 	 * Special case OOM_SCORE_ADJ_MIN for all others scale the
 	 * badness value into [0, 2000] range which we have been
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 2db9a1432511..0cb6a60be776 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -109,7 +109,7 @@ static inline vm_fault_t check_stable_address_space(struct mm_struct *mm)
 bool __oom_reap_task_mm(struct mm_struct *mm);
 
 long oom_badness(struct task_struct *p,
-		unsigned long totalpages);
+		struct oom_control *oc);
 
 extern bool out_of_memory(struct oom_control *oc);
 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 757f5665ae94..75a80b5a63bf 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -198,7 +198,7 @@ static bool should_dump_unreclaim_slab(void)
  * predictable as possible. The goal is to return the highest value for the
  * task consuming the most memory to avoid subsequent oom failures.
  */
-long oom_badness(struct task_struct *p, unsigned long totalpages)
+long oom_badness(struct task_struct *p, struct oom_control *oc)
 {
 	long points;
 	long adj;
@@ -227,12 +227,22 @@ long oom_badness(struct task_struct *p, unsigned long totalpages)
 	 * The baseline for the badness score is the proportion of RAM that each
 	 * task's rss, pagetable and swap space use.
 	 */
-	points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS, NUMA_NO_NODE) +
-		mm_pgtables_bytes(p->mm) / PAGE_SIZE;
+	if (unlikely(oc->constraint == CONSTRAINT_MEMORY_POLICY)) {
+		struct zoneref *zoneref = first_zones_zonelist(oc->zonelist, gfp_zone(oc->gfp_mask),
+							       oc->nodemask);
+		int nid_to_find_victim = zone_to_nid(zoneref->zone);
+
+		points = get_mm_counter(p->mm, -1, nid_to_find_victim) +
+			get_mm_counter(p->mm, MM_SWAPENTS, NUMA_NO_NODE) +
+			mm_pgtables_bytes(p->mm) / PAGE_SIZE;
+	} else {
+		points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS, NUMA_NO_NODE) +
+			mm_pgtables_bytes(p->mm) / PAGE_SIZE;
+	}
 	task_unlock(p);
 
 	/* Normalize to oom_score_adj units */
-	adj *= totalpages / 1000;
+	adj *= oc->totalpages / 1000;
 	points += adj;
 
 	return points;
@@ -338,7 +348,7 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
 		goto select;
 	}
 
-	points = oom_badness(task, oc->totalpages);
+	points = oom_badness(task, oc);
 	if (points == LONG_MIN || points < oc->chosen_points)
 		goto next;
 
@@ -382,6 +392,7 @@ static int dump_task(struct task_struct *p, void *arg)
 {
 	struct oom_control *oc = arg;
 	struct task_struct *task;
+	unsigned long node_mm_rss;
 
 	if (oom_unkillable_task(p))
 		return 0;
@@ -399,9 +410,18 @@ static int dump_task(struct task_struct *p, void *arg)
 		return 0;
 	}
 
-	pr_info("[%7d] %5d %5d %8lu %8lu %8ld %8lu %5hd %s\n",
+	if (unlikely(oc->constraint == CONSTRAINT_MEMORY_POLICY)) {
+		struct zoneref *zoneref = first_zones_zonelist(oc->zonelist, gfp_zone(oc->gfp_mask),
+							       oc->nodemask);
+		int nid_to_find_victim = zone_to_nid(zoneref->zone);
+
+		node_mm_rss = get_mm_counter(p->mm, -1, nid_to_find_victim);
+	} else {
+		node_mm_rss = 0;
+	}
+	pr_info("[%7d] %5d %5d %8lu %8lu %8lu %8ld %8lu %5hd %s\n",
 		task->pid, from_kuid(&init_user_ns, task_uid(task)),
-		task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
+		task->tgid, task->mm->total_vm, get_mm_rss(task->mm), node_mm_rss,
 		mm_pgtables_bytes(task->mm),
 		get_mm_counter(task->mm, MM_SWAPENTS, NUMA_NO_NODE),
 		task->signal->oom_score_adj, task->comm);
@@ -422,8 +442,17 @@ static int dump_task(struct task_struct *p, void *arg)
  */
 static void dump_tasks(struct oom_control *oc)
 {
+	int nid_to_find_victim;
+
+	if (oc->nodemask) {
+		struct zoneref *zoneref = first_zones_zonelist(oc->zonelist, gfp_zone(oc->gfp_mask),
+							       oc->nodemask);
+		nid_to_find_victim = zone_to_nid(zoneref->zone);
+	} else {
+		nid_to_find_victim = -1;
+	}
 	pr_info("Tasks state (memory values in pages):\n");
-	pr_info("[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name\n");
+	pr_info("[  pid  ]   uid  tgid total_vm      rss (%02d)nrss pgtables_bytes swapents oom_score_adj name\n", nid_to_find_victim);
 
 	if (is_memcg_oom(oc))
 		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
-- 
2.20.1