From nobody Sun Feb 8 12:19:35 2026
From: Aravinda Prasad <aravinda.prasad@intel.com>
To: damon@lists.linux.dev, linux-mm@kvack.org, sj@kernel.org,
	linux-kernel@vger.kernel.org
Cc: aravinda.prasad@intel.com, s2322819@ed.ac.uk, sandeep4.kumar@intel.com,
	ying.huang@intel.com, dave.hansen@intel.com, dan.j.williams@intel.com,
	sreenivas.subramoney@intel.com, antti.kervinen@intel.com,
	alexander.kanevskiy@intel.com
Subject: [PATCH v2 1/3] mm/damon: mm infrastructure support
Date: Mon, 18 Mar 2024 18:58:46 +0530
Message-Id: <20240318132848.82686-2-aravinda.prasad@intel.com>
In-Reply-To: <20240318132848.82686-1-aravinda.prasad@intel.com>
References: <20240318132848.82686-1-aravinda.prasad@intel.com>

This patch adds mm infrastructure support to set and test access bits
at different levels of the page table tree. It also adds support to
check whether a given address is within the address range covered by a
PMD/PUD/PGD entry.
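
[ Illustrative note, not part of the patch: the sketch below shows how a
caller might combine the new helpers at the PUD level. The function
sample_pud_was_accessed() is a hypothetical name; on architectures that
do not provide these helpers, the generic fallbacks added to
include/linux/pgtable.h simply return 0. ]

#include <linux/mm.h>
#include <linux/pgtable.h>

/*
 * Hypothetical caller: report whether the hardware set the accessed bit
 * in the PUD entry covering @addr, clearing it so that the next access
 * can be observed again.
 */
static bool sample_pud_was_accessed(struct vm_area_struct *vma,
				    unsigned long addr, pud_t *pudp)
{
	/* pud_young() only reads the accessed bit; it does not clear it. */
	if (!pud_young(*pudp))
		return false;

	/* Atomically test and clear, so a later access sets the bit anew. */
	return pudp_test_and_clear_young(vma, addr, pudp) != 0;
}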

Signed-off-by: Alan Nair
Signed-off-by: Aravinda Prasad
---
 arch/x86/include/asm/pgtable.h | 20 +++++++++
 arch/x86/mm/pgtable.c          | 28 +++++++++++-
 include/linux/mmu_notifier.h   | 36 ++++++++++++++++
 include/linux/pgtable.h        | 79 ++++++++++++++++++++++++++++++++++
 4 files changed, 161 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 7621a5acb13e..b8d505194282 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -164,11 +164,24 @@ static inline bool pud_dirty(pud_t pud)
 	return pud_flags(pud) & _PAGE_DIRTY_BITS;
 }
 
+#define pud_young pud_young
 static inline int pud_young(pud_t pud)
 {
 	return pud_flags(pud) & _PAGE_ACCESSED;
 }
 
+#define p4d_young p4d_young
+static inline int p4d_young(p4d_t p4d)
+{
+	return p4d_flags(p4d) & _PAGE_ACCESSED;
+}
+
+#define pgd_young pgd_young
+static inline int pgd_young(pgd_t pgd)
+{
+	return pgd_flags(pgd) & _PAGE_ACCESSED;
+}
+
 static inline int pte_write(pte_t pte)
 {
 	/*
@@ -1329,10 +1342,17 @@ extern int pudp_set_access_flags(struct vm_area_struct *vma,
 				 pud_t entry, int dirty);
 
 #define __HAVE_ARCH_PMDP_TEST_AND_CLEAR_YOUNG
+#define pudp_test_and_clear_young pudp_test_and_clear_young
+#define p4dp_test_and_clear_young p4dp_test_and_clear_young
+#define pgdp_test_and_clear_young pgdp_test_and_clear_young
 extern int pmdp_test_and_clear_young(struct vm_area_struct *vma,
 				     unsigned long addr, pmd_t *pmdp);
 extern int pudp_test_and_clear_young(struct vm_area_struct *vma,
 				     unsigned long addr, pud_t *pudp);
+extern int p4dp_test_and_clear_young(struct vm_area_struct *vma,
+				     unsigned long addr, p4d_t *p4dp);
+extern int pgdp_test_and_clear_young(struct vm_area_struct *vma,
+				     unsigned long addr, pgd_t *pgdp);
 
 #define __HAVE_ARCH_PMDP_CLEAR_YOUNG_FLUSH
 extern int pmdp_clear_flush_young(struct vm_area_struct *vma,
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index ff690ddc2334..9f8e08326b43 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -578,9 +578,7 @@ int pmdp_test_and_clear_young(struct vm_area_struct *vma,
 
 	return ret;
 }
-#endif
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 int pudp_test_and_clear_young(struct vm_area_struct *vma,
 			      unsigned long addr, pud_t *pudp)
 {
@@ -594,6 +592,32 @@ int pudp_test_and_clear_young(struct vm_area_struct *vma,
 }
 #endif
 
+#ifdef CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG
+int p4dp_test_and_clear_young(struct vm_area_struct *vma,
+			      unsigned long addr, p4d_t *p4dp)
+{
+	int ret = 0;
+
+	if (p4d_young(*p4dp))
+		ret = test_and_clear_bit(_PAGE_BIT_ACCESSED,
+					 (unsigned long *)p4dp);
+
+	return ret;
+}
+
+int pgdp_test_and_clear_young(struct vm_area_struct *vma,
+			      unsigned long addr, pgd_t *pgdp)
+{
+	int ret = 0;
+
+	if (pgd_young(*pgdp))
+		ret = test_and_clear_bit(_PAGE_BIT_ACCESSED,
+					 (unsigned long *)pgdp);
+
+	return ret;
+}
+#endif
+
 int ptep_clear_flush_young(struct vm_area_struct *vma,
 			   unsigned long address, pte_t *ptep)
 {
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index f349e08a9dfe..ec7fc170882e 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -581,6 +581,39 @@ static inline void mmu_notifier_range_init_owner(
 	__young;							\
 })
 
+#define pudp_clear_young_notify(__vma, __address, __pudp)		\
+({									\
+	int __young;							\
+	struct vm_area_struct *___vma = __vma;				\
+	unsigned long ___address = __address;				\
+	__young = pudp_test_and_clear_young(___vma, ___address, __pudp);\
+	__young |= mmu_notifier_clear_young(___vma->vm_mm, ___address,	\
+					    ___address + PUD_SIZE);	\
+	__young;							\
+})
+
+#define p4dp_clear_young_notify(__vma, __address, __p4dp)		\
+({									\
+	int __young;							\
+	struct vm_area_struct *___vma = __vma;				\
+	unsigned long ___address = __address;				\
+	__young = p4dp_test_and_clear_young(___vma, ___address, __p4dp);\
+	__young |= mmu_notifier_clear_young(___vma->vm_mm, ___address,	\
+					    ___address + P4D_SIZE);	\
+	__young;							\
+})
+
+#define pgdp_clear_young_notify(__vma, __address, __pgdp)		\
+({									\
+	int __young;							\
+	struct vm_area_struct *___vma = __vma;				\
+	unsigned long ___address = __address;				\
+	__young = pgdp_test_and_clear_young(___vma, ___address, __pgdp);\
+	__young |= mmu_notifier_clear_young(___vma->vm_mm, ___address,	\
+					    ___address + PGDIR_SIZE);	\
+	__young;							\
+})
+
 /*
  * set_pte_at_notify() sets the pte _after_ running the notifier.
  * This is safe to start by updating the secondary MMUs, because the primary MMU
@@ -690,6 +723,9 @@ static inline void mmu_notifier_subscriptions_destroy(struct mm_struct *mm)
 #define pmdp_clear_flush_young_notify pmdp_clear_flush_young
 #define ptep_clear_young_notify ptep_test_and_clear_young
 #define pmdp_clear_young_notify pmdp_test_and_clear_young
+#define pudp_clear_young_notify pudp_test_and_clear_young
+#define p4dp_clear_young_notify p4dp_test_and_clear_young
+#define pgdp_clear_young_notify pgdp_test_and_clear_young
 #define ptep_clear_flush_notify ptep_clear_flush
 #define pmdp_huge_clear_flush_notify pmdp_huge_clear_flush
 #define pudp_huge_clear_flush_notify pudp_huge_clear_flush
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 85fc7554cd52..09c3e8bb11bf 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -184,6 +184,27 @@ static inline int pmd_young(pmd_t pmd)
 }
 #endif
 
+#ifndef pud_young
+static inline int pud_young(pud_t pud)
+{
+	return 0;
+}
+#endif
+
+#ifndef p4d_young
+static inline int p4d_young(p4d_t p4d)
+{
+	return 0;
+}
+#endif
+
+#ifndef pgd_young
+static inline int pgd_young(pgd_t pgd)
+{
+	return 0;
+}
+#endif
+
 #ifndef pmd_dirty
 static inline int pmd_dirty(pmd_t pmd)
 {
@@ -386,6 +407,33 @@ static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma,
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG */
 #endif
 
+#ifndef pudp_test_and_clear_young
+static inline int pudp_test_and_clear_young(struct vm_area_struct *vma,
+					    unsigned long address,
+					    pud_t *pudp)
+{
+	return 0;
+}
+#endif
+
+#ifndef p4dp_test_and_clear_young
+static inline int p4dp_test_and_clear_young(struct vm_area_struct *vma,
+					    unsigned long address,
+					    p4d_t *p4dp)
+{
+	return 0;
+}
+#endif
+
+#ifndef pgdp_test_and_clear_young
+static inline int pgdp_test_and_clear_young(struct vm_area_struct *vma,
+					    unsigned long address,
+					    pgd_t *pgdp)
+{
+	return 0;
+}
+#endif
+
 #ifndef __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
 int ptep_clear_flush_young(struct vm_area_struct *vma,
 			   unsigned long address, pte_t *ptep);
@@ -1090,6 +1138,37 @@ static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
 #define flush_tlb_fix_spurious_fault(vma, address, ptep) flush_tlb_page(vma, address)
 #endif
 
+/*
+ * When walking page tables, get the address of the current boundary,
+ * or the start address of the range if that comes earlier.
+ */
+
+#define pgd_addr_start(addr, start)					\
+({	unsigned long __boundary = (addr) & PGDIR_MASK;			\
+	(__boundary > start) ? __boundary : (start);			\
+})
+
+#ifndef p4d_addr_start
+#define p4d_addr_start(addr, start)					\
+({	unsigned long __boundary = (addr) & P4D_MASK;			\
+	(__boundary > start) ? __boundary : (start);			\
+})
+#endif
+
+#ifndef pud_addr_start
+#define pud_addr_start(addr, start)					\
+({	unsigned long __boundary = (addr) & PUD_MASK;			\
+	(__boundary > start) ? __boundary : (start);			\
+})
+#endif
+
+#ifndef pmd_addr_start
+#define pmd_addr_start(addr, start)					\
+({	unsigned long __boundary = (addr) & PMD_MASK;			\
+	(__boundary > start) ? __boundary : (start);			\
+})
+#endif
+
 /*
  * When walking page tables, get the address of the next boundary,
  * or the end address of the range if that comes earlier.
  */
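
[ Illustrative note, not part of the patch: the new *_addr_start()
helpers clamp an entry's natural boundary to the caller's range start,
mirroring the existing *_addr_end() helpers. A hedged example, assuming
x86-64 with 4 KiB pages (2 MiB PMD span); sample_clamped_start() is a
made-up name. ]

#include <linux/pgtable.h>

/*
 * For addr = 0x2345000, the enclosing 2 MiB PMD span is
 * [0x2200000, 0x2400000).  pmd_addr_start() returns the PMD boundary
 * 0x2200000, unless the caller's range starts later, in which case the
 * range start wins.
 */
static unsigned long sample_clamped_start(unsigned long addr,
					  unsigned long range_start)
{
	return pmd_addr_start(addr, range_start);
}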
-- 
2.21.3

From nobody Sun Feb 8 12:19:35 2026
From: Aravinda Prasad <aravinda.prasad@intel.com>
To: damon@lists.linux.dev, linux-mm@kvack.org, sj@kernel.org,
	linux-kernel@vger.kernel.org
Cc: aravinda.prasad@intel.com, s2322819@ed.ac.uk, sandeep4.kumar@intel.com,
	ying.huang@intel.com, dave.hansen@intel.com, dan.j.williams@intel.com,
	sreenivas.subramoney@intel.com, antti.kervinen@intel.com,
	alexander.kanevskiy@intel.com
Subject: [PATCH v2 2/3] mm/damon: profiling enhancement
Date: Mon, 18 Mar 2024 18:58:47 +0530
Message-Id: <20240318132848.82686-3-aravinda.prasad@intel.com>
In-Reply-To: <20240318132848.82686-1-aravinda.prasad@intel.com>
References: <20240318132848.82686-1-aravinda.prasad@intel.com>

This patch adds a profiling enhancement to DAMON. Given the
sampling_addr and its region bounds, it picks the highest possible page
table tree level such that the address range covered by the picked
level (P*D) falls within the region's bounds. Once a level is picked,
access bit setting and checking is done at that level. As the higher
levels of the page table tree cover a larger address space, a set
accessed bit implies that one or more pages in the given region have
been accessed. This helps to quickly identify hot regions when the
region size is large (e.g., several GBs), which is common for
large-footprint applications.
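
[ Illustrative note, not part of the patch: a simplified, self-contained
sketch of the picking rule described above. It ignores five-level paging
(P4D) and the PTI special case handled by the real pick_profile_level(),
and the span constants assume x86-64 with 4 KiB pages. ]

#include <stdbool.h>

#define SPAN_PMD	(1UL << 21)	/* 2 MiB   */
#define SPAN_PUD	(1UL << 30)	/* 1 GiB   */
#define SPAN_PGD	(1UL << 39)	/* 512 GiB */

/* True if the page table entry span containing addr fits in [start, end). */
static bool span_fits(unsigned long addr, unsigned long span,
		      unsigned long start, unsigned long end)
{
	unsigned long lo = addr & ~(span - 1);

	return lo >= start && lo + span <= end;
}

/* 0 = PTE, 1 = PMD, 2 = PUD, 3 = PGD: highest level fully inside region. */
static int pick_level(unsigned long addr, unsigned long start,
		      unsigned long end)
{
	if (!span_fits(addr, SPAN_PMD, start, end))
		return 0;	/* fall back to the leaf (PTE) level */
	if (!span_fits(addr, SPAN_PUD, start, end))
		return 1;
	if (!span_fits(addr, SPAN_PGD, start, end))
		return 2;
	return 3;
}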

Signed-off-by: Alan Nair
Signed-off-by: Sandeep Kumar
Signed-off-by: Aravinda Prasad
---
 mm/damon/vaddr.c | 233 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 221 insertions(+), 12 deletions(-)

diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
index 381559e4a1fa..daa1a2aedab6 100644
--- a/mm/damon/vaddr.c
+++ b/mm/damon/vaddr.c
@@ -52,6 +52,53 @@ static struct mm_struct *damon_get_mm(struct damon_target *t)
 	return mm;
 }
 
+/* Pick the highest possible page table profiling level for addr
+ * in the region defined by start and end
+ */
+static int pick_profile_level(unsigned long start, unsigned long end,
+		unsigned long addr)
+{
+	/* Start with PTE and check if higher levels can be picked */
+	int level = 0;
+
+	if (!arch_has_hw_nonleaf_pmd_young())
+		return level;
+
+	/* Check if PMD or higher can be picked, else use PTE */
+	if (pmd_addr_start(addr, (start) - 1) < start
+			|| pmd_addr_end(addr, (end) + 1) > end)
+		return level;
+
+	level++;
+	/* Check if PUD or higher can be picked, else use PMD */
+	if (pud_addr_start(addr, (start) - 1) < start
+			|| pud_addr_end(addr, (end) + 1) > end)
+		return level;
+
+	if (pgtable_l5_enabled()) {
+		level++;
+		/* Check if P4D or higher can be picked, else use PUD */
+		if (p4d_addr_start(addr, (start) - 1) < start
+				|| p4d_addr_end(addr, (end) + 1) > end)
+			return level;
+	}
+
+	level++;
+	/* Check if PGD can be picked, else use the previous level */
+	if (pgd_addr_start(addr, (start) - 1) < start
+			|| pgd_addr_end(addr, (end) + 1) > end)
+		return level;
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	/* Do not pick PGD level if PTI is enabled */
+	if (static_cpu_has(X86_FEATURE_PTI))
+		return level;
+#endif
+
+	/* Return PGD level */
+	return ++level;
+}
+
 /*
  * Functions for the initial monitoring target regions construction
  */
@@ -387,16 +434,90 @@ static int damon_mkold_hugetlb_entry(pte_t *pte, unsigned long hmask,
 #define damon_mkold_hugetlb_entry NULL
 #endif /* CONFIG_HUGETLB_PAGE */
 
-static const struct mm_walk_ops damon_mkold_ops = {
-	.pmd_entry = damon_mkold_pmd_entry,
+
+#ifdef CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG
+static int damon_mkold_pmd(pmd_t *pmd, unsigned long addr,
+		unsigned long next, struct mm_walk *walk)
+{
+	spinlock_t *ptl;
+
+	if (!pmd_present(*pmd))
+		return 0;
+
+	ptl = pmd_lock(walk->mm, pmd);
+	pmdp_clear_young_notify(walk->vma, addr, pmd);
+	spin_unlock(ptl);
+
+	return 0;
+}
+
+static int damon_mkold_pud(pud_t *pud, unsigned long addr,
+		unsigned long next, struct mm_walk *walk)
+{
+	spinlock_t *ptl;
+
+	if (!pud_present(*pud))
+		return 0;
+
+	ptl = pud_lock(walk->mm, pud);
+	pudp_clear_young_notify(walk->vma, addr, pud);
+	spin_unlock(ptl);
+
+	return 0;
+}
+
+static int damon_mkold_p4d(p4d_t *p4d, unsigned long addr,
+		unsigned long next, struct mm_walk *walk)
+{
+	struct mm_struct *mm = walk->mm;
+
+	if (!p4d_present(*p4d))
+		return 0;
+
+	spin_lock(&mm->page_table_lock);
+	p4dp_clear_young_notify(walk->vma, addr, p4d);
+	spin_unlock(&mm->page_table_lock);
+
+	return 0;
+}
+
+static int damon_mkold_pgd(pgd_t *pgd, unsigned long addr,
+		unsigned long next, struct mm_walk *walk)
+{
+	struct mm_struct *mm = walk->mm;
+
+	if (!pgd_present(*pgd))
+		return 0;
+
+	spin_lock(&mm->page_table_lock);
+	pgdp_clear_young_notify(walk->vma, addr, pgd);
+	spin_unlock(&mm->page_table_lock);
+
+	return 0;
+}
+#endif
+
+static const struct mm_walk_ops damon_mkold_ops[] = {
+	{.pmd_entry = damon_mkold_pmd_entry,
 	.hugetlb_entry = damon_mkold_hugetlb_entry,
-	.walk_lock = PGWALK_RDLOCK,
+	.walk_lock = PGWALK_RDLOCK},
+#ifdef CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG
+	{.pmd_entry = damon_mkold_pmd},
+	{.pud_entry = damon_mkold_pud},
+	{.p4d_entry = damon_mkold_p4d},
+	{.pgd_entry = damon_mkold_pgd},
+#endif
 };
 
-static void damon_va_mkold(struct mm_struct *mm, unsigned long addr)
+static void damon_va_mkold(struct mm_struct *mm, struct damon_region *r)
 {
+	unsigned long addr = r->sampling_addr;
+	int profile_level;
+
+	profile_level = pick_profile_level(r->ar.start, r->ar.end, addr);
+
 	mmap_read_lock(mm);
-	walk_page_range(mm, addr, addr + 1, &damon_mkold_ops, NULL);
+	walk_page_range(mm, addr, addr + 1, &damon_mkold_ops[profile_level], NULL);
 	mmap_read_unlock(mm);
 }
 
@@ -409,7 +530,7 @@ static void __damon_va_prepare_access_check(struct mm_struct *mm,
 {
 	r->sampling_addr = damon_rand(r->ar.start, r->ar.end);
 
-	damon_va_mkold(mm, r->sampling_addr);
+	damon_va_mkold(mm, r);
 }
 
 static void damon_va_prepare_access_checks(struct damon_ctx *ctx)
@@ -531,22 +652,110 @@ static int damon_young_hugetlb_entry(pte_t *pte, unsigned long hmask,
 #define damon_young_hugetlb_entry NULL
 #endif /* CONFIG_HUGETLB_PAGE */
 
-static const struct mm_walk_ops damon_young_ops = {
-	.pmd_entry = damon_young_pmd_entry,
+
+#ifdef CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG
+static int damon_young_pmd(pmd_t *pmd, unsigned long addr,
+		unsigned long next, struct mm_walk *walk)
+{
+	spinlock_t *ptl;
+	struct damon_young_walk_private *priv = walk->private;
+
+	if (!pmd_present(*pmd))
+		return 0;
+
+	ptl = pmd_lock(walk->mm, pmd);
+	if (pmd_young(*pmd) || mmu_notifier_test_young(walk->mm, addr))
+		priv->young = true;
+
+	*priv->folio_sz = (1UL << PMD_SHIFT);
+	spin_unlock(ptl);
+
+	return 0;
+}
+
+static int damon_young_pud(pud_t *pud, unsigned long addr,
+		unsigned long next, struct mm_walk *walk)
+{
+	spinlock_t *ptl;
+	struct damon_young_walk_private *priv = walk->private;
+
+	if (!pud_present(*pud))
+		return 0;
+
+	ptl = pud_lock(walk->mm, pud);
+	if (pud_young(*pud) || mmu_notifier_test_young(walk->mm, addr))
+		priv->young = true;
+
+	*priv->folio_sz = (1UL << PUD_SHIFT);
+	spin_unlock(ptl);
+
+	return 0;
+}
+
+static int damon_young_p4d(p4d_t *p4d, unsigned long addr,
+		unsigned long next, struct mm_walk *walk)
+{
+	struct mm_struct *mm = walk->mm;
+	struct damon_young_walk_private *priv = walk->private;
+
+	if (!p4d_present(*p4d))
+		return 0;
+
+	spin_lock(&mm->page_table_lock);
+	if (p4d_young(*p4d) || mmu_notifier_test_young(walk->mm, addr))
+		priv->young = true;
+
+	*priv->folio_sz = (1UL << P4D_SHIFT);
+	spin_unlock(&mm->page_table_lock);
+
+	return 0;
+}
+
+static int damon_young_pgd(pgd_t *pgd, unsigned long addr,
+		unsigned long next, struct mm_walk *walk)
+{
+	struct damon_young_walk_private *priv = walk->private;
+
+	if (!pgd_present(*pgd))
+		return 0;
+
+	spin_lock(&pgd_lock);
+	if (pgd_young(*pgd) || mmu_notifier_test_young(walk->mm, addr))
+		priv->young = true;
+
+	*priv->folio_sz = (1UL << PGDIR_SHIFT);
+	spin_unlock(&pgd_lock);
+
+	return 0;
+}
+#endif
+
+static const struct mm_walk_ops damon_young_ops[] = {
+	{.pmd_entry = damon_young_pmd_entry,
 	.hugetlb_entry = damon_young_hugetlb_entry,
-	.walk_lock = PGWALK_RDLOCK,
+	.walk_lock = PGWALK_RDLOCK},
+#ifdef CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG
+	{.pmd_entry = damon_young_pmd},
+	{.pud_entry = damon_young_pud},
+	{.p4d_entry = damon_young_p4d},
+	{.pgd_entry = damon_young_pgd},
+#endif
 };
 
-static bool damon_va_young(struct mm_struct *mm, unsigned long addr,
+static bool damon_va_young(struct mm_struct *mm, struct damon_region *r,
 		unsigned long *folio_sz)
 {
+	unsigned long addr = r->sampling_addr;
+	int profile_level;
 	struct damon_young_walk_private arg = {
 		.folio_sz = folio_sz,
 		.young = false,
 	};
 
+	profile_level = pick_profile_level(r->ar.start, r->ar.end, addr);
+
 	mmap_read_lock(mm);
-	walk_page_range(mm, addr, addr + 1, &damon_young_ops, &arg);
+	walk_page_range(mm, addr, addr + 1, &damon_young_ops[profile_level], &arg);
 	mmap_read_unlock(mm);
 	return arg.young;
 }
@@ -577,7 +786,7 @@ static void __damon_va_check_access(struct mm_struct *mm,
 		return;
 	}
 
-	last_accessed = damon_va_young(mm, r->sampling_addr, &last_folio_sz);
+	last_accessed = damon_va_young(mm, r, &last_folio_sz);
 	damon_update_region_access_rate(r, last_accessed, attrs);
 
 	last_addr = r->sampling_addr;
-- 
2.21.3

From nobody Sun Feb 8 12:19:35 2026
From: Aravinda Prasad <aravinda.prasad@intel.com>
To: damon@lists.linux.dev, linux-mm@kvack.org, sj@kernel.org,
	linux-kernel@vger.kernel.org
Cc: aravinda.prasad@intel.com, s2322819@ed.ac.uk, sandeep4.kumar@intel.com,
	ying.huang@intel.com, dave.hansen@intel.com, dan.j.williams@intel.com,
	sreenivas.subramoney@intel.com, antti.kervinen@intel.com,
	alexander.kanevskiy@intel.com
Subject: [PATCH v2 3/3] mm/damon: documentation updates
Date: Mon, 18 Mar 2024 18:58:48 +0530
Message-Id: <20240318132848.82686-4-aravinda.prasad@intel.com>
In-Reply-To: <20240318132848.82686-1-aravinda.prasad@intel.com>
References: <20240318132848.82686-1-aravinda.prasad@intel.com>

This patch updates the DAMON design documentation to describe the
profiling enhancement for virtual address spaces.

Signed-off-by: Aravinda Prasad
---
 Documentation/mm/damon/design.rst | 42 +++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst
index 5620aab9b385..59014ecbb551 100644
--- a/Documentation/mm/damon/design.rst
+++ b/Documentation/mm/damon/design.rst
@@ -139,6 +139,48 @@ the interference is the responsibility of sysadmins.  However, it solves
 the conflict with the reclaim logic using ``PG_idle`` and ``PG_young`` page flags,
 as Idle page tracking does.
 
+Profiling enhancement for virtual address space
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For virtual address space tracking, relying on checking the Accessed bit
+only at the leaf level of the page table is inefficient. Hardware
+architectures support Accessed bits at all levels of the page table
+tree, updating them during the page table walk. Hence, DAMON dynamically
+profiles different levels (PMD/PUD/P4D) of a multi-level page table tree.
+
+DAMON leverages the following key insight: a data page that is accessed
+also has the Accessed bit set in its PMD, PUD, P4D, and PGD entries.
+Conversely, if the Accessed bit in a PGD entry (or a PUD/PMD entry) is
+not set, then none of the data pages under that entry's subtree have
+been accessed. DAMON profiles Accessed bits at the highest possible
+level of the page table tree to identify the regions that are accessed.
+
+For example, consider a region and the sampling address (SA) in the
+figure below. The address range of the PUD entry corresponding to SA is
+within the region bounds, and hence PUD is picked for checking and
+setting the Accessed bits. This would not hold if P4D were picked for
+profiling. Hence, in this case, PUD is the highest possible level that
+can be picked for profiling.
+
+                      .......
+                     +  P4D  +
+                      .......
+                     /       \
+                    /         \
+                   /           \
+                  /             \
+                 /               \
+                /     .......     \
+               /     +  PUD  +     \
+              /       .......       \
+             /       /       \       \
+- - - - - +-----*---*--+====+-*------+- -*- - - +
+           #    SA     #    +        +
+           #           #    +        +
+- - - - - +------------+====+--------+- - - - - +
+           |    ----- DAMON region ------|
+
 Core Logics
 ===========
-- 
2.21.3