From nobody Sun Feb 8 17:37:40 2026
AGHT+IFv0DoB3b0vEye0FjjGUQbE4c0I7YpIv3kq3xPjqljbSw0tByADYY9Wsgd4q3ipuhDCRO/jUQ== X-Received: by 2002:a05:6871:338c:b0:25e:44b9:b2ee with SMTP id 586e51a60fabf-26891aa626amr6693054fac.2.1722862550191; Mon, 05 Aug 2024 05:55:50 -0700 (PDT) Received: from C02DW0BEMD6R.bytedance.net ([139.177.225.232]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-7106ecfaf1asm5503030b3a.142.2024.08.05.05.55.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 05 Aug 2024 05:55:49 -0700 (PDT) From: Qi Zheng To: david@redhat.com, hughd@google.com, willy@infradead.org, mgorman@suse.de, muchun.song@linux.dev, vbabka@kernel.org, akpm@linux-foundation.org, zokeefe@google.com, rientjes@google.com Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Qi Zheng Subject: [RFC PATCH v2 1/7] mm: pgtable: make pte_offset_map_nolock() return pmdval Date: Mon, 5 Aug 2024 20:55:05 +0800 Message-Id: X-Mailer: git-send-email 2.24.3 (Apple Git-128) In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Make pte_offset_map_nolock() return pmdval so that we can recheck the *pmd once the lock is taken. This is a preparation for freeing empty PTE pages, no functional changes are expected. Signed-off-by: Qi Zheng --- Documentation/mm/split_page_table_lock.rst | 3 ++- arch/arm/mm/fault-armv.c | 2 +- arch/powerpc/mm/pgtable.c | 2 +- include/linux/mm.h | 4 ++-- mm/filemap.c | 2 +- mm/khugepaged.c | 4 ++-- mm/memory.c | 4 ++-- mm/mremap.c | 2 +- mm/page_vma_mapped.c | 2 +- mm/pgtable-generic.c | 21 ++++++++++++--------- mm/userfaultfd.c | 4 ++-- mm/vmscan.c | 2 +- 12 files changed, 28 insertions(+), 24 deletions(-) diff --git a/Documentation/mm/split_page_table_lock.rst b/Documentation/mm/= split_page_table_lock.rst index e4f6972eb6c04..e6a47d57531cd 100644 --- a/Documentation/mm/split_page_table_lock.rst +++ b/Documentation/mm/split_page_table_lock.rst @@ -18,7 +18,8 @@ There are helpers to lock/unlock a table and other access= or functions: pointer to its PTE table lock, or returns NULL if no PTE table; - pte_offset_map_nolock() maps PTE, returns pointer to PTE with pointer to its PTE table - lock (not taken), or returns NULL if no PTE table; + lock (not taken) and the value of its pmd entry, or returns NULL + if no PTE table; - pte_offset_map() maps PTE, returns pointer to PTE, or returns NULL if no PTE table; - pte_unmap() diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c index 831793cd6ff94..db07e6a05eb6e 100644 --- a/arch/arm/mm/fault-armv.c +++ b/arch/arm/mm/fault-armv.c @@ -117,7 +117,7 @@ static int adjust_pte(struct vm_area_struct *vma, unsig= ned long address, * must use the nested version. This also means we need to * open-code the spin-locking. 
*/ - pte =3D pte_offset_map_nolock(vma->vm_mm, pmd, address, &ptl); + pte =3D pte_offset_map_nolock(vma->vm_mm, pmd, NULL, address, &ptl); if (!pte) return 0; =20 diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c index 7316396e452d8..9b67d2a1457ed 100644 --- a/arch/powerpc/mm/pgtable.c +++ b/arch/powerpc/mm/pgtable.c @@ -398,7 +398,7 @@ void assert_pte_locked(struct mm_struct *mm, unsigned l= ong addr) */ if (pmd_none(*pmd)) return; - pte =3D pte_offset_map_nolock(mm, pmd, addr, &ptl); + pte =3D pte_offset_map_nolock(mm, pmd, NULL, addr, &ptl); BUG_ON(!pte); assert_spin_locked(ptl); pte_unmap(pte); diff --git a/include/linux/mm.h b/include/linux/mm.h index 43b40334e9b28..b1ef2afe620c5 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2937,8 +2937,8 @@ static inline pte_t *pte_offset_map_lock(struct mm_st= ruct *mm, pmd_t *pmd, return pte; } =20 -pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd, - unsigned long addr, spinlock_t **ptlp); +pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd, pmd_t *pmdv= alp, + unsigned long addr, spinlock_t **ptlp); =20 #define pte_unmap_unlock(pte, ptl) do { \ spin_unlock(ptl); \ diff --git a/mm/filemap.c b/mm/filemap.c index 67c3f5136db33..3285dffb64cf8 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -3231,7 +3231,7 @@ static vm_fault_t filemap_fault_recheck_pte_none(stru= ct vm_fault *vmf) if (!(vmf->flags & FAULT_FLAG_ORIG_PTE_VALID)) return 0; =20 - ptep =3D pte_offset_map_nolock(vma->vm_mm, vmf->pmd, vmf->address, + ptep =3D pte_offset_map_nolock(vma->vm_mm, vmf->pmd, NULL, vmf->address, &vmf->ptl); if (unlikely(!ptep)) return VM_FAULT_NOPAGE; diff --git a/mm/khugepaged.c b/mm/khugepaged.c index cdd1d8655a76b..91b93259ee214 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1009,7 +1009,7 @@ static int __collapse_huge_page_swapin(struct mm_stru= ct *mm, }; =20 if (!pte++) { - pte =3D pte_offset_map_nolock(mm, pmd, address, &ptl); + pte =3D pte_offset_map_nolock(mm, pmd, NULL, address, &ptl); if (!pte) { mmap_read_unlock(mm); result =3D SCAN_PMD_NULL; @@ -1598,7 +1598,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, uns= igned long addr, if (userfaultfd_armed(vma) && !(vma->vm_flags & VM_SHARED)) pml =3D pmd_lock(mm, pmd); =20 - start_pte =3D pte_offset_map_nolock(mm, pmd, haddr, &ptl); + start_pte =3D pte_offset_map_nolock(mm, pmd, NULL, haddr, &ptl); if (!start_pte) /* mmap_lock + page lock should prevent this */ goto abort; if (!pml) diff --git a/mm/memory.c b/mm/memory.c index d6a9dcddaca4a..afd8a967fb953 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1108,7 +1108,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct= vm_area_struct *src_vma, ret =3D -ENOMEM; goto out; } - src_pte =3D pte_offset_map_nolock(src_mm, src_pmd, addr, &src_ptl); + src_pte =3D pte_offset_map_nolock(src_mm, src_pmd, NULL, addr, &src_ptl); if (!src_pte) { pte_unmap_unlock(dst_pte, dst_ptl); /* ret =3D=3D 0 */ @@ -5671,7 +5671,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *v= mf) * it into a huge pmd: just retry later if so. 
*/ vmf->pte =3D pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd, - vmf->address, &vmf->ptl); + NULL, vmf->address, &vmf->ptl); if (unlikely(!vmf->pte)) return 0; vmf->orig_pte =3D ptep_get_lockless(vmf->pte); diff --git a/mm/mremap.c b/mm/mremap.c index e7ae140fc6409..f672d0218a6fe 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -175,7 +175,7 @@ static int move_ptes(struct vm_area_struct *vma, pmd_t = *old_pmd, err =3D -EAGAIN; goto out; } - new_pte =3D pte_offset_map_nolock(mm, new_pmd, new_addr, &new_ptl); + new_pte =3D pte_offset_map_nolock(mm, new_pmd, NULL, new_addr, &new_ptl); if (!new_pte) { pte_unmap_unlock(old_pte, old_ptl); err =3D -EAGAIN; diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index ae5cc42aa2087..507701b7bcc1e 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -33,7 +33,7 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw, sp= inlock_t **ptlp) * Though, in most cases, page lock already protects this. */ pvmw->pte =3D pte_offset_map_nolock(pvmw->vma->vm_mm, pvmw->pmd, - pvmw->address, ptlp); + NULL, pvmw->address, ptlp); if (!pvmw->pte) return false; =20 diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index a78a4adf711ac..443e3b34434a5 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -305,7 +305,7 @@ pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr,= pmd_t *pmdvalp) return NULL; } =20 -pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd, +pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd, pmd_t *pmdv= alp, unsigned long addr, spinlock_t **ptlp) { pmd_t pmdval; @@ -314,6 +314,8 @@ pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_= t *pmd, pte =3D __pte_offset_map(pmd, addr, &pmdval); if (likely(pte)) *ptlp =3D pte_lockptr(mm, &pmdval); + if (pmdvalp) + *pmdvalp =3D pmdval; return pte; } =20 @@ -347,14 +349,15 @@ pte_t *pte_offset_map_nolock(struct mm_struct *mm, pm= d_t *pmd, * and disconnected table. Until pte_unmap(pte) unmaps and rcu_read_unloc= k()s * afterwards. * - * pte_offset_map_nolock(mm, pmd, addr, ptlp), above, is like pte_offset_m= ap(); - * but when successful, it also outputs a pointer to the spinlock in ptlp = - as - * pte_offset_map_lock() does, but in this case without locking it. This = helps - * the caller to avoid a later pte_lockptr(mm, *pmd), which might by that = time - * act on a changed *pmd: pte_offset_map_nolock() provides the correct spi= nlock - * pointer for the page table that it returns. In principle, the caller s= hould - * recheck *pmd once the lock is taken; in practice, no callsite needs tha= t - - * either the mmap_lock for write, or pte_same() check on contents, is eno= ugh. + * pte_offset_map_nolock(mm, pmd, pmdvalp, addr, ptlp), above, is like + * pte_offset_map(); but when successful, it also outputs a pointer to the + * spinlock in ptlp - as pte_offset_map_lock() does, but in this case with= out + * locking it. This helps the caller to avoid a later pte_lockptr(mm, *pm= d), + * which might by that time act on a changed *pmd: pte_offset_map_nolock() + * provides the correct spinlock pointer for the page table that it return= s. + * In principle, the caller should recheck *pmd once the lock is taken; Bu= t in + * most cases, either the mmap_lock for write, or pte_same() check on cont= ents, + * is enough. 
* * Note that free_pgtables(), used after unmapping detached vmas, or when * exiting the whole mm, does not take page table lock before freeing a pa= ge diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 3b7715ecf292a..aa3c9cc51cc36 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -1143,7 +1143,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t= *dst_pmd, pmd_t *src_pmd, src_addr, src_addr + PAGE_SIZE); mmu_notifier_invalidate_range_start(&range); retry: - dst_pte =3D pte_offset_map_nolock(mm, dst_pmd, dst_addr, &dst_ptl); + dst_pte =3D pte_offset_map_nolock(mm, dst_pmd, NULL, dst_addr, &dst_ptl); =20 /* Retry if a huge pmd materialized from under us */ if (unlikely(!dst_pte)) { @@ -1151,7 +1151,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t= *dst_pmd, pmd_t *src_pmd, goto out; } =20 - src_pte =3D pte_offset_map_nolock(mm, src_pmd, src_addr, &src_ptl); + src_pte =3D pte_offset_map_nolock(mm, src_pmd, NULL, src_addr, &src_ptl); =20 /* * We held the mmap_lock for reading so MADV_DONTNEED diff --git a/mm/vmscan.c b/mm/vmscan.c index 31d13462571e6..b00cd560c0e43 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3378,7 +3378,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long = start, unsigned long end, DEFINE_MAX_SEQ(walk->lruvec); int old_gen, new_gen =3D lru_gen_from_seq(max_seq); =20 - pte =3D pte_offset_map_nolock(args->mm, pmd, start & PMD_MASK, &ptl); + pte =3D pte_offset_map_nolock(args->mm, pmd, NULL, start & PMD_MASK, &ptl= ); if (!pte) return false; if (!spin_trylock(ptl)) { --=20 2.20.1 From nobody Sun Feb 8 17:37:40 2026 Received: from mail-pf1-f179.google.com (mail-pf1-f179.google.com [209.85.210.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 11B4D15B995 for ; Mon, 5 Aug 2024 12:55:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.179 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722862557; cv=none; b=eaItMMCauW/PJbTHWn1Dp5PggeMvXWIco1geLDVK0ZSYPWQTBXp/vr5TClPxxucC8hGZ8O86hN9lPxMByyLgYCwCcu+rFdlajiF3June0IFjImstb3cX82XreVmfpjeJeVxy9uKXlh0+pzEu8SYQSyRY2nuDAiaWv5TmvfkXDNE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722862557; c=relaxed/simple; bh=2Xqu+2y/kqGayw1LhiCWd/D/5aR4Vut5ubymQlHw490=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=t2qCh+P9ZiFfQjYPVAdQj5A8tmCv/tvZ879POu8eeObGwuIchNrvZH2FULmPgYKDq6geiiPB8lBd0po5JScTxP62BdwS+Fla7WedXpmSV/ehGE1Cg07fHq0JXqTANhkINyfGLCZrmj4mYwJupwXyX54s11/JSzqLiUFHRk2diSk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=g8nZY27O; arc=none smtp.client-ip=209.85.210.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="g8nZY27O" Received: by mail-pf1-f179.google.com with SMTP id d2e1a72fcca58-70d2cd07869so1261176b3a.0 for ; Mon, 05 Aug 2024 05:55:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1722862555; x=1723467355; darn=vger.kernel.org; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=omYHK4YJqVuB4cm7/RSntivvxRLai4c0fyHEJvres6w=; b=g8nZY27OWg96d/TGmk6lu2DKe7pgZeFhWNASkM3SztHrkajrK38SbMLONG21i6+mKz fOIYVXmp6jInFt9MgcNwYFYLZOEOWANCi+IekcEZlJVeeEIG2cHL3ZmK92NGHrpWlQlC WFhORBMz6xds1LL7DeaK4Q4quK1KhjgaQ/1G2Sbs2hwxRMlDzDHr+MJlRbGUx0HKh0tz rHcv6Z6t/H2RkTp61BNiG3C0gVVnD+dGZ8kdj7sBiSJdt8cy1/f/jB1Xga8rOZ1IFEXw 9IMuLN1rhRmZyZIk8ZrLZmT0RD1whdwbihsKUHePA1rRiUxgGTDYCNpm4zbbJwxdUk4b epGA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1722862555; x=1723467355; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=omYHK4YJqVuB4cm7/RSntivvxRLai4c0fyHEJvres6w=; b=ARt4DHbWRa8JofCURcZO3KzdwJz9X1/MnGTpclWwwVRGsChIqW4d/Fx3Nm5wYv9+IU kOmUwx3kIDIQSiWbeH034SOwDTaIuj564iYSOAL6Zy1i2Z0f1lWYpML1Nxpp8gPp6JjW mOMsvbe69ACKpnEM0bcPgXa5L83i17xR3XqR6NhZGrz1sXmI40bMxkuL6Y04M5xtXxml TWQvqYc0qbF1y+yhtMSxp0k0E0+A5Grxnq8v39j/mKxZYvcLoQ+6Y4WRwsBtNkeoKvdK AZhz2F10GePECZmFDlMqE8p0BdpnKDvFl6k646QxHSi2OzBpV5wGMLXQOdn/+UroCOWr Y8wg== X-Forwarded-Encrypted: i=1; AJvYcCX+/6+/0OvyMYGseMKldCPYQr9VakyQCr68iY4cZQw+i/07BsFPN6fCTpdyxYk330FfAUyQaWRvJugiOC4=@vger.kernel.org X-Gm-Message-State: AOJu0YwqzAsaqcgHf1PIGQLYIym1KMPEU5UBOwuH5QqyPVPKDN8YAwyD j8il/hUCVcm8m3Bvz70KTvmvpl2MRxZKUTwaNqt90I6D2WjyYIngoBj5vN5YOts= X-Google-Smtp-Source: AGHT+IGYYukG+ok6MYSzTAxan5mNsM3B2kkWkuJBYY1IjywvHeK1bY7aqXgLyTgGlf83PVoHS2ooMA== X-Received: by 2002:a05:6a00:21ce:b0:70d:2c09:45ff with SMTP id d2e1a72fcca58-7106d0927eemr9121873b3a.4.1722862555191; Mon, 05 Aug 2024 05:55:55 -0700 (PDT) Received: from C02DW0BEMD6R.bytedance.net ([139.177.225.232]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-7106ecfaf1asm5503030b3a.142.2024.08.05.05.55.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 05 Aug 2024 05:55:54 -0700 (PDT) From: Qi Zheng To: david@redhat.com, hughd@google.com, willy@infradead.org, mgorman@suse.de, muchun.song@linux.dev, vbabka@kernel.org, akpm@linux-foundation.org, zokeefe@google.com, rientjes@google.com Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Qi Zheng Subject: [RFC PATCH v2 2/7] mm: introduce CONFIG_PT_RECLAIM Date: Mon, 5 Aug 2024 20:55:06 +0800 Message-Id: <7c726839e2610f1873d9fa2a7c60715796579d1a.1722861064.git.zhengqi.arch@bytedance.com> X-Mailer: git-send-email 2.24.3 (Apple Git-128) In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This configuration variable will be used to build the code needed to free empty user page table pages. This feature is not available on all architectures yet, so ARCH_SUPPORTS_PT_RECLAIM is needed. We can remove it once all architectures support this feature. Signed-off-by: Qi Zheng --- mm/Kconfig | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/mm/Kconfig b/mm/Kconfig index 3936fe4d26d91..c10741c54dcb1 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1278,6 +1278,20 @@ config NUMA_EMU into virtual nodes when booted with "numa=3Dfake=3DN", where N is the number of nodes. This is only useful for debugging. 
 
+config ARCH_SUPPORTS_PT_RECLAIM
+	def_bool n
+
+config PT_RECLAIM
+	bool "reclaim empty user page table pages"
+	default y
+	depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP
+	select MMU_GATHER_RCU_TABLE_FREE
+	help
+	  Try to reclaim empty user page table pages in paths other than the
+	  munmap and exit_mmap paths.
+
+	  Note: for now, only empty user PTE page table pages are reclaimed.
+
 source "mm/damon/Kconfig"
 
 endmenu
-- 
2.20.1

From nobody Sun Feb 8 17:37:40 2026
wdnztjy2EfnN1IyoBTPUPfsYWrOoYW/ckDqrPL7RWDPo1OO/ci5c+gkrdYbr7ymiBS81 joJ7XZakH0s2hkGXu6vxE+U6WhtZO1Eghs6qtJ44PYKOo0jJynlI5mmOUzY/wdmX7Nlt +8dg== X-Forwarded-Encrypted: i=1; AJvYcCUx9DXVgmzuEMWHb7QBvBquFbYUapGHOn/7Ev+hpBLoXdIjPb/3ZosH0xZLb9QfUcAWytdhB/nbGbiF3sZOQZRyKvgm9mT0w0TKJGZ/ X-Gm-Message-State: AOJu0YwUHV/aciDf0WfcaJsi37fimqScpaIt3NOppezlk3RSnposkI15 CS7y1Zm8uMndLp4E52x9IXAk088g9rk/MkNjSSQNDMhkXAYGLdpoWozLICCuyGY= X-Google-Smtp-Source: AGHT+IFKE0ajxoQQC50+uF8r9y1IJZQgbjRdiBt9JdNBowXPi+FU6ZnE1IQEVkFfZ4V1YKUGUyQ/8Q== X-Received: by 2002:a05:6a00:a17:b0:710:5d11:ec2f with SMTP id d2e1a72fcca58-7106cdd9fd4mr9461205b3a.0.1722862560754; Mon, 05 Aug 2024 05:56:00 -0700 (PDT) Received: from C02DW0BEMD6R.bytedance.net ([139.177.225.232]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-7106ecfaf1asm5503030b3a.142.2024.08.05.05.55.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 05 Aug 2024 05:55:59 -0700 (PDT) From: Qi Zheng To: david@redhat.com, hughd@google.com, willy@infradead.org, mgorman@suse.de, muchun.song@linux.dev, vbabka@kernel.org, akpm@linux-foundation.org, zokeefe@google.com, rientjes@google.com Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Qi Zheng Subject: [RFC PATCH v2 3/7] mm: pass address information to pmd_install() Date: Mon, 5 Aug 2024 20:55:07 +0800 Message-Id: <095dc55b68ef4650e2eaf66ad7dd2feabe87f89e.1722861064.git.zhengqi.arch@bytedance.com> X-Mailer: git-send-email 2.24.3 (Apple Git-128) In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In the subsequent implementation of freeing empty page table pages, we need the address information to flush tlb, so pass address to pmd_install() in advance. No functional changes. Signed-off-by: Qi Zheng --- include/linux/hugetlb.h | 2 +- include/linux/mm.h | 9 +++++---- mm/debug_vm_pgtable.c | 2 +- mm/filemap.c | 2 +- mm/gup.c | 2 +- mm/internal.h | 3 ++- mm/memory.c | 15 ++++++++------- mm/migrate_device.c | 2 +- mm/mprotect.c | 8 ++++---- mm/mremap.c | 2 +- mm/userfaultfd.c | 6 +++--- 11 files changed, 28 insertions(+), 25 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index a76db143bffee..fcdcef367fffe 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -189,7 +189,7 @@ static inline pte_t *pte_offset_huge(pmd_t *pmd, unsign= ed long address) static inline pte_t *pte_alloc_huge(struct mm_struct *mm, pmd_t *pmd, unsigned long address) { - return pte_alloc(mm, pmd) ? NULL : pte_offset_huge(pmd, address); + return pte_alloc(mm, pmd, address) ? 
NULL : pte_offset_huge(pmd, address); } #endif =20 diff --git a/include/linux/mm.h b/include/linux/mm.h index b1ef2afe620c5..f0b821dcb085b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2758,7 +2758,7 @@ static inline void mm_inc_nr_ptes(struct mm_struct *m= m) {} static inline void mm_dec_nr_ptes(struct mm_struct *mm) {} #endif =20 -int __pte_alloc(struct mm_struct *mm, pmd_t *pmd); +int __pte_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long addr); int __pte_alloc_kernel(pmd_t *pmd); =20 #if defined(CONFIG_MMU) @@ -2945,13 +2945,14 @@ pte_t *pte_offset_map_nolock(struct mm_struct *mm, = pmd_t *pmd, pmd_t *pmdvalp, pte_unmap(pte); \ } while (0) =20 -#define pte_alloc(mm, pmd) (unlikely(pmd_none(*(pmd))) && __pte_alloc(mm, = pmd)) +#define pte_alloc(mm, pmd, addr) \ + (unlikely(pmd_none(*(pmd))) && __pte_alloc(mm, pmd, addr)) =20 #define pte_alloc_map(mm, pmd, address) \ - (pte_alloc(mm, pmd) ? NULL : pte_offset_map(pmd, address)) + (pte_alloc(mm, pmd, address) ? NULL : pte_offset_map(pmd, address)) =20 #define pte_alloc_map_lock(mm, pmd, address, ptlp) \ - (pte_alloc(mm, pmd) ? \ + (pte_alloc(mm, pmd, address) ? \ NULL : pte_offset_map_lock(mm, pmd, address, ptlp)) =20 #define pte_alloc_kernel(pmd, address) \ diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c index e4969fb54da34..18375744e1845 100644 --- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -1246,7 +1246,7 @@ static int __init init_args(struct pgtable_debug_args= *args) args->start_pmdp =3D pmd_offset(args->pudp, 0UL); WARN_ON(!args->start_pmdp); =20 - if (pte_alloc(args->mm, args->pmdp)) { + if (pte_alloc(args->mm, args->pmdp, args->vaddr)) { pr_err("Failed to allocate pte entries\n"); ret =3D -ENOMEM; goto error; diff --git a/mm/filemap.c b/mm/filemap.c index 3285dffb64cf8..efcb8ae3f235f 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -3453,7 +3453,7 @@ static bool filemap_map_pmd(struct vm_fault *vmf, str= uct folio *folio, } =20 if (pmd_none(*vmf->pmd) && vmf->prealloc_pte) - pmd_install(mm, vmf->pmd, &vmf->prealloc_pte); + pmd_install(mm, vmf->pmd, vmf->address, &vmf->prealloc_pte); =20 return false; } diff --git a/mm/gup.c b/mm/gup.c index d19884e097fd2..53c3b73810150 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -972,7 +972,7 @@ static struct page *follow_pmd_mask(struct vm_area_stru= ct *vma, spin_unlock(ptl); split_huge_pmd(vma, pmd, address); /* If pmd was left empty, stuff a page table in there quickly */ - return pte_alloc(mm, pmd) ? ERR_PTR(-ENOMEM) : + return pte_alloc(mm, pmd, address) ? 
ERR_PTR(-ENOMEM) : follow_page_pte(vma, address, pmd, flags, &ctx->pgmap); } page =3D follow_huge_pmd(vma, address, pmd, flags, ctx); diff --git a/mm/internal.h b/mm/internal.h index 52f7fc4e8ac30..dfc992de01115 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -325,7 +325,8 @@ void folio_activate(struct folio *folio); void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas, struct vm_area_struct *start_vma, unsigned long floor, unsigned long ceiling, bool mm_wr_locked); -void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte); +void pmd_install(struct mm_struct *mm, pmd_t *pmd, unsigned long addr, + pgtable_t *pte); =20 struct zap_details; void unmap_page_range(struct mmu_gather *tlb, diff --git a/mm/memory.c b/mm/memory.c index afd8a967fb953..fef1e425e4702 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -417,7 +417,8 @@ void free_pgtables(struct mmu_gather *tlb, struct ma_st= ate *mas, } while (vma); } =20 -void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte) +void pmd_install(struct mm_struct *mm, pmd_t *pmd, unsigned long addr, + pgtable_t *pte) { spinlock_t *ptl =3D pmd_lock(mm, pmd); =20 @@ -443,13 +444,13 @@ void pmd_install(struct mm_struct *mm, pmd_t *pmd, pg= table_t *pte) spin_unlock(ptl); } =20 -int __pte_alloc(struct mm_struct *mm, pmd_t *pmd) +int __pte_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long addr) { pgtable_t new =3D pte_alloc_one(mm); if (!new) return -ENOMEM; =20 - pmd_install(mm, pmd, &new); + pmd_install(mm, pmd, addr, &new); if (new) pte_free(mm, new); return 0; @@ -2115,7 +2116,7 @@ static int insert_pages(struct vm_area_struct *vma, u= nsigned long addr, =20 /* Allocate the PTE if necessary; takes PMD lock once only. */ ret =3D -ENOMEM; - if (pte_alloc(mm, pmd)) + if (pte_alloc(mm, pmd, addr)) goto out; =20 while (pages_to_write_in_pmd) { @@ -4686,7 +4687,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *= vmf) * Use pte_alloc() instead of pte_alloc_map(), so that OOM can * be distinguished from a transient failure of pte_offset_map(). */ - if (pte_alloc(vma->vm_mm, vmf->pmd)) + if (pte_alloc(vma->vm_mm, vmf->pmd, vmf->address)) return VM_FAULT_OOM; =20 /* Use the zero-page for reads */ @@ -5033,8 +5034,8 @@ vm_fault_t finish_fault(struct vm_fault *vmf) } =20 if (vmf->prealloc_pte) - pmd_install(vma->vm_mm, vmf->pmd, &vmf->prealloc_pte); - else if (unlikely(pte_alloc(vma->vm_mm, vmf->pmd))) + pmd_install(vma->vm_mm, vmf->pmd, vmf->address, &vmf->prealloc_pte); + else if (unlikely(pte_alloc(vma->vm_mm, vmf->pmd, vmf->address))) return VM_FAULT_OOM; } =20 diff --git a/mm/migrate_device.c b/mm/migrate_device.c index 6d66dc1c6ffa0..e4d2e19e6611d 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -598,7 +598,7 @@ static void migrate_vma_insert_page(struct migrate_vma = *migrate, goto abort; if (pmd_trans_huge(*pmdp) || pmd_devmap(*pmdp)) goto abort; - if (pte_alloc(mm, pmdp)) + if (pte_alloc(mm, pmdp, addr)) goto abort; if (unlikely(anon_vma_prepare(vma))) goto abort; diff --git a/mm/mprotect.c b/mm/mprotect.c index 37cf8d249405d..7b58db622f825 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -329,11 +329,11 @@ pgtable_populate_needed(struct vm_area_struct *vma, u= nsigned long cp_flags) * allocation failures during page faults by kicking OOM and returning * error. 
*/ -#define change_pmd_prepare(vma, pmd, cp_flags) \ +#define change_pmd_prepare(vma, pmd, addr, cp_flags) \ ({ \ long err =3D 0; \ if (unlikely(pgtable_populate_needed(vma, cp_flags))) { \ - if (pte_alloc(vma->vm_mm, pmd)) \ + if (pte_alloc(vma->vm_mm, pmd, addr)) \ err =3D -ENOMEM; \ } \ err; \ @@ -374,7 +374,7 @@ static inline long change_pmd_range(struct mmu_gather *= tlb, again: next =3D pmd_addr_end(addr, end); =20 - ret =3D change_pmd_prepare(vma, pmd, cp_flags); + ret =3D change_pmd_prepare(vma, pmd, addr, cp_flags); if (ret) { pages =3D ret; break; @@ -401,7 +401,7 @@ static inline long change_pmd_range(struct mmu_gather *= tlb, * cleared; make sure pmd populated if * necessary, then fall-through to pte level. */ - ret =3D change_pmd_prepare(vma, pmd, cp_flags); + ret =3D change_pmd_prepare(vma, pmd, addr, cp_flags); if (ret) { pages =3D ret; break; diff --git a/mm/mremap.c b/mm/mremap.c index f672d0218a6fe..7723d11e77cd2 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -628,7 +628,7 @@ unsigned long move_page_tables(struct vm_area_struct *v= ma, } if (pmd_none(*old_pmd)) continue; - if (pte_alloc(new_vma->vm_mm, new_pmd)) + if (pte_alloc(new_vma->vm_mm, new_pmd, new_addr)) break; if (move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma, new_pmd, new_addr, need_rmap_locks) < 0) diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index aa3c9cc51cc36..41d659bd2589c 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -796,7 +796,7 @@ static __always_inline ssize_t mfill_atomic(struct user= faultfd_ctx *ctx, break; } if (unlikely(pmd_none(dst_pmdval)) && - unlikely(__pte_alloc(dst_mm, dst_pmd))) { + unlikely(__pte_alloc(dst_mm, dst_pmd, dst_addr))) { err =3D -ENOMEM; break; } @@ -1713,13 +1713,13 @@ ssize_t move_pages(struct userfaultfd_ctx *ctx, uns= igned long dst_start, err =3D -ENOENT; break; } - if (unlikely(__pte_alloc(mm, src_pmd))) { + if (unlikely(__pte_alloc(mm, src_pmd, src_addr))) { err =3D -ENOMEM; break; } } =20 - if (unlikely(pte_alloc(mm, dst_pmd))) { + if (unlikely(pte_alloc(mm, dst_pmd, dst_addr))) { err =3D -ENOMEM; break; } --=20 2.20.1 From nobody Sun Feb 8 17:37:40 2026 Received: from mail-pj1-f42.google.com (mail-pj1-f42.google.com [209.85.216.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5D01415E5D2 for ; Mon, 5 Aug 2024 12:56:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.42 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722862568; cv=none; b=cFD3b/dzAMxh4mHvetiW6snLsshwqblcT0ZBHwCuxoiA1zgw3UcBg2QTqcRYxCuTwDvEqQnhxfIJEBOkTbqUvJPS4p2+RW4jmG/XcyQZFruahW0hbIah4kxX6Yb5tvkqBjD1Aua6BSDtburSm1HAE29g+2BVcZ8DSGstnOZJCM0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722862568; c=relaxed/simple; bh=L1uI+cjpnbWMQ8R4wTw2lvTGlCh3KmgzGDSMichkW54=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=uWSR80NS15NunSkBctMX97G/zRn+u1t7LKSPA8KFhwGtR2FCgWcrxF/exA1vG8z7Eh7JWv7NwnDLMswuT7R8P0+netNVGXUHlYvV7A2RoitSb+BD0mbNGDHU6xAoDonB0hCiaZ90jCppw/juyiYHEUHdpttrZD7GKHjwZoDwz28= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=I3zCJxuW; arc=none smtp.client-ip=209.85.216.42 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass 
(p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="I3zCJxuW" Received: by mail-pj1-f42.google.com with SMTP id 98e67ed59e1d1-2cd46049d2bso1778950a91.3 for ; Mon, 05 Aug 2024 05:56:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1722862566; x=1723467366; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=nsBfabV4yKHDUnZv1G5PTky4g9osbznwx2f7l8Y7SBU=; b=I3zCJxuWV5Uha0dWatt0Mz7JOktMKXm6PUlw1sIiKkRdLcV/pB07clmgQAv96zPMIT Gi1PwODMoEAJsJDvU+3fceuLJMliFd+YHw/HZBOwbT/839vOM456sv9inhE5eB5vJ/bN eDBhxxBM+eMvo/PMCiO/9yMx818f2YEeJlXvlrGP6OoiIDCxoxXS+uXbUbyX3LkvPmwj WKIahGeo0sI72DOc8b5F7kk287UN+stzWtLbwcnxJSix/2J0I0cRO7jYpqZ/1/SH3OdT z3+e3U99Oe3aSvteETOhMkoC52lGpO7X6hTeoXYqNQPmir6+J5j0hrR1iErTiwb8Jc+N XBrA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1722862566; x=1723467366; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=nsBfabV4yKHDUnZv1G5PTky4g9osbznwx2f7l8Y7SBU=; b=rIsyydFgMWkBbynT5X3wSgiCGmspZHsrIBcd2Jw92Zeuujm3IpyA4YxI7mkRrl4AoW v9Y1/P63wptYBJLg0YOSd+3uC/+XUCLEKlAhRUEvCkIBsD0aH5npCe/Rn2fEdmoNsStl BvKb1mbzncGT8iQnXUW5fZMGLWSFRvZxvl6tAis5xbOi6KRUrlLLFzNKjf50VtnJcWfA kpNf2MEpFGFgCXE8mJELZ89ERGICH88TTpnmSZxvKhd/TMhbn47rsGHcZlMxdmzPE+Lm 2uZG+fXXeNGH4jg2tTvv8KvOvlHWvPkb1hJSC4TVejVzfLP86q4f3n3gXEJutdnEA8Ly J2ow== X-Forwarded-Encrypted: i=1; AJvYcCUCTiZEy2g3+toNA5BJOriidC8vulnowr8QL5xDOJOf8M3ynRGscrw5yJdieIoE5SClmlNNjSss8xQcwHw=@vger.kernel.org X-Gm-Message-State: AOJu0YzLwRP+tD8RSvkaEhpyVFS3ln2SyQnHO45zYVRhQTQE+UGy6b54 GpMO2buIUIi38CpSTXwmjEvmksv8FvC7esPlqVoMvZr4SoG0bUKzq8QAdN9bcx0= X-Google-Smtp-Source: AGHT+IGC1DBrGh4Hjx0OyIm/9r87y3+3knx7wGDASaN8LJsQ77h18R7/hh4KYzuDRNZTKJBgub17iA== X-Received: by 2002:a05:6a20:8414:b0:1c4:a742:ab20 with SMTP id adf61e73a8af0-1c69943c40cmr6974660637.0.1722862565538; Mon, 05 Aug 2024 05:56:05 -0700 (PDT) Received: from C02DW0BEMD6R.bytedance.net ([139.177.225.232]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-7106ecfaf1asm5503030b3a.142.2024.08.05.05.56.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 05 Aug 2024 05:56:05 -0700 (PDT) From: Qi Zheng To: david@redhat.com, hughd@google.com, willy@infradead.org, mgorman@suse.de, muchun.song@linux.dev, vbabka@kernel.org, akpm@linux-foundation.org, zokeefe@google.com, rientjes@google.com Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Qi Zheng Subject: [RFC PATCH v2 4/7] mm: pgtable: try to reclaim empty PTE pages in zap_page_range_single() Date: Mon, 5 Aug 2024 20:55:08 +0800 Message-Id: <9fb3dc75cb7f023750da2b4645fd098429deaad5.1722861064.git.zhengqi.arch@bytedance.com> X-Mailer: git-send-email 2.24.3 (Apple Git-128) In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Now in order to pursue high performance, applications mostly use some high-performance user-mode memory allocators, such as jemalloc or tcmalloc. 
These memory allocators use madvise(MADV_DONTNEED or MADV_FREE) to release physical memory, but neither MADV_DONTNEED nor MADV_FREE will release page table memory, which may cause huge page table memory usage. The following are a memory usage snapshot of one process which actually happened on our server: VIRT: 55t RES: 590g VmPTE: 110g In this case, most of the page table entries are empty. For such a PTE page where all entries are empty, we can actually free it back to the system for others to use. As a first step, this commit attempts to synchronously free the empty PTE pages in zap_page_range_single() (MADV_DONTNEED etc will invoke this). In order to reduce overhead, we only handle the cases with a high probability of generating empty PTE pages, and other cases will be filtered out, such as: - hugetlb vma (unsuitable) - userfaultfd_wp vma (may reinstall the pte entry) - writable private file mapping case (COW-ed anon page is not zapped) - etc For userfaultfd_wp and private file mapping cases (and MADV_FREE case, of course), consider scanning and freeing empty PTE pages asynchronously in the future. The following code snippet can show the effect of optimization: mmap 50G while (1) { for (; i < 1024 * 25; i++) { touch 2M memory madvise MADV_DONTNEED 2M } } As we can see, the memory usage of VmPTE is reduced: before after VIRT 50.0 GB 50.0 GB RES 3.1 MB 3.1 MB VmPTE 102640 KB 240 KB Signed-off-by: Qi Zheng --- include/linux/pgtable.h | 14 +++++ mm/Makefile | 1 + mm/huge_memory.c | 3 + mm/internal.h | 14 +++++ mm/khugepaged.c | 30 +++++++-- mm/memory.c | 2 + mm/pt_reclaim.c | 131 ++++++++++++++++++++++++++++++++++++++++ 7 files changed, 189 insertions(+), 6 deletions(-) create mode 100644 mm/pt_reclaim.c diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 2a6a3cccfc367..572343650eb0f 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -447,6 +447,20 @@ static inline void arch_check_zapped_pmd(struct vm_are= a_struct *vma, } #endif =20 +#ifndef arch_flush_tlb_before_set_huge_page +static inline void arch_flush_tlb_before_set_huge_page(struct mm_struct *m= m, + unsigned long addr) +{ +} +#endif + +#ifndef arch_flush_tlb_before_set_pte_page +static inline void arch_flush_tlb_before_set_pte_page(struct mm_struct *mm, + unsigned long addr) +{ +} +#endif + #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long address, diff --git a/mm/Makefile b/mm/Makefile index ab5ed56c5c033..8bec86469c1d5 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -145,3 +145,4 @@ obj-$(CONFIG_GENERIC_IOREMAP) +=3D ioremap.o obj-$(CONFIG_SHRINKER_DEBUG) +=3D shrinker_debug.o obj-$(CONFIG_EXECMEM) +=3D execmem.o obj-$(CONFIG_TMPFS_QUOTA) +=3D shmem_quota.o +obj-$(CONFIG_PT_RECLAIM) +=3D pt_reclaim.o diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 697fcf89f975b..0afbb1e45cdac 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -999,6 +999,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct v= m_fault *vmf, folio_add_new_anon_rmap(folio, vma, haddr, RMAP_EXCLUSIVE); folio_add_lru_vma(folio, vma); pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable); + arch_flush_tlb_before_set_huge_page(vma->vm_mm, haddr); set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry); update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR); @@ -1066,6 +1067,7 @@ static void set_huge_zero_folio(pgtable_t pgtable, st= ruct mm_struct *mm, entry =3D mk_pmd(&zero_folio->page, vma->vm_page_prot); 
entry =3D pmd_mkhuge(entry); pgtable_trans_huge_deposit(mm, pmd, pgtable); + arch_flush_tlb_before_set_huge_page(mm, haddr); set_pmd_at(mm, haddr, pmd, entry); mm_inc_nr_ptes(mm); } @@ -1173,6 +1175,7 @@ static void insert_pfn_pmd(struct vm_area_struct *vma= , unsigned long addr, pgtable =3D NULL; } =20 + arch_flush_tlb_before_set_huge_page(mm, addr); set_pmd_at(mm, addr, pmd, entry); update_mmu_cache_pmd(vma, addr, pmd); =20 diff --git a/mm/internal.h b/mm/internal.h index dfc992de01115..09bd1cee7a523 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1441,4 +1441,18 @@ static inline bool try_to_accept_memory(struct zone = *zone, unsigned int order) } #endif /* CONFIG_UNACCEPTED_MEMORY */ =20 +#ifdef CONFIG_PT_RECLAIM +void try_to_reclaim_pgtables(struct mmu_gather *tlb, struct vm_area_struct= *vma, + unsigned long start_addr, unsigned long end_addr, + struct zap_details *details); +#else +static inline void try_to_reclaim_pgtables(struct mmu_gather *tlb, + struct vm_area_struct *vma, + unsigned long start_addr, + unsigned long end_addr, + struct zap_details *details) +{ +} +#endif /* CONFIG_PT_RECLAIM */ + #endif /* __MM_INTERNAL_H */ diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 91b93259ee214..ffd3963b1c3d1 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1598,7 +1598,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, uns= igned long addr, if (userfaultfd_armed(vma) && !(vma->vm_flags & VM_SHARED)) pml =3D pmd_lock(mm, pmd); =20 - start_pte =3D pte_offset_map_nolock(mm, pmd, NULL, haddr, &ptl); + start_pte =3D pte_offset_map_nolock(mm, pmd, &pgt_pmd, haddr, &ptl); if (!start_pte) /* mmap_lock + page lock should prevent this */ goto abort; if (!pml) @@ -1606,6 +1606,11 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, un= signed long addr, else if (ptl !=3D pml) spin_lock_nested(ptl, SINGLE_DEPTH_NESTING); =20 + /* pmd entry may be changed by others */ + if (unlikely(IS_ENABLED(CONFIG_PT_RECLAIM) && !pml && + !pmd_same(pgt_pmd, pmdp_get_lockless(pmd)))) + goto abort; + /* step 2: clear page table and adjust rmap */ for (i =3D 0, addr =3D haddr, pte =3D start_pte; i < HPAGE_PMD_NR; i++, addr +=3D PAGE_SIZE, pte++) { @@ -1651,6 +1656,11 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, un= signed long addr, /* step 4: remove empty page table */ if (!pml) { pml =3D pmd_lock(mm, pmd); + if (unlikely(IS_ENABLED(CONFIG_PT_RECLAIM) && + !pmd_same(pgt_pmd, pmdp_get_lockless(pmd)))) { + spin_unlock(pml); + goto pmd_change; + } if (ptl !=3D pml) spin_lock_nested(ptl, SINGLE_DEPTH_NESTING); } @@ -1682,6 +1692,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, uns= igned long addr, pte_unmap_unlock(start_pte, ptl); if (pml && pml !=3D ptl) spin_unlock(pml); +pmd_change: if (notified) mmu_notifier_invalidate_range_end(&range); drop_folio: @@ -1703,6 +1714,7 @@ static void retract_page_tables(struct address_space = *mapping, pgoff_t pgoff) spinlock_t *pml; spinlock_t *ptl; bool skipped_uffd =3D false; + pte_t *pte; =20 /* * Check vma->anon_vma to exclude MAP_PRIVATE mappings that @@ -1738,11 +1750,17 @@ static void retract_page_tables(struct address_spac= e *mapping, pgoff_t pgoff) addr, addr + HPAGE_PMD_SIZE); mmu_notifier_invalidate_range_start(&range); =20 + pte =3D pte_offset_map_nolock(mm, pmd, &pgt_pmd, addr, &ptl); + if (!pte) + goto skip; + pml =3D pmd_lock(mm, pmd); - ptl =3D pte_lockptr(mm, pmd); if (ptl !=3D pml) spin_lock_nested(ptl, SINGLE_DEPTH_NESTING); =20 + if (unlikely(IS_ENABLED(CONFIG_PT_RECLAIM) && + !pmd_same(pgt_pmd, pmdp_get_lockless(pmd)))) + goto 
unlock_skip; /* * Huge page lock is still held, so normally the page table * must remain empty; and we have already skipped anon_vma @@ -1758,11 +1776,11 @@ static void retract_page_tables(struct address_spac= e *mapping, pgoff_t pgoff) pgt_pmd =3D pmdp_collapse_flush(vma, addr, pmd); pmdp_get_lockless_sync(); } - +unlock_skip: + pte_unmap_unlock(pte, ptl); if (ptl !=3D pml) - spin_unlock(ptl); - spin_unlock(pml); - + spin_unlock(pml); +skip: mmu_notifier_invalidate_range_end(&range); =20 if (!skipped_uffd) { diff --git a/mm/memory.c b/mm/memory.c index fef1e425e4702..a8108451e4dac 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -423,6 +423,7 @@ void pmd_install(struct mm_struct *mm, pmd_t *pmd, unsi= gned long addr, spinlock_t *ptl =3D pmd_lock(mm, pmd); =20 if (likely(pmd_none(*pmd))) { /* Has another populated it ? */ + arch_flush_tlb_before_set_pte_page(mm, addr); mm_inc_nr_ptes(mm); /* * Ensure all pte setup (eg. pte page lock and page clearing) are @@ -1931,6 +1932,7 @@ void zap_page_range_single(struct vm_area_struct *vma= , unsigned long address, * could have been expanded for hugetlb pmd sharing. */ unmap_single_vma(&tlb, vma, address, end, details, false); + try_to_reclaim_pgtables(&tlb, vma, address, end, details); mmu_notifier_invalidate_range_end(&range); tlb_finish_mmu(&tlb); hugetlb_zap_end(vma, details); diff --git a/mm/pt_reclaim.c b/mm/pt_reclaim.c new file mode 100644 index 0000000000000..e375e7f2059f8 --- /dev/null +++ b/mm/pt_reclaim.c @@ -0,0 +1,131 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include + +#include "internal.h" + +/* + * Locking: + * - already held the mmap read lock to traverse the pgtable + * - use pmd lock for clearing pmd entry + * - use pte lock for checking empty PTE page, and release it after clear= ing + * pmd entry, then we can capture the changed pmd in pte_offset_map_loc= k() + * etc after holding this pte lock. Thanks to this, we don't need to ho= ld the + * rmap-related locks. + * - users of pte_offset_map_lock() etc all expect the PTE page to be sta= ble by + * using rcu lock, so PTE pages should be freed by RCU. + */ +static int reclaim_pgtables_pmd_entry(pmd_t *pmd, unsigned long addr, + unsigned long next, struct mm_walk *walk) +{ + struct mm_struct *mm =3D walk->mm; + struct mmu_gather *tlb =3D walk->private; + pte_t *start_pte, *pte; + pmd_t pmdval; + spinlock_t *pml =3D NULL, *ptl; + int i; + + start_pte =3D pte_offset_map_nolock(mm, pmd, &pmdval, addr, &ptl); + if (!start_pte) + return 0; + + pml =3D pmd_lock(mm, pmd); + if (ptl !=3D pml) + spin_lock_nested(ptl, SINGLE_DEPTH_NESTING); + + if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) + goto out_ptl; + + /* Check if it is empty PTE page */ + for (i =3D 0, pte =3D start_pte; i < PTRS_PER_PTE; i++, pte++) { + if (!pte_none(ptep_get(pte))) + goto out_ptl; + } + pte_unmap(start_pte); + + pmd_clear(pmd); + if (ptl !=3D pml) + spin_unlock(ptl); + spin_unlock(pml); + + /* + * NOTE: + * In order to reuse mmu_gather to batch flush tlb and free PTE pages, + * here tlb is not flushed before pmd lock is unlocked. This may + * result in the following two situations: + * + * 1) Userland can trigger page fault and fill a huge page, which will + * cause the existence of small size TLB and huge TLB for the same + * address. + * + * 2) Userland can also trigger page fault and fill a PTE page, which + * will cause the existence of two small size TLBs, but the PTE + * page they map are different. 
+ * + * Some CPUs do not allow these, to solve this, we can define + * arch_flush_tlb_before_set_{huge|pte}_page to detect this case and + * flush TLB before filling a huge page or a PTE page in page fault + * path. + */ + pte_free_tlb(tlb, pmd_pgtable(pmdval), addr); + mm_dec_nr_ptes(mm); + + return 0; + +out_ptl: + pte_unmap_unlock(start_pte, ptl); + if (pml !=3D ptl) + spin_unlock(pml); + + return 0; +} + +static const struct mm_walk_ops reclaim_pgtables_walk_ops =3D { + .pmd_entry =3D reclaim_pgtables_pmd_entry, + .walk_lock =3D PGWALK_RDLOCK, +}; + +void try_to_reclaim_pgtables(struct mmu_gather *tlb, struct vm_area_struct= *vma, + unsigned long start_addr, unsigned long end_addr, + struct zap_details *details) +{ + unsigned long start =3D max(vma->vm_start, start_addr); + unsigned long end; + + if (start >=3D vma->vm_end) + return; + end =3D min(vma->vm_end, end_addr); + if (end <=3D vma->vm_start) + return; + + /* Skip hugetlb case */ + if (is_vm_hugetlb_page(vma)) + return; + + /* Leave this to the THP path to handle */ + if (vma->vm_flags & VM_HUGEPAGE) + return; + + /* userfaultfd_wp case may reinstall the pte entry, also skip */ + if (userfaultfd_wp(vma)) + return; + + /* + * For private file mapping, the COW-ed page is an anon page, and it + * will not be zapped. For simplicity, skip the all writable private + * file mapping cases. + */ + if (details && !vma_is_anonymous(vma) && + !(vma->vm_flags & VM_MAYSHARE) && + (vma->vm_flags & VM_WRITE)) + return; + + start =3D ALIGN(start, PMD_SIZE); + end =3D ALIGN_DOWN(end, PMD_SIZE); + if (end - start < PMD_SIZE) + return; + + walk_page_range_vma(vma, start, end, &reclaim_pgtables_walk_ops, tlb); +} --=20 2.20.1 From nobody Sun Feb 8 17:37:40 2026 Received: from mail-pl1-f169.google.com (mail-pl1-f169.google.com [209.85.214.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DF29E15ECF3 for ; Mon, 5 Aug 2024 12:56:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.169 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722862572; cv=none; b=hu2rFxYWyrDbYMVIL4otAD3BNJNZFB09j6nd1T1tvbDf8irMEeOFnsdUlWf1g/edYkTb8LFQwHzcGtKNExqC8MRwa0CPSCqamRnq5bBQDdMylU1nPMS1y4mhVUs2SFroyOm2kgerWD4c2n2nx8ZN7fa+pu9T8iYjE9O/sv/o4L0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722862572; c=relaxed/simple; bh=ZiCKPKvBLRcnxeUC/BZby0xYxdKkfsXHSVFooIO/IEY=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=dHw66vAfMMkCO+fPziPtbUlciSYqItj5R04a/qRYEYH59jjPqLX3MiCfN6TfJzjZN4lC4S8mHW+bxjO/apQu5QMO+imW6V9BXqJjHKr/WYAAZ6Qhm2lbbEHokeqXc3Z6rU2664zrtECnAF/pUUW7EnH699qafYtAOGe0KLUg8M4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=HHkE/2IZ; arc=none smtp.client-ip=209.85.214.169 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="HHkE/2IZ" Received: by mail-pl1-f169.google.com with SMTP id d9443c01a7336-1fc5bc8d23cso6126155ad.1 for ; Mon, 05 Aug 2024 05:56:10 -0700 
(PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1722862570; x=1723467370; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=zvOCNxNaR9g68beFCkVQN7ot2Umf+8WTdO0qJ5nAYzo=; b=HHkE/2IZW0HHyo0pa9EBd5KptQ6QR5Ny2cyOsom7f8MGQcgaEQJoObsVZCy5ReO0Wh fQw6njQE3YRCHMNlN3q5YlZhkfcwZ8SgxGkA4N1FpSVf4XyBsd0G2lXoOMLglJT5sgFT yvC6iiUbLR7bKLp8dh24RFyjgNYNNg3ubrGtaB7FGYm/qgyLrqE9Hnnv6GmNj/+J4xe/ zjeEAiAhl+3egQRhjALqhC3oPd9VIZzUXf9BGCLAoqg0taDA6eSxeAX74zi7QaCZh1uM 5KPSBRFZrqBzLVkGoKprKIwfSj+GuxR4LwkURLQJCqF0EWZv4AIQLjmF79+82fm+PCel XpDw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1722862570; x=1723467370; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=zvOCNxNaR9g68beFCkVQN7ot2Umf+8WTdO0qJ5nAYzo=; b=mZmDm/4T78T6cQ6Z1N5FCCPrBq88JPZAVgWVRQXkaj0wpIAzNTf7pJJl9DD7a9Zcy9 JUN1WwaU7NfeisDRm+JmiNWMJOyEXrEeHsCfoTky0AlspgwEb8yDqxot3oKWEPHdun/G 4tKeB27FrvOaprWZPkmdr4tAcAVa4ZIUDAyVYUWIGY0kbSh1u4u1gKR+IVPH800X9pQO gZ2q6uP2WpWGqR6zPKhJpfSJnM3r/Kj7pcAtvK08k/8M/UFBRn+WamWbekUa2ZQB1Nql bFexNEu/7oeqXw/GNiwMMNdiJEKG2TXaGxsDxXm0RoS0qeX38AtQWwf1jec+2Ep3Ce2X 1Bow== X-Forwarded-Encrypted: i=1; AJvYcCVynNYWlgL5to14M9y5BOmhb9fnZu+DGBjtqfT8zt7oThTkbF/lZPVK2dUCuCnCzIl7m6sS61/ryMDAXNArdqqcWwnT17jer/RZj+gJ X-Gm-Message-State: AOJu0YwgmvBvlktxNq5hmUwKF8Ye6HLWmj7WGZVK+mpXLnxKNT+frhoA lkc5H5kKtJuzpD/CB27PW4H/MUXoUF1F9UnADg6h3TntrfXXCgiPDpNAn5bV4tI= X-Google-Smtp-Source: AGHT+IGEkzD+QFj3cmoP4z9fngv/zsFb+TuS1tTAyot/td9F6PCtAysO8/m5eTxJBHfpd8BN64UXhw== X-Received: by 2002:a05:6a21:6da8:b0:1c4:c007:51b7 with SMTP id adf61e73a8af0-1c69965d0bfmr10731013637.6.1722862570294; Mon, 05 Aug 2024 05:56:10 -0700 (PDT) Received: from C02DW0BEMD6R.bytedance.net ([139.177.225.232]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-7106ecfaf1asm5503030b3a.142.2024.08.05.05.56.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 05 Aug 2024 05:56:10 -0700 (PDT) From: Qi Zheng To: david@redhat.com, hughd@google.com, willy@infradead.org, mgorman@suse.de, muchun.song@linux.dev, vbabka@kernel.org, akpm@linux-foundation.org, zokeefe@google.com, rientjes@google.com Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Qi Zheng Subject: [RFC PATCH v2 5/7] x86: mm: free page table pages by RCU instead of semi RCU Date: Mon, 5 Aug 2024 20:55:09 +0800 Message-Id: <9a3deedc55947030db20a5ef8aca7b2741df2d9d.1722861064.git.zhengqi.arch@bytedance.com> X-Mailer: git-send-email 2.24.3 (Apple Git-128) In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Now, if CONFIG_MMU_GATHER_RCU_TABLE_FREE is selected, the page table pages will be freed by semi RCU, that is: - batch table freeing: asynchronous free by RCU - single table freeing: IPI + synchronous free In this way, the page table can be lockless traversed by disabling IRQ in paths such as fast GUP. But this is not enough to free the empty PTE page table pages in paths other that munmap and exit_mmap path, because IPI cannot be synchronized with rcu_read_lock() in pte_offset_map{_lock}(). 
In preparation for supporting empty PTE page table pages reclaimation, let single table also be freed by RCU like batch table freeing. Then we can also use pte_offset_map() etc to prevent PTE page from being freed. Like pte_free_defer(), we can also safely use ptdesc->pt_rcu_head to free the page table pages: - The pt_rcu_head is unioned with pt_list and pmd_huge_pte. - For pt_list, it is used to manage the PGD page in x86. Fortunately tlb_remove_table() will not be used for free PGD pages, so it is safe to use pt_rcu_head. - For pmd_huge_pte, we will do zap_deposited_table() before freeing the PMD page, so it is also safe. Signed-off-by: Qi Zheng --- arch/x86/include/asm/tlb.h | 19 +++++++++++++++++++ arch/x86/kernel/paravirt.c | 7 +++++++ arch/x86/mm/pgtable.c | 10 +++++++++- mm/mmu_gather.c | 9 ++++++++- 4 files changed, 43 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h index 580636cdc257b..e223b53a8b190 100644 --- a/arch/x86/include/asm/tlb.h +++ b/arch/x86/include/asm/tlb.h @@ -34,4 +34,23 @@ static inline void __tlb_remove_table(void *table) free_page_and_swap_cache(table); } =20 +#ifdef CONFIG_PT_RECLAIM +static inline void __tlb_remove_table_one_rcu(struct rcu_head *head) +{ + struct page *page; + + page =3D container_of(head, struct page, rcu_head); + free_page_and_swap_cache(page); +} + +static inline void __tlb_remove_table_one(void *table) +{ + struct page *page; + + page =3D table; + call_rcu(&page->rcu_head, __tlb_remove_table_one_rcu); +} +#define __tlb_remove_table_one __tlb_remove_table_one +#endif /* CONFIG_PT_RECLAIM */ + #endif /* _ASM_X86_TLB_H */ diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index 5358d43886adc..199b9a3813b4a 100644 --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -60,10 +60,17 @@ void __init native_pv_lock_init(void) static_branch_disable(&virt_spin_lock_key); } =20 +#ifndef CONFIG_PT_RECLAIM static void native_tlb_remove_table(struct mmu_gather *tlb, void *table) { tlb_remove_page(tlb, table); } +#else +static void native_tlb_remove_table(struct mmu_gather *tlb, void *table) +{ + tlb_remove_table(tlb, table); +} +#endif =20 struct static_key paravirt_steal_enabled; struct static_key paravirt_steal_rq_enabled; diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index f5931499c2d6b..ea8522289c93d 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -19,12 +19,20 @@ EXPORT_SYMBOL(physical_mask); #endif =20 #ifndef CONFIG_PARAVIRT +#ifndef CONFIG_PT_RECLAIM static inline void paravirt_tlb_remove_table(struct mmu_gather *tlb, void *table) { tlb_remove_page(tlb, table); } -#endif +#else +static inline +void paravirt_tlb_remove_table(struct mmu_gather *tlb, void *table) +{ + tlb_remove_table(tlb, table); +} +#endif /* !CONFIG_PT_RECLAIM */ +#endif /* !CONFIG_PARAVIRT */ =20 gfp_t __userpte_alloc_gfp =3D GFP_PGTABLE_USER | PGTABLE_HIGHMEM; =20 diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c index 99b3e9408aa0f..d948479ca09e6 100644 --- a/mm/mmu_gather.c +++ b/mm/mmu_gather.c @@ -311,10 +311,17 @@ static inline void tlb_table_invalidate(struct mmu_ga= ther *tlb) } } =20 +#ifndef __tlb_remove_table_one +static inline void __tlb_remove_table_one(void *table) +{ + __tlb_remove_table(table); +} +#endif + static void tlb_remove_table_one(void *table) { tlb_remove_table_sync_one(); - __tlb_remove_table(table); + __tlb_remove_table_one(table); } =20 static void tlb_table_flush(struct mmu_gather *tlb) --=20 2.20.1 From nobody Sun Feb 8 17:37:40 2026 
From nobody Sun Feb 8 17:37:40 2026
From: Qi Zheng
To: david@redhat.com, hughd@google.com, willy@infradead.org, mgorman@suse.de,
    muchun.song@linux.dev, vbabka@kernel.org, akpm@linux-foundation.org,
    zokeefe@google.com, rientjes@google.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Qi Zheng
Subject: [RFC PATCH v2 6/7] x86: mm: define arch_flush_tlb_before_set_huge_page
Date: Mon, 5 Aug 2024 20:55:10 +0800
Message-Id: <1c8bee0c868c1e67ea02a6fa49225b00503b5436.1722861064.git.zhengqi.arch@bytedance.com>

When we use mmu_gather to batch-flush the TLB and free PTE pages, the TLB
is not flushed before the pmd lock is released. This may result in the
following two situations:

1) Userland can trigger a page fault and fill in a huge page, which will
   cause a small-size TLB entry and a huge TLB entry to exist for the same
   address at the same time.

2) Userland can also trigger a page fault and fill in a PTE page, which
   will cause two small-size TLB entries to exist at the same time, but
   the PTE pages they map are different.

According to Intel's TLB Application note (317080), some x86 CPUs do not
allow case 1), so define arch_flush_tlb_before_set_huge_page to detect and
fix this issue.
Signed-off-by: Qi Zheng
---
 arch/x86/include/asm/pgtable.h |  6 ++++++
 arch/x86/mm/pgtable.c          | 13 +++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index e39311a89bf47..f93d964ab6a3e 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1668,6 +1668,12 @@ void arch_check_zapped_pte(struct vm_area_struct *vma, pte_t pte);
 #define arch_check_zapped_pmd arch_check_zapped_pmd
 void arch_check_zapped_pmd(struct vm_area_struct *vma, pmd_t pmd);
 
+#ifdef CONFIG_PT_RECLAIM
+#define arch_flush_tlb_before_set_huge_page arch_flush_tlb_before_set_huge_page
+void arch_flush_tlb_before_set_huge_page(struct mm_struct *mm,
+					 unsigned long addr);
+#endif
+
 #ifdef CONFIG_XEN_PV
 #define arch_has_hw_nonleaf_pmd_young arch_has_hw_nonleaf_pmd_young
 static inline bool arch_has_hw_nonleaf_pmd_young(void)
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index ea8522289c93d..7e14cae819edd 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -934,3 +934,16 @@ void arch_check_zapped_pmd(struct vm_area_struct *vma, pmd_t pmd)
 	VM_WARN_ON_ONCE(!(vma->vm_flags & VM_SHADOW_STACK) &&
 			pmd_shstk(pmd));
 }
+
+#ifdef CONFIG_PT_RECLAIM
+void arch_flush_tlb_before_set_huge_page(struct mm_struct *mm,
+					 unsigned long addr)
+{
+	if (atomic_read(&mm->tlb_flush_pending)) {
+		unsigned long start = ALIGN_DOWN(addr, PMD_SIZE);
+		unsigned long end = start + PMD_SIZE;
+
+		flush_tlb_mm_range(mm, start, end, PAGE_SHIFT, false);
+	}
+}
+#endif /* CONFIG_PT_RECLAIM */
-- 
2.20.1
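For illustration, here is a sketch of how a generic caller could use this
hook before installing a huge PMD entry. This is not code from the series:
the no-op fallback and set_huge_pmd_example() are assumptions standing in
for the real callers added elsewhere in the series.

/* Illustrative sketch under the assumptions above; not from the series. */
#include <linux/mm.h>

#ifndef arch_flush_tlb_before_set_huge_page
static inline void arch_flush_tlb_before_set_huge_page(struct mm_struct *mm,
							unsigned long addr)
{
}
#endif

static void set_huge_pmd_example(struct vm_area_struct *vma, pmd_t *pmd,
				 unsigned long haddr, pmd_t entry)
{
	spinlock_t *ptl = pmd_lock(vma->vm_mm, pmd);

	/*
	 * Make sure no stale small-size TLB entries left behind by a
	 * concurrently reclaimed PTE table can coexist with the huge
	 * mapping installed below (situation 1) in the changelog).
	 */
	arch_flush_tlb_before_set_huge_page(vma->vm_mm, haddr);
	set_pmd_at(vma->vm_mm, haddr, pmd, entry);
	spin_unlock(ptl);
}

Note that the x86 hook above only flushes when mm->tlb_flush_pending
indicates an mmu_gather has deferred a flush, so the common path does not
pay for an extra TLB flush.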
From nobody Sun Feb 8 17:37:40 2026
From: Qi Zheng
To: david@redhat.com, hughd@google.com, willy@infradead.org, mgorman@suse.de,
    muchun.song@linux.dev, vbabka@kernel.org, akpm@linux-foundation.org,
    zokeefe@google.com, rientjes@google.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Qi Zheng
Subject: [RFC PATCH v2 7/7] x86: select ARCH_SUPPORTS_PT_RECLAIM if X86_64
Date: Mon, 5 Aug 2024 20:55:11 +0800
Message-Id: <0949a17afe11cf108b8a09a538c00c93cdc5507e.1722861064.git.zhengqi.arch@bytedance.com>

Now that x86 fully supports the CONFIG_PT_RECLAIM feature, and reclaiming
PTE pages is profitable only on 64-bit systems, select
ARCH_SUPPORTS_PT_RECLAIM if X86_64.

Signed-off-by: Qi Zheng
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2f611ffd0c9a4..fff6a7e6ea1de 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -316,6 +316,7 @@ config X86
 	select FUNCTION_ALIGNMENT_4B
 	imply IMA_SECURE_AND_OR_TRUSTED_BOOT    if EFI
 	select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
+	select ARCH_SUPPORTS_PT_RECLAIM		if X86_64
 
 config INSTRUCTION_DECODER
 	def_bool y
-- 
2.20.1
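For context, an arch capability symbol like the one selected above is
normally paired with a generic Kconfig option. The fragment below is only
an assumed sketch of that pairing; the prompt text, dependencies, and help
text are guesses, not taken from this series.

# Sketch only; not from this series. Prompt, dependencies, and help text
# are assumptions used for illustration.
config ARCH_SUPPORTS_PT_RECLAIM
	bool

config PT_RECLAIM
	bool "Reclaim empty user page table pages"
	depends on ARCH_SUPPORTS_PT_RECLAIM && MMU
	help
	  Free PTE page table pages back to the system once they become
	  empty, reducing the memory overhead of sparse mappings.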