From: Yin Fengwei <fengwei.yin@intel.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, minchan@kernel.org, yuzhao@google.com, willy@infradead.org, david@redhat.com, ryan.roberts@arm.com, shy828301@gmail.com
Cc: fengwei.yin@intel.com
Subject: [RFC PATCH v2 1/4] madvise: don't use mapcount() against large folio for sharing check
Date: Fri, 21 Jul 2023 17:40:40 +0800
Message-Id: <20230721094043.2506691-2-fengwei.yin@intel.com>
In-Reply-To: <20230721094043.2506691-1-fengwei.yin@intel.com>
References: <20230721094043.2506691-1-fengwei.yin@intel.com>

Commit 07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range()
to use folios") replaced page_mapcount() with folio_mapcount() to check
whether a folio is shared by other mappings. That is not correct for a
large folio: folio_mapcount() returns the total mapcount of the whole
large folio, which cannot tell whether the folio is shared. Use
folio_estimated_sharers() instead, which returns an estimated number of
sharers. The estimate is not 100% accurate, but it should be good enough
for the madvise case here.
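For reference, the estimate is deliberately cheap: it reads the precise
mapcount of the folio's first subpage and assumes the remaining subpages
behave the same. A minimal sketch of that idea (illustration only, under
the assumption stated in the comment; the real helper lives in
mm/internal.h):

/*
 * Sketch of the folio_estimated_sharers() idea: use the first subpage's
 * precise mapcount as a proxy for the whole large folio.  Cheap, but it
 * can be wrong when only part of the folio is mapped -- which is why
 * the changelog above calls it "not 100% correct".  Illustration only;
 * see mm/internal.h for the real helper.
 */
static inline int folio_estimated_sharers_sketch(struct folio *folio)
{
	return page_mapcount(folio_page(folio, 0));
}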
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Yu Zhao <yuzhao@google.com>
---
 mm/madvise.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 38382a5d1e39..f12933ebcc24 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -383,7 +383,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 	folio = pfn_folio(pmd_pfn(orig_pmd));
 
 	/* Do not interfere with other mappings of this folio */
-	if (folio_mapcount(folio) != 1)
+	if (folio_estimated_sharers(folio) != 1)
 		goto huge_unlock;
 
 	if (pageout_anon_only_filter && !folio_test_anon(folio))
@@ -459,7 +459,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 	if (folio_test_large(folio)) {
 		int err;
 
-		if (folio_mapcount(folio) != 1)
+		if (folio_estimated_sharers(folio) != 1)
 			break;
 		if (pageout_anon_only_filter && !folio_test_anon(folio))
 			break;
@@ -682,7 +682,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 	if (folio_test_large(folio)) {
 		int err;
 
-		if (folio_mapcount(folio) != 1)
+		if (folio_estimated_sharers(folio) != 1)
 			break;
 		if (!folio_trylock(folio))
 			break;
-- 
2.39.2

From: Yin Fengwei <fengwei.yin@intel.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, minchan@kernel.org, yuzhao@google.com, willy@infradead.org, david@redhat.com, ryan.roberts@arm.com, shy828301@gmail.com
Cc: fengwei.yin@intel.com
Subject: [RFC PATCH v2 2/4] madvise: Use notify-able API to clear and flush page table entries
Date: Fri, 21 Jul 2023 17:40:41 +0800
Message-Id: <20230721094043.2506691-3-fengwei.yin@intel.com>
In-Reply-To: <20230721094043.2506691-1-fengwei.yin@intel.com>
References: <20230721094043.2506691-1-fengwei.yin@intel.com>

Currently, madvise_cold_or_pageout_pte_range() clears the young bit of
the pte/pmd without notifying MMU-notifier subscribers. Use the
notify-able API instead, so that subscribers are signaled about the
young bit being cleared.

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
 mm/madvise.c | 18 ++----------------
 1 file changed, 2 insertions(+), 16 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index f12933ebcc24..b236e201a738 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -403,14 +403,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		return 0;
 	}
 
-	if (pmd_young(orig_pmd)) {
-		pmdp_invalidate(vma, addr, pmd);
-		orig_pmd = pmd_mkold(orig_pmd);
-
-		set_pmd_at(mm, addr, pmd, orig_pmd);
-		tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
-	}
-
+	pmdp_clear_flush_young_notify(vma, addr, pmd);
 	folio_clear_referenced(folio);
 	folio_test_clear_young(folio);
 	if (folio_test_active(folio))
@@ -496,14 +489,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 
 		VM_BUG_ON_FOLIO(folio_test_large(folio), folio);
 
-		if (pte_young(ptent)) {
-			ptent = ptep_get_and_clear_full(mm, addr, pte,
-							tlb->fullmm);
-			ptent = pte_mkold(ptent);
-			set_pte_at(mm, addr, pte, ptent);
-			tlb_remove_tlb_entry(tlb, pte, addr);
-		}
-
+		ptep_clear_flush_young_notify(vma, addr, pte);
 		/*
 		 * We are deactivating a folio for accelerating reclaiming.
 		 * VM couldn't reclaim the folio unless we clear PG_young.
-- 
2.39.2
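For context, the _notify variants pair the primary page-table update
with an mmu_notifier callback, so secondary MMUs (e.g. KVM) age their
mappings as well. A rough sketch of what ptep_clear_flush_young_notify()
amounts to (the real definition is a macro in
include/linux/mmu_notifier.h; treat this as an illustration, not the
verbatim source):

/*
 * Rough shape of ptep_clear_flush_young_notify(): clear and flush the
 * young bit in the primary page table, then OR in whatever the
 * mmu_notifier subscribers (KVM and friends) report for the same
 * address range.  Illustration only; the real helper is a macro in
 * include/linux/mmu_notifier.h.
 */
static inline int clear_flush_young_notify_sketch(struct vm_area_struct *vma,
						  unsigned long addr, pte_t *ptep)
{
	int young = ptep_clear_flush_young(vma, addr, ptep);

	young |= mmu_notifier_clear_flush_young(vma->vm_mm, addr,
						addr + PAGE_SIZE);
	return young;
}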
From: Yin Fengwei <fengwei.yin@intel.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, minchan@kernel.org, yuzhao@google.com, willy@infradead.org, david@redhat.com, ryan.roberts@arm.com, shy828301@gmail.com
Cc: fengwei.yin@intel.com
Subject: [RFC PATCH v2 3/4] mm: add functions folio_in_range() and folio_within_vma()
Date: Fri, 21 Jul 2023 17:40:42 +0800
Message-Id: <20230721094043.2506691-4-fengwei.yin@intel.com>
In-Reply-To: <20230721094043.2506691-1-fengwei.yin@intel.com>
References: <20230721094043.2506691-1-fengwei.yin@intel.com>

Add folio_in_range(), which will be used to check whether a folio is
mapped to a specific VMA and whether its mapping address falls within a
given range. Also add a helper, folio_within_vma(), built on top of
folio_in_range(), to check whether a folio lies entirely within the
range of a VMA.

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
 mm/internal.h | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/mm/internal.h b/mm/internal.h
index 483add0bfb28..c7dd15d8de3e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -585,6 +585,38 @@ extern long faultin_vma_page_range(struct vm_area_struct *vma,
 				   bool write, int *locked);
 extern bool mlock_future_ok(struct mm_struct *mm, unsigned long flags,
 			       unsigned long bytes);
+
+static inline bool
+folio_in_range(struct folio *folio, struct vm_area_struct *vma,
+		unsigned long start, unsigned long end)
+{
+	pgoff_t pgoff, addr;
+	unsigned long vma_pglen = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+
+	VM_WARN_ON_FOLIO(folio_test_ksm(folio), folio);
+	if (start < vma->vm_start)
+		start = vma->vm_start;
+
+	if (end > vma->vm_end)
+		end = vma->vm_end;
+
+	pgoff = folio_pgoff(folio);
+
+	/* if folio start address is not in vma range */
+	if (pgoff < vma->vm_pgoff || pgoff > vma->vm_pgoff + vma_pglen)
+		return false;
+
+	addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+
+	return ((addr >= start) && (addr + folio_size(folio) <= end));
+}
+
+static inline bool
+folio_within_vma(struct folio *folio, struct vm_area_struct *vma)
+{
+	return folio_in_range(folio, vma, vma->vm_start, vma->vm_end);
+}
+
 /*
  * mlock_vma_folio() and munlock_vma_folio():
  * should be called with vma's mmap_lock held for read or write,
-- 
2.39.2
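To make the pgoff arithmetic concrete, here is a hypothetical caller;
the numbers in the comment and the function name are made up for
illustration, and only folio_in_range()/folio_within_vma() come from the
patch above:

/*
 * Hypothetical usage: decide whether a large folio can be processed
 * without splitting.  Suppose vma->vm_start = 0x200000 and
 * vma->vm_pgoff = 0x10; a folio with folio_pgoff() = 0x14 then maps at
 * 0x200000 + (0x14 - 0x10) * PAGE_SIZE = 0x204000, and it fits iff
 * 0x204000 >= start and 0x204000 + folio_size(folio) <= end.
 */
static bool whole_folio_in_request(struct folio *folio,
				   struct vm_area_struct *vma,
				   unsigned long start, unsigned long end)
{
	/* Clamping [start, end) to the VMA happens inside the helper. */
	return folio_in_range(folio, vma, start, end);
}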
From: Yin Fengwei <fengwei.yin@intel.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, minchan@kernel.org, yuzhao@google.com, willy@infradead.org, david@redhat.com, ryan.roberts@arm.com, shy828301@gmail.com
Cc: fengwei.yin@intel.com
Subject: [RFC PATCH v2 4/4] madvise: avoid trying to split large folio always in cold_pageout
Date: Fri, 21 Jul 2023 17:40:43 +0800
Message-Id: <20230721094043.2506691-5-fengwei.yin@intel.com>
In-Reply-To: <20230721094043.2506691-1-fengwei.yin@intel.com>
References: <20230721094043.2506691-1-fengwei.yin@intel.com>

Currently, madvise_cold_or_pageout_pte_range() always tries to split a
large folio. Avoid the unconditional split as follows:

  - If the large folio lies entirely within the request range, don't
    split it. Leave it to page reclaim to decide whether the large
    folio needs to be split.
  - If the large folio crosses a boundary of the request range, skip
    it if it's page cache. If it's an anonymous large folio, try to
    split it; if the split fails, just skip it.

Invoke folio_referenced() to clear the A (accessed) bit for the large
folio. As folio_referenced() acquires the pte lock, call it only after
the pte lock has been released; see the step calculation and the
deferred loop in the diff below.
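The "skip the whole folio" case relies on the step calculation added
below as folio_op_size(). Detached from the pte/folio accessors, the
arithmetic is a min of two distances (hypothetical numbers in the
comment; this sketch is an illustration, not the patch's helper):

/*
 * Standalone sketch of the folio_op_size() arithmetic: pages left in
 * the folio from the current pfn, clamped to the pages left in the
 * request range.  E.g. a 16-page folio starting at pfn 0x1000, with the
 * current pte at pfn 0x1004 and 8 pages left before @end, gives
 * nr = 0x1000 + 16 - 0x1004 = 12 and step = min(12, 8) = 8.
 */
static inline unsigned int op_size_sketch(unsigned long folio_start_pfn,
					  unsigned int folio_nr_pages,
					  unsigned long cur_pfn,
					  unsigned long addr, unsigned long end)
{
	unsigned int nr = folio_start_pfn + folio_nr_pages - cur_pfn;

	return min_t(unsigned int, nr, (end - addr) >> PAGE_SHIFT);
}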
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
 mm/internal.h |  10 +++++
 mm/madvise.c  | 118 +++++++++++++++++++++++++++++++++++---------------
 2 files changed, 93 insertions(+), 35 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index c7dd15d8de3e..cd1ff348d690 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -586,6 +586,16 @@ extern long faultin_vma_page_range(struct vm_area_struct *vma,
 extern bool mlock_future_ok(struct mm_struct *mm, unsigned long flags,
 			       unsigned long bytes);
 
+static inline unsigned int
+folio_op_size(struct folio *folio, pte_t pte,
+		unsigned long addr, unsigned long end)
+{
+	unsigned int nr;
+
+	nr = folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte);
+	return min_t(unsigned int, nr, (end - addr) >> PAGE_SHIFT);
+}
+
 static inline bool
 folio_in_range(struct folio *folio, struct vm_area_struct *vma,
 		unsigned long start, unsigned long end)
diff --git a/mm/madvise.c b/mm/madvise.c
index b236e201a738..71af370c3251 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -339,6 +339,23 @@ static inline bool can_do_file_pageout(struct vm_area_struct *vma)
 	       file_permission(vma->vm_file, MAY_WRITE) == 0;
 }
 
+static inline bool skip_cur_entry(struct folio *folio, bool pageout_anon_only)
+{
+	if (!folio)
+		return true;
+
+	if (folio_is_zone_device(folio))
+		return true;
+
+	if (!folio_test_lru(folio))
+		return true;
+
+	if (pageout_anon_only && !folio_test_anon(folio))
+		return true;
+
+	return false;
+}
+
 static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 				unsigned long addr, unsigned long end,
 				struct mm_walk *walk)
@@ -352,7 +369,9 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 	spinlock_t *ptl;
 	struct folio *folio = NULL;
 	LIST_HEAD(folio_list);
+	LIST_HEAD(reclaim_list);
 	bool pageout_anon_only_filter;
+	unsigned long start = addr;
 
 	if (fatal_signal_pending(current))
 		return -EINTR;
@@ -442,54 +461,90 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 			continue;
 
 		folio = vm_normal_folio(vma, addr, ptent);
-		if (!folio || folio_is_zone_device(folio))
+		if (skip_cur_entry(folio, pageout_anon_only_filter))
 			continue;
 
 		/*
-		 * Creating a THP page is expensive so split it only if we
-		 * are sure it's worth. Split it if we are only owner.
+		 * Split a large folio only if it's anonymous, crosses the
+		 * boundaries of the request range and we are likely the
+		 * only owner.
 		 */
 		if (folio_test_large(folio)) {
-			int err;
+			int err, step;
 
 			if (folio_estimated_sharers(folio) != 1)
-				break;
-			if (pageout_anon_only_filter && !folio_test_anon(folio))
-				break;
-			if (!folio_trylock(folio))
-				break;
+				continue;
+			if (folio_in_range(folio, vma, start, end))
+				goto pageout_cold_folio;
+			if (!folio_test_anon(folio) || !folio_trylock(folio))
+				continue;
+
 			folio_get(folio);
+			step = folio_op_size(folio, ptent, addr, end);
 			arch_leave_lazy_mmu_mode();
 			pte_unmap_unlock(start_pte, ptl);
 			start_pte = NULL;
 			err = split_folio(folio);
 			folio_unlock(folio);
			folio_put(folio);
-			if (err)
-				break;
+
 			start_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
 			if (!start_pte)
 				break;
 			arch_enter_lazy_mmu_mode();
-			pte--;
-			addr -= PAGE_SIZE;
-			continue;
-		}
 
-		/*
-		 * Do not interfere with other mappings of this folio and
-		 * non-LRU folio.
-		 */
-		if (!folio_test_lru(folio) || folio_mapcount(folio) != 1)
+			/* split success. retry the same entry */
+			if (!err)
+				step = 0;
+
+			/*
+			 * Split fails, jump over the whole folio to avoid
+			 * grabbing the same folio but failing to split it
+			 * again and again.
+			 */
+			pte += step - 1;
+			addr += (step - 1) << PAGE_SHIFT;
 			continue;
+		}
 
-		if (pageout_anon_only_filter && !folio_test_anon(folio))
+		/* Do not interfere with other mappings of this folio */
+		if (folio_mapcount(folio) != 1)
 			continue;
 
 		VM_BUG_ON_FOLIO(folio_test_large(folio), folio);
 		ptep_clear_flush_young_notify(vma, addr, pte);
+
+pageout_cold_folio:
+		if (folio_isolate_lru(folio)) {
+			if (folio_test_unevictable(folio))
+				folio_putback_lru(folio);
+			else
+				list_add(&folio->lru, &folio_list);
+		}
+	}
+
+	if (start_pte) {
+		arch_leave_lazy_mmu_mode();
+		pte_unmap_unlock(start_pte, ptl);
+	}
+
+	while (!list_empty(&folio_list)) {
+		folio = lru_to_folio(&folio_list);
+		list_del(&folio->lru);
+
+		if (folio_test_large(folio)) {
+			int refs;
+			unsigned long flags;
+			struct mem_cgroup *memcg = folio_memcg(folio);
+
+			refs = folio_referenced(folio, 0, memcg, &flags);
+			if ((flags & VM_LOCKED) || (refs == -1)) {
+				folio_putback_lru(folio);
+				continue;
+			}
+		}
+
 		/*
 		 * We are deactivating a folio for accelerating reclaiming.
 		 * VM couldn't reclaim the folio unless we clear PG_young.
@@ -501,22 +556,15 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		if (folio_test_active(folio))
 			folio_set_workingset(folio);
 		if (pageout) {
-			if (folio_isolate_lru(folio)) {
-				if (folio_test_unevictable(folio))
-					folio_putback_lru(folio);
-				else
-					list_add(&folio->lru, &folio_list);
-			}
-		} else
-			folio_deactivate(folio);
+			list_add(&folio->lru, &reclaim_list);
+		} else {
+			folio_clear_active(folio);
+			folio_putback_lru(folio);
+		}
 	}
 
-	if (start_pte) {
-		arch_leave_lazy_mmu_mode();
-		pte_unmap_unlock(start_pte, ptl);
-	}
 	if (pageout)
-		reclaim_pages(&folio_list);
+		reclaim_pages(&reclaim_list);
 	cond_resched();
 
 	return 0;
-- 
2.39.2
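Taken together: after this series, MADV_COLD/MADV_PAGEOUT over a range
that fully covers a large folio hands the folio to reclaim whole instead
of splitting it up front. A minimal userspace exercise of that path
(hypothetical demo; error handling omitted for brevity):

/* Hypothetical demo: fault in a PMD-sized THP region, then ask the
 * kernel to page it out.  With this series the large folio is passed
 * to reclaim intact rather than being split first. */
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 2UL << 20;			/* 2 MiB: one PMD */
	char *buf = aligned_alloc(len, len);	/* PMD-aligned buffer */

	madvise(buf, len, MADV_HUGEPAGE);	/* ask for THP backing */
	memset(buf, 1, len);			/* fault the range in */
	madvise(buf, len, MADV_PAGEOUT);	/* cold-pageout the range */

	free(buf);
	return 0;
}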