From nobody Sun Feb 8 16:05:24 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EDD6FEB64D9 for ; Wed, 12 Jul 2023 06:02:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231901AbjGLGCA (ORCPT ); Wed, 12 Jul 2023 02:02:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48086 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231908AbjGLGB6 (ORCPT ); Wed, 12 Jul 2023 02:01:58 -0400 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E7CE1A1 for ; Tue, 11 Jul 2023 23:01:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1689141716; x=1720677716; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=o/Aws3JMOGR03jlt+wsWp78WyeH6J7qvKifxMU03VXk=; b=HOWl/6IvgD7ycJi94VOA24NeNfDtM69WHwqB2Ql84ce5BfNHhJ3VbMYA 0EM8S5SFqbw4AhwxQ6Zg4XPLvHfXOkHJWOw0QaRXL5ZwZR+bjXsCIcxGP oU2s9LJAbznR4SEtcI+hBCFJVL+Iag2tN4qMoQMD9rKR3ZO29bUs7fDJp jAF7FoWtJr3JMcwy7D0GOCC+m2o2jk/jdDELlrXW/cW+fnY66BSU1zbQl qedxo5r54FVa2ntq2zqjF9CwhQrsgpoJbw63RWcvr4TjTRx4UFMOvF7zF G+/Nddhhm976b/qsVcgTEFi9OohPAIKZdNrGcd5bNGaFoRNtM4PCTfdan g==; X-IronPort-AV: E=McAfee;i="6600,9927,10768"; a="354715284" X-IronPort-AV: E=Sophos;i="6.01,198,1684825200"; d="scan'208";a="354715284" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jul 2023 23:01:56 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10768"; a="756643382" X-IronPort-AV: E=Sophos;i="6.01,198,1684825200"; d="scan'208";a="756643382" Received: from fyin-dev.sh.intel.com ([10.239.159.32]) by orsmga001.jf.intel.com with ESMTP; 
11 Jul 2023 23:01:53 -0700
From: Yin Fengwei
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, yuzhao@google.com, willy@infradead.org, david@redhat.com, ryan.roberts@arm.com, shy828301@gmail.com
Cc: fengwei.yin@intel.com
Subject: [RFC PATCH v2 1/3] mm: add functions folio_in_range() and folio_within_vma()
Date: Wed, 12 Jul 2023 14:01:42 +0800
Message-Id: <20230712060144.3006358-2-fengwei.yin@intel.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230712060144.3006358-1-fengwei.yin@intel.com>
References: <20230712060144.3006358-1-fengwei.yin@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Add folio_in_range() to check whether a folio is mapped to a specific VMA
and whether the mapping address of the folio is within a given range.

Also add a helper function, folio_within_vma(), built on folio_in_range(),
to check whether a folio is within the range of a VMA.
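As a rough illustration of the check described above, here is a userspace sketch that mirrors the arithmetic of folio_in_range() from the diff in this patch. It is not kernel code: struct demo_vma, struct demo_folio, DEMO_PAGE_SHIFT and the demo_* functions are hypothetical stand-ins invented for this example, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PAGE_SHIFT 12UL	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the fields of struct vm_area_struct used here. */
struct demo_vma {
	unsigned long vm_start;	/* first byte of the mapping */
	unsigned long vm_end;	/* one past the last byte of the mapping */
	unsigned long vm_pgoff;	/* file offset of vm_start, in pages */
};

/* Hypothetical stand-in for what folio_pgoff()/folio_size() would report. */
struct demo_folio {
	unsigned long pgoff;	/* file offset of the folio, in pages */
	unsigned long nr_pages;	/* number of base pages in the folio */
};

static inline bool demo_folio_in_range(const struct demo_folio *folio,
				       const struct demo_vma *vma,
				       unsigned long start, unsigned long end)
{
	unsigned long vma_pglen =
		(vma->vm_end - vma->vm_start) >> DEMO_PAGE_SHIFT;
	unsigned long addr;

	/* Clamp the requested range to the VMA. */
	if (start < vma->vm_start)
		start = vma->vm_start;
	if (end > vma->vm_end)
		end = vma->vm_end;

	/* Same bounds test as the patch: folio start must map into the VMA. */
	if (folio->pgoff < vma->vm_pgoff ||
	    folio->pgoff > vma->vm_pgoff + vma_pglen)
		return false;

	/* Virtual address at which the folio's first page is mapped. */
	addr = vma->vm_start +
	       ((folio->pgoff - vma->vm_pgoff) << DEMO_PAGE_SHIFT);

	/* The whole folio must fit inside [start, end). */
	return addr >= start &&
	       addr + (folio->nr_pages << DEMO_PAGE_SHIFT) <= end;
}

static inline bool demo_folio_within_vma(const struct demo_folio *folio,
					 const struct demo_vma *vma)
{
	return demo_folio_in_range(folio, vma, vma->vm_start, vma->vm_end);
}
```

With a VMA covering 0x10000 to 0x20000 at file offset 0, a four-page folio at page offset 4 is reported as within the VMA, while a four-page folio at page offset 14 is not, because it would extend past vm_end.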
Signed-off-by: Yin Fengwei Reviewed-by: Yu Zhao --- mm/internal.h | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/mm/internal.h b/mm/internal.h index 483add0bfb289..c7dd15d8de3ef 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -585,6 +585,38 @@ extern long faultin_vma_page_range(struct vm_area_stru= ct *vma, bool write, int *locked); extern bool mlock_future_ok(struct mm_struct *mm, unsigned long flags, unsigned long bytes); + +static inline bool +folio_in_range(struct folio *folio, struct vm_area_struct *vma, + unsigned long start, unsigned long end) +{ + pgoff_t pgoff, addr; + unsigned long vma_pglen =3D (vma->vm_end - vma->vm_start) >> PAGE_SHIFT; + + VM_WARN_ON_FOLIO(folio_test_ksm(folio), folio); + if (start < vma->vm_start) + start =3D vma->vm_start; + + if (end > vma->vm_end) + end =3D vma->vm_end; + + pgoff =3D folio_pgoff(folio); + + /* if folio start address is not in vma range */ + if (pgoff < vma->vm_pgoff || pgoff > vma->vm_pgoff + vma_pglen) + return false; + + addr =3D vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT); + + return ((addr >=3D start) && (addr + folio_size(folio) <=3D end)); +} + +static inline bool +folio_within_vma(struct folio *folio, struct vm_area_struct *vma) +{ + return folio_in_range(folio, vma, vma->vm_start, vma->vm_end); +} + /* * mlock_vma_folio() and munlock_vma_folio(): * should be called with vma's mmap_lock held for read or write, --=20 2.39.2 From nobody Sun Feb 8 16:05:24 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3F96EEB64D9 for ; Wed, 12 Jul 2023 06:02:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231950AbjGLGCT (ORCPT ); Wed, 12 Jul 2023 02:02:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48386 "EHLO 
lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231908AbjGLGCR (ORCPT ); Wed, 12 Jul 2023 02:02:17 -0400 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C772F198A for ; Tue, 11 Jul 2023 23:02:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1689141730; x=1720677730; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=bgQflrx8qMg9vkqGlfihJRy1xgfAbAds62AMD1mGmM4=; b=FvPwCiTQ2ewOzBpaDjqFetT9isTYHhpAjobWoHZFeIe6ep3ztCH19Ivb HwavmLE6K5tpFAHGpCl9gLrlUIO86Hu4bpaQHM5vFzoSqw/0sip947DEb v/GqNwmNSHzHsOXFTTeiBg/xMFKtubFyY587jPVcxWUlJ1YlqlBD1UL73 45TaGFtOkIz/5LQL9iVU2D6N+2MH4innH2+N977o9rYInXR42z3XrSjMl bE7OxDR4p6pvixxAa7G5PGwrkors1Egl2usSZhabG4nJyGOg1/OtVtW5p vsMDOZX+f41IOJE0TLfGLil/F6I8tf3uPAqVWsszJPqaYkwq4g83C+0K4 A==; X-IronPort-AV: E=McAfee;i="6600,9927,10768"; a="363673767" X-IronPort-AV: E=Sophos;i="6.01,198,1684825200"; d="scan'208";a="363673767" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jul 2023 23:02:09 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10768"; a="1052051350" X-IronPort-AV: E=Sophos;i="6.01,198,1684825200"; d="scan'208";a="1052051350" Received: from fyin-dev.sh.intel.com ([10.239.159.32]) by fmsmga005.fm.intel.com with ESMTP; 11 Jul 2023 23:02:06 -0700 From: Yin Fengwei To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, yuzhao@google.com, willy@infradead.org, david@redhat.com, ryan.roberts@arm.com, shy828301@gmail.com Cc: fengwei.yin@intel.com Subject: [RFC PATCH v2 2/3] mm: handle large folio when large folio in VM_LOCKED VMA range Date: Wed, 12 Jul 2023 14:01:43 +0800 Message-Id: <20230712060144.3006358-3-fengwei.yin@intel.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: 
<20230712060144.3006358-1-fengwei.yin@intel.com>
References: <20230712060144.3006358-1-fengwei.yin@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

If a large folio is in the range of a VM_LOCKED VMA, it should be mlocked
to avoid being picked by page reclaim, which may split the large folio and
then mlock each page again.

Mlock this kind of large folio to prevent it from being picked by page
reclaim.

For a large folio that crosses the boundary of a VM_LOCKED VMA, it is
better not to mlock it. Then, if the system is under memory pressure, this
kind of large folio will be split and the pages outside the VM_LOCKED VMA
can be reclaimed.

Signed-off-by: Yin Fengwei --- mm/internal.h | 11 ++++++++--- mm/rmap.c | 34 +++++++++++++++++++++++++++------- 2 files changed, 35 insertions(+), 10 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index c7dd15d8de3ef..776141de2797a 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -643,7 +643,8 @@ static inline void mlock_vma_folio(struct folio *folio, * still be set while VM_SPECIAL bits are added: so ignore it then. */ if (unlikely((vma->vm_flags & (VM_LOCKED|VM_SPECIAL)) =3D=3D VM_LOCKED) && - (compound || !folio_test_large(folio))) + (compound || !folio_test_large(folio) || + folio_in_range(folio, vma, vma->vm_start, vma->vm_end))) mlock_folio(folio); } =20 @@ -651,8 +652,12 @@ void munlock_folio(struct folio *folio); static inline void munlock_vma_folio(struct folio *folio, struct vm_area_struct *vma, bool compound) { - if (unlikely(vma->vm_flags & VM_LOCKED) && - (compound || !folio_test_large(folio))) + /* + * To handle the case that a mlocked large folio is unmapped from VMA + * piece by piece, allow munlock the large folio which is partially + * mapped to VMA.
+ */ + if (unlikely(vma->vm_flags & VM_LOCKED)) munlock_folio(folio); } =20 diff --git a/mm/rmap.c b/mm/rmap.c index 2668f5ea35342..455f415d8d9ca 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -803,6 +803,14 @@ struct folio_referenced_arg { unsigned long vm_flags; struct mem_cgroup *memcg; }; + +static inline bool should_restore_mlock(struct folio *folio, + struct vm_area_struct *vma, bool pmd_mapped) +{ + return !folio_test_large(folio) || + pmd_mapped || folio_within_vma(folio, vma); +} + /* * arg: folio_referenced_arg will be passed */ @@ -816,13 +824,25 @@ static bool folio_referenced_one(struct folio *folio, while (page_vma_mapped_walk(&pvmw)) { address =3D pvmw.address; =20 - if ((vma->vm_flags & VM_LOCKED) && - (!folio_test_large(folio) || !pvmw.pte)) { - /* Restore the mlock which got missed */ - mlock_vma_folio(folio, vma, !pvmw.pte); - page_vma_mapped_walk_done(&pvmw); - pra->vm_flags |=3D VM_LOCKED; - return false; /* To break the loop */ + if (vma->vm_flags & VM_LOCKED) { + if (should_restore_mlock(folio, vma, !pvmw.pte)) { + /* Restore the mlock which got missed */ + mlock_vma_folio(folio, vma, !pvmw.pte); + page_vma_mapped_walk_done(&pvmw); + pra->vm_flags |=3D VM_LOCKED; + return false; /* To break the loop */ + } else { + /* + * For large folio cross VMA boundaries, it's + * expected to be picked by page reclaim. But + * should skip reference of pages which are in + * the range of VM_LOCKED vma. As page reclaim + * should just count the reference of pages out + * the range of VM_LOCKED vma. 
+ */ + pra->mapcount--; + continue; + } } =20 if (pvmw.pte) { --=20 2.39.2 From nobody Sun Feb 8 16:05:24 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0971FEB64DA for ; Wed, 12 Jul 2023 06:02:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231908AbjGLGCg (ORCPT ); Wed, 12 Jul 2023 02:02:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48462 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231979AbjGLGCZ (ORCPT ); Wed, 12 Jul 2023 02:02:25 -0400 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C9B031995 for ; Tue, 11 Jul 2023 23:02:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1689141743; x=1720677743; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=q8YW+Ak1aFSHS6ap1YknDbUFbazLnvkg+GE8oMd5GIc=; b=enRLnjAyfbywbEY81IxqlMNG7o2X8GJXfjfhkjMMe30vy9nHF5QIlrLO pk7GQQEcbmRTdT7CRnEMrEDTyuHlAQvNKVwSRogWVSJ+Xb8K/gDvEx6Ld AcFFP5iC+aQgBHb1nPrlvlP1Dp6BnAjYunmHyexrcpRtBlFamgq6gvGbk A0C3HwsITopl4phQz+mzoL3Iau5PNh2IIMiMPXaOirR6XDBP68Hw3801M wCH5yjghn1s57dln24wKUlhptM6beOsXshyjvOnU9Ybrv/1chhBEwCkPE Ja1Ce058+MAMmBCUlb3WKNlzYYk81R8p1ii39aJKKvBrDdvNmnm02uqly g==; X-IronPort-AV: E=McAfee;i="6600,9927,10768"; a="349662855" X-IronPort-AV: E=Sophos;i="6.01,198,1684825200"; d="scan'208";a="349662855" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jul 2023 23:02:23 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10768"; a="865994265" X-IronPort-AV: E=Sophos;i="6.01,198,1684825200"; d="scan'208";a="865994265" Received: from 
fyin-dev.sh.intel.com ([10.239.159.32]) by fmsmga001.fm.intel.com with ESMTP; 11 Jul 2023 23:02:20 -0700
From: Yin Fengwei
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, yuzhao@google.com, willy@infradead.org, david@redhat.com, ryan.roberts@arm.com, shy828301@gmail.com
Cc: fengwei.yin@intel.com
Subject: [RFC PATCH v2 3/3] mm: mlock: update mlock_pte_range to handle large folio
Date: Wed, 12 Jul 2023 14:01:44 +0800
Message-Id: <20230712060144.3006358-4-fengwei.yin@intel.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230712060144.3006358-1-fengwei.yin@intel.com>
References: <20230712060144.3006358-1-fengwei.yin@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

The current kernel only locks base-size folios during the mlock syscall.
Add large folio support with the following rules:
  - Only mlock a large folio when it is within the VM_LOCKED VMA range.

  - If there is a COW folio, mlock the COW folio, as the COW folio is also
    in the VM_LOCKED VMA range.

  - munlock applies to a large folio that is within the VMA range or that
    crosses the VMA boundary.

The last rule handles the case where a large folio is mlocked and the VMA
is later split in the middle of the large folio, so that the folio then
crosses a VMA boundary.
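The first and last rules above can be modelled as a small decision function. The sketch below is a userspace illustration only, loosely following should_mlock_folio() and the VM_LOCKED branch of mlock_pte_range() in this patch. The enum and the demo_mlock_decision() name are hypothetical, the three boolean inputs stand in for vma->vm_flags & VM_LOCKED, folio_test_large() and folio_within_vma(), and the COW folio rule is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of visiting one folio during the page table walk. */
enum demo_action {
	DEMO_SKIP,	/* leave the folio alone */
	DEMO_MLOCK,	/* mlock the folio */
	DEMO_MUNLOCK,	/* munlock the folio */
};

static inline enum demo_action demo_mlock_decision(bool vma_locked,
						   bool folio_large,
						   bool folio_within_vma)
{
	if (vma_locked) {
		/*
		 * mlock path: base-size folios are always mlocked; large
		 * folios only when they lie entirely within the VMA.
		 */
		if (!folio_large || folio_within_vma)
			return DEMO_MLOCK;
		return DEMO_SKIP;
	}

	/*
	 * munlock path: applies even to a large folio crossing the VMA
	 * boundary, since the VMA may have been split after the mlock.
	 */
	return DEMO_MUNLOCK;
}
```

In this model, a large folio that crosses the boundary of a VM_LOCKED VMA is skipped on the mlock path, but the same folio is still munlocked when VM_LOCKED is clear.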
Signed-off-by: Yin Fengwei --- mm/mlock.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 99 insertions(+), 5 deletions(-) diff --git a/mm/mlock.c b/mm/mlock.c index 0a0c996c5c214..f49e079066870 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -305,6 +305,95 @@ void munlock_folio(struct folio *folio) local_unlock(&mlock_fbatch.lock); } =20 +static inline bool should_mlock_folio(struct folio *folio, + struct vm_area_struct *vma) +{ + if (vma->vm_flags & VM_LOCKED) + return (!folio_test_large(folio) || + folio_within_vma(folio, vma)); + + /* + * For unlock, allow munlock large folio which is partially + * mapped to VMA. As it's possible that large folio is + * mlocked and VMA is split later. + * + * During memory pressure, such kind of large folio can + * be split. And the pages are not in VM_LOCKed VMA + * can be reclaimed. + */ + + return true; +} + +static inline unsigned int get_folio_mlock_step(struct folio *folio, + pte_t pte, unsigned long addr, unsigned long end) +{ + unsigned int nr; + + nr =3D folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte); + return min_t(unsigned int, nr, (end - addr) >> PAGE_SHIFT); +} + +void mlock_folio_range(struct folio *folio, struct vm_area_struct *vma, + pte_t *pte, unsigned long addr, unsigned int nr) +{ + struct folio *cow_folio; + unsigned int step =3D 1; + + mlock_folio(folio); + if (nr =3D=3D 1) + return; + + for (; nr > 0; pte +=3D step, addr +=3D (step << PAGE_SHIFT), nr -=3D ste= p) { + pte_t ptent; + + step =3D 1; + ptent =3D ptep_get(pte); + + if (!pte_present(ptent)) + continue; + + cow_folio =3D vm_normal_folio(vma, addr, ptent); + if (!cow_folio || cow_folio =3D=3D folio) { + continue; + } + + mlock_folio(cow_folio); + step =3D get_folio_mlock_step(folio, ptent, + addr, addr + (nr << PAGE_SHIFT)); + } +} + +void munlock_folio_range(struct folio *folio, struct vm_area_struct *vma, + pte_t *pte, unsigned long addr, unsigned int nr) +{ + struct folio *cow_folio; + unsigned int step =3D 1; + 
+ munlock_folio(folio); + if (nr =3D=3D 1) + return; + + for (; nr > 0; pte +=3D step, addr +=3D (step << PAGE_SHIFT), nr -=3D ste= p) { + pte_t ptent; + + step =3D 1; + ptent =3D ptep_get(pte); + + if (!pte_present(ptent)) + continue; + + cow_folio =3D vm_normal_folio(vma, addr, ptent); + if (!cow_folio || cow_folio =3D=3D folio) { + continue; + } + + munlock_folio(cow_folio); + step =3D get_folio_mlock_step(folio, ptent, + addr, addr + (nr << PAGE_SHIFT)); + } +} + static int mlock_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, struct mm_walk *walk) =20 @@ -314,6 +403,7 @@ static int mlock_pte_range(pmd_t *pmd, unsigned long ad= dr, pte_t *start_pte, *pte; pte_t ptent; struct folio *folio; + unsigned int step =3D 1; =20 ptl =3D pmd_trans_huge_lock(pmd, vma); if (ptl) { @@ -329,24 +419,28 @@ static int mlock_pte_range(pmd_t *pmd, unsigned long = addr, goto out; } =20 - start_pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + pte =3D start_pte =3D pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); if (!start_pte) { walk->action =3D ACTION_AGAIN; return 0; } - for (pte =3D start_pte; addr !=3D end; pte++, addr +=3D PAGE_SIZE) { + + for (; addr !=3D end; pte +=3D step, addr +=3D (step << PAGE_SHIFT)) { + step =3D 1; ptent =3D ptep_get(pte); if (!pte_present(ptent)) continue; folio =3D vm_normal_folio(vma, addr, ptent); if (!folio || folio_is_zone_device(folio)) continue; - if (folio_test_large(folio)) + if (!should_mlock_folio(folio, vma)) continue; + + step =3D get_folio_mlock_step(folio, ptent, addr, end); if (vma->vm_flags & VM_LOCKED) - mlock_folio(folio); + mlock_folio_range(folio, vma, pte, addr, step); else - munlock_folio(folio); + munlock_folio_range(folio, vma, pte, addr, step); } pte_unmap(start_pte); out: --=20 2.39.2