From: Baolin Wang
To: akpm@linux-foundation.org, mike.kravetz@oracle.com
Cc: almasrymina@google.com, songmuchun@bytedance.com, baolin.wang@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 1/3] mm: hugetlb: Considering PMD sharing when flushing cache/TLBs
Date: Wed, 27 Apr 2022 18:52:05 +0800
Message-Id: <0443c8cf20db554d3ff4b439b30e0ff26c0181dd.1651056365.git.baolin.wang@linux.alibaba.com>

When moving hugetlb page tables, the cache flushing is done in
move_page_tables() without considering shared PMDs, which may cause
cache issues on some architectures. Thus move the hugetlb cache
flushing into move_hugetlb_page_tables() and take the shared PMD
range into account, as calculated by
adjust_range_if_pmd_sharing_possible(). Also expand the TLB flushing
range in case of shared PMDs.

Note this was found by code inspection; it has not been hit as a real
problem in practice so far.
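For reference, the worst-case range implied by PMD sharing is the
enclosing PUD_SIZE-aligned region, which is what
adjust_range_if_pmd_sharing_possible() expands the notifier range to.
A minimal sketch of that widening (illustrative only, with a made-up
helper name; the real function also checks whether the VMA can share
PMDs at all):

static void widen_range_for_pmd_sharing(unsigned long *start,
					unsigned long *end)
{
	/*
	 * Unsharing drops a whole PMD page, so any mapping within the
	 * enclosing PUD_SIZE-aligned region may be affected; cache/TLB
	 * flushes must cover that region, not just [old_addr, old_end).
	 */
	*start = ALIGN_DOWN(*start, PUD_SIZE);
	*end = ALIGN(*end, PUD_SIZE);
}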
Fixes: 550a7d60bd5e ("mm, hugepages: add mremap() support for hugepage backed vma")
Signed-off-by: Baolin Wang
Reviewed-by: Mike Kravetz
Reviewed-by: Muchun Song
---
 mm/hugetlb.c | 17 +++++++++++++++--
 mm/mremap.c  |  2 +-
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1945dfb..d3a6094 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4937,10 +4937,17 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 	unsigned long old_addr_copy;
 	pte_t *src_pte, *dst_pte;
 	struct mmu_notifier_range range;
+	bool shared_pmd = false;
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, old_addr,
 				old_end);
 	adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
+	/*
+	 * In case of shared PMDs, we should cover the maximum possible
+	 * range.
+	 */
+	flush_cache_range(vma, range.start, range.end);
+
 	mmu_notifier_invalidate_range_start(&range);
 	/* Prevent race with file truncation */
 	i_mmap_lock_write(mapping);
@@ -4957,8 +4964,10 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 		 */
 		old_addr_copy = old_addr;
 
-		if (huge_pmd_unshare(mm, vma, &old_addr_copy, src_pte))
+		if (huge_pmd_unshare(mm, vma, &old_addr_copy, src_pte)) {
+			shared_pmd = true;
 			continue;
+		}
 
 		dst_pte = huge_pte_alloc(mm, new_vma, new_addr, sz);
 		if (!dst_pte)
@@ -4966,7 +4975,11 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 
 		move_huge_pte(vma, old_addr, new_addr, src_pte, dst_pte);
 	}
-	flush_tlb_range(vma, old_end - len, old_end);
+
+	if (shared_pmd)
+		flush_tlb_range(vma, range.start, range.end);
+	else
+		flush_tlb_range(vma, old_end - len, old_end);
 	mmu_notifier_invalidate_range_end(&range);
 	i_mmap_unlock_write(mapping);
 
diff --git a/mm/mremap.c b/mm/mremap.c
index 98f50e6..0970025 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -490,12 +490,12 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 		return 0;
 
 	old_end = old_addr + len;
-	flush_cache_range(vma, old_addr, old_end);
 
 	if (is_vm_hugetlb_page(vma))
		return move_hugetlb_page_tables(vma, new_vma, old_addr,
						new_addr, len);
 
+	flush_cache_range(vma, old_addr, old_end);
 	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
 				old_addr, old_end);
 	mmu_notifier_invalidate_range_start(&range);
-- 
1.8.3.1


From: Baolin Wang
To: akpm@linux-foundation.org, mike.kravetz@oracle.com
Cc: almasrymina@google.com, songmuchun@bytedance.com, baolin.wang@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 2/3] mm: rmap: Move the cache flushing to the correct place for hugetlb PMD sharing
Date: Wed, 27 Apr 2022 18:52:06 +0800
Message-Id: <4f7ae6dfdc838ab71e1655188b657c032ff1f28f.1651056365.git.baolin.wang@linux.alibaba.com>

The cache level flush will always come first when changing an existing
virtual->physical mapping to a new value, since this allows us to
properly handle systems whose caches are strict and require a
virtual->physical translation to exist for a virtual address. So we
should move the cache flushing before huge_pmd_unshare().

As Muchun pointed out[1], the architectures that support hugetlb PMD
sharing currently have no cache flush issues in practice. But we should
still follow the cache/TLB flushing rules when changing a valid virtual
address mapping, to avoid potential issues in the future.

[1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@FVFYT0MHHV2J.usts.net/
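A minimal sketch of the ordering being enforced (illustrative only, not
code taken from this patch):

	/*
	 * When changing a valid virtual->physical mapping:
	 * 1) flush caches while the old translation still exists
	 *    (strict cache architectures need the translation to look
	 *    up the cache lines),
	 * 2) modify or clear the page table entries,
	 * 3) flush the TLB for the affected range.
	 */
	flush_cache_range(vma, start, end);
	/* ... change the page table entries for [start, end) here ... */
	flush_tlb_range(vma, start, end);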
Signed-off-by: Baolin Wang
Reviewed-by: Mike Kravetz
---
 mm/rmap.c | 40 ++++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 61e63db..4f0d115 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			 * do this outside rmap routines.
 			 */
 			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+			/*
+			 * huge_pmd_unshare may unmap an entire PMD page.
+			 * There is no way of knowing exactly which PMDs may
+			 * be cached for this mm, so we must flush them all.
+			 * start/end were already adjusted above to cover this
+			 * range.
+			 */
+			flush_cache_range(vma, range.start, range.end);
+
 			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
-				/*
-				 * huge_pmd_unshare unmapped an entire PMD
-				 * page. There is no way of knowing exactly
-				 * which PMDs may be cached for this mm, so
-				 * we must flush them all. start/end were
-				 * already adjusted above to cover this range.
-				 */
-				flush_cache_range(vma, range.start, range.end);
 				flush_tlb_range(vma, range.start, range.end);
 				mmu_notifier_invalidate_range(mm, range.start,
 							      range.end);
@@ -1560,13 +1561,14 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				page_vma_mapped_walk_done(&pvmw);
 				break;
 			}
+		} else {
+			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
 		}
 
 		/*
 		 * Nuke the page table entry. When having to clear
 		 * PageAnonExclusive(), we always have to flush.
 		 */
-		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
 		if (should_defer_flush(mm, flags) && !anon_exclusive) {
 			/*
 			 * We clear the PTE but do not flush so potentially
@@ -1890,15 +1892,16 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			 * do this outside rmap routines.
 			 */
 			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+			/*
+			 * huge_pmd_unshare may unmap an entire PMD page.
+			 * There is no way of knowing exactly which PMDs may
+			 * be cached for this mm, so we must flush them all.
+			 * start/end were already adjusted above to cover this
+			 * range.
+			 */
+			flush_cache_range(vma, range.start, range.end);
+
 			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
-				/*
-				 * huge_pmd_unshare unmapped an entire PMD
-				 * page. There is no way of knowing exactly
-				 * which PMDs may be cached for this mm, so
-				 * we must flush them all. start/end were
-				 * already adjusted above to cover this range.
-				 */
-				flush_cache_range(vma, range.start, range.end);
 				flush_tlb_range(vma, range.start, range.end);
 				mmu_notifier_invalidate_range(mm, range.start,
 							      range.end);
@@ -1915,10 +1918,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 				page_vma_mapped_walk_done(&pvmw);
 				break;
 			}
+		} else {
+			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
 		}
 
 		/* Nuke the page table entry. */
-		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
 		pteval = ptep_clear_flush(vma, address, pvmw.pte);
 
 		/* Set the dirty flag on the folio now the pte is gone. */
-- 
1.8.3.1

From: Baolin Wang
To: akpm@linux-foundation.org, mike.kravetz@oracle.com
Cc: almasrymina@google.com, songmuchun@bytedance.com, baolin.wang@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 3/3] mm: rmap: Use flush_cache_range() to flush cache for hugetlb pages
Date: Wed, 27 Apr 2022 18:52:07 +0800

At present we use flush_cache_page() to flush the cache for anonymous
hugetlb pages when unmapping or migrating a hugetlb page mapping, but
flush_cache_page() only handles a PAGE_SIZE range on some architectures
(such as arm32 and arc), which can cause cache issues. Thus change to
flush_cache_range() to cover the whole size of a hugetlb page.
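For illustration only (not the code added by this patch), the point is
the size of the flushed range: a hugetlb mapping spans huge_page_size()
bytes, so a single PAGE_SIZE flush is not enough:

	/*
	 * Illustrative sketch: flush_cache_page() covers one PAGE_SIZE,
	 * while a hugetlb mapping needs the whole huge page flushed.
	 */
	struct hstate *h = hstate_vma(vma);
	unsigned long haddr = address & huge_page_mask(h);

	flush_cache_range(vma, haddr, haddr + huge_page_size(h));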
Signed-off-by: Baolin Wang
Reviewed-by: Mike Kravetz
---
 mm/rmap.c | 90 ++++++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 48 insertions(+), 42 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 4f0d115..6fdd198 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1528,13 +1528,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		anon_exclusive = folio_test_anon(folio) &&
 				 PageAnonExclusive(subpage);
 
-		if (folio_test_hugetlb(folio) && !folio_test_anon(folio)) {
-			/*
-			 * To call huge_pmd_unshare, i_mmap_rwsem must be
-			 * held in write mode. Caller needs to explicitly
-			 * do this outside rmap routines.
-			 */
-			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+		if (folio_test_hugetlb(folio)) {
 			/*
 			 * huge_pmd_unshare may unmap an entire PMD page.
 			 * There is no way of knowing exactly which PMDs may
@@ -1544,22 +1538,31 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			 */
 			flush_cache_range(vma, range.start, range.end);
 
-			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
-				flush_tlb_range(vma, range.start, range.end);
-				mmu_notifier_invalidate_range(mm, range.start,
-							      range.end);
-
+			if (!folio_test_anon(folio)) {
 				/*
-				 * The ref count of the PMD page was dropped
-				 * which is part of the way map counting
-				 * is done for shared PMDs. Return 'true'
-				 * here. When there is no other sharing,
-				 * huge_pmd_unshare returns false and we will
-				 * unmap the actual page and drop map count
-				 * to zero.
+				 * To call huge_pmd_unshare, i_mmap_rwsem must be
+				 * held in write mode. Caller needs to explicitly
+				 * do this outside rmap routines.
 				 */
-				page_vma_mapped_walk_done(&pvmw);
-				break;
+				VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+
+				if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
+					flush_tlb_range(vma, range.start, range.end);
+					mmu_notifier_invalidate_range(mm, range.start,
+								      range.end);
+
+					/*
+					 * The ref count of the PMD page was dropped
+					 * which is part of the way map counting
+					 * is done for shared PMDs. Return 'true'
+					 * here. When there is no other sharing,
+					 * huge_pmd_unshare returns false and we will
+					 * unmap the actual page and drop map count
+					 * to zero.
+					 */
+					page_vma_mapped_walk_done(&pvmw);
+					break;
+				}
 			}
 		} else {
 			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
@@ -1885,13 +1888,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 		anon_exclusive = folio_test_anon(folio) &&
 				 PageAnonExclusive(subpage);
 
-		if (folio_test_hugetlb(folio) && !folio_test_anon(folio)) {
-			/*
-			 * To call huge_pmd_unshare, i_mmap_rwsem must be
-			 * held in write mode. Caller needs to explicitly
-			 * do this outside rmap routines.
-			 */
-			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+		if (folio_test_hugetlb(folio)) {
 			/*
 			 * huge_pmd_unshare may unmap an entire PMD page.
 			 * There is no way of knowing exactly which PMDs may
@@ -1901,22 +1898,31 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			 */
 			flush_cache_range(vma, range.start, range.end);
 
-			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
-				flush_tlb_range(vma, range.start, range.end);
-				mmu_notifier_invalidate_range(mm, range.start,
-							      range.end);
-
+			if (!folio_test_anon(folio)) {
 				/*
-				 * The ref count of the PMD page was dropped
-				 * which is part of the way map counting
-				 * is done for shared PMDs. Return 'true'
-				 * here. When there is no other sharing,
-				 * huge_pmd_unshare returns false and we will
-				 * unmap the actual page and drop map count
-				 * to zero.
+				 * To call huge_pmd_unshare, i_mmap_rwsem must be
+				 * held in write mode. Caller needs to explicitly
+				 * do this outside rmap routines.
 				 */
-				page_vma_mapped_walk_done(&pvmw);
-				break;
+				VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+
+				if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
+					flush_tlb_range(vma, range.start, range.end);
+					mmu_notifier_invalidate_range(mm, range.start,
+								      range.end);
+
+					/*
+					 * The ref count of the PMD page was dropped
+					 * which is part of the way map counting
+					 * is done for shared PMDs. Return 'true'
+					 * here. When there is no other sharing,
+					 * huge_pmd_unshare returns false and we will
+					 * unmap the actual page and drop map count
+					 * to zero.
+					 */
+					page_vma_mapped_walk_done(&pvmw);
+					break;
+				}
 			}
 		} else {
 			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
-- 
1.8.3.1