[PATCH 6/6] mm/hugetlb: make detecting shared pte more reliable

Miaohe Lin posted 6 patches 3 years, 7 months ago
There is a newer version of this series
[PATCH 6/6] mm/hugetlb: make detecting shared pte more reliable
Posted by Miaohe Lin 3 years, 7 months ago
If the pagetables are shared, we shouldn't copy or take references. Since
src could have unshared and dst shares with another vma, huge_pte_none()
is thus used to determine whether dst_pte is shared. But this check isn't
reliable. A shared pte could have pte none in pagetable in fact. The page
count of ptep page should be checked here in order to reliably determine
whether pte is shared.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/hugetlb.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e1356ad57087..25db6d07479e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4795,15 +4795,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 
 		/*
 		 * If the pagetables are shared don't copy or take references.
-		 * dst_pte == src_pte is the common case of src/dest sharing.
 		 *
+		 * dst_pte == src_pte is the common case of src/dest sharing.
 		 * However, src could have 'unshared' and dst shares with
-		 * another vma.  If dst_pte !none, this implies sharing.
-		 * Check here before taking page table lock, and once again
-		 * after taking the lock below.
+		 * another vma. So page_count of ptep page is checked instead
+		 * to reliably determine whether pte is shared.
 		 */
-		dst_entry = huge_ptep_get(dst_pte);
-		if ((dst_pte == src_pte) || !huge_pte_none(dst_entry)) {
+		if (page_count(virt_to_page(dst_pte)) > 1) {
 			addr |= last_addr_mask;
 			continue;
 		}
@@ -4814,11 +4812,9 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		entry = huge_ptep_get(src_pte);
 		dst_entry = huge_ptep_get(dst_pte);
 again:
-		if (huge_pte_none(entry) || !huge_pte_none(dst_entry)) {
+		if (huge_pte_none(entry)) {
 			/*
-			 * Skip if src entry none.  Also, skip in the
-			 * unlikely case dst entry !none as this implies
-			 * sharing with another vma.
+			 * Skip if src entry none.
 			 */
 			;
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) {
-- 
2.23.0
Re: [PATCH 6/6] mm/hugetlb: make detecting shared pte more reliable
Posted by Mike Kravetz 3 years, 7 months ago
On 08/16/22 21:05, Miaohe Lin wrote:
> If the pagetables are shared, we shouldn't copy or take references. Since
> src could have unshared and dst shares with another vma, huge_pte_none()
> is thus used to determine whether dst_pte is shared. But this check isn't
> reliable. A shared pte could have pte none in pagetable in fact. The page
> count of ptep page should be checked here in order to reliably determine
> whether pte is shared.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/hugetlb.c | 16 ++++++----------
>  1 file changed, 6 insertions(+), 10 deletions(-)

You are correct, this is a better/more reliable way to check for pmd sharing.
It is accurate since we hold i_mmap_rwsem.  I like it.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
-- 
Mike Kravetz

Note to self, this will not work if we move to vma based locking for pmd
sharing and do not hold i_mmap_rwsem here.

> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index e1356ad57087..25db6d07479e 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4795,15 +4795,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>  
>  		/*
>  		 * If the pagetables are shared don't copy or take references.
> -		 * dst_pte == src_pte is the common case of src/dest sharing.
>  		 *
> +		 * dst_pte == src_pte is the common case of src/dest sharing.
>  		 * However, src could have 'unshared' and dst shares with
> -		 * another vma.  If dst_pte !none, this implies sharing.
> -		 * Check here before taking page table lock, and once again
> -		 * after taking the lock below.
> +		 * another vma. So page_count of ptep page is checked instead
> +		 * to reliably determine whether pte is shared.
>  		 */
> -		dst_entry = huge_ptep_get(dst_pte);
> -		if ((dst_pte == src_pte) || !huge_pte_none(dst_entry)) {
> +		if (page_count(virt_to_page(dst_pte)) > 1) {
>  			addr |= last_addr_mask;
>  			continue;
>  		}
> @@ -4814,11 +4812,9 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>  		entry = huge_ptep_get(src_pte);
>  		dst_entry = huge_ptep_get(dst_pte);
>  again:
> -		if (huge_pte_none(entry) || !huge_pte_none(dst_entry)) {
> +		if (huge_pte_none(entry)) {
>  			/*
> -			 * Skip if src entry none.  Also, skip in the
> -			 * unlikely case dst entry !none as this implies
> -			 * sharing with another vma.
> +			 * Skip if src entry none.
>  			 */
>  			;
>  		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) {
> -- 
> 2.23.0
>