[PATCH v7 4/6] arm64: mm: Optimize split_kernel_leaf_mapping()

Ryan Roberts posted 6 patches 1 month ago
There is a newer version of this series
Posted by Ryan Roberts 1 month ago
The common case for split_kernel_leaf_mapping() is splitting out a
single page. Let's optimize this by calling
split_kernel_leaf_mapping_locked() only once.

Since the start and end addresses are PAGE_SIZE apart, they must be
contained within the same contpte block. Further, if start is at the
beginning of the block or end is at the end of the block, then the other
address must be in the _middle_ of the block. So if we split on this
middle-of-the-contpte-block address, the containing contpte block is
guaranteed to be split to ptes, and both start and end are therefore
mapped by pte.

This avoids the second call to split_kernel_leaf_mapping_locked(),
meaning we only have to walk the pgtable once.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/mm/mmu.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 114b88216b0c..8b5b19e1154b 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -740,9 +740,21 @@ int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
 	mutex_lock(&pgtable_split_lock);
 	arch_enter_lazy_mmu_mode();
 
-	ret = split_kernel_leaf_mapping_locked(start);
-	if (!ret)
-		ret = split_kernel_leaf_mapping_locked(end);
+	/*
+	 * Optimize for the common case of splitting out a single page from a
+	 * larger mapping. Here we can just split on the "least aligned" of
+	 * start and end, and this guarantees that there is also a split on
+	 * the more aligned address, since both addresses must be in the same
+	 * contpte block and that block must have been split to ptes.
+	 */
+	if (end - start == PAGE_SIZE) {
+		start = __ffs(start) < __ffs(end) ? start : end;
+		ret = split_kernel_leaf_mapping_locked(start);
+	} else {
+		ret = split_kernel_leaf_mapping_locked(start);
+		if (!ret)
+			ret = split_kernel_leaf_mapping_locked(end);
+	}
 
 	arch_leave_lazy_mmu_mode();
 	mutex_unlock(&pgtable_split_lock);
-- 
2.43.0
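
As an illustration of the single-page reasoning in the commit message,
here is a minimal userspace sketch in C. It is not the kernel code:
PAGE_SIZE, CONT_PTES and CONT_PTE_SIZE are assumed values for a 4K-page
configuration, lowest_set_bit() is a stand-in for __ffs() built on the
GCC/Clang builtin __builtin_ctzl(), and pick_split_addr(),
is_contpte_aligned() and count_bad_choices() are hypothetical helpers
that do not exist in arch/arm64/mm/mmu.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for a 4K-page configuration; illustrative only. */
#define PAGE_SIZE	4096UL
#define CONT_PTES	16UL
#define CONT_PTE_SIZE	(CONT_PTES * PAGE_SIZE)

/*
 * Userspace stand-in for the kernel's __ffs(): index of the lowest set
 * bit. The result is undefined for 0, so callers must pass non-zero.
 */
static inline unsigned long lowest_set_bit(unsigned long x)
{
	assert(x != 0);
	return (unsigned long)__builtin_ctzl(x);
}

/*
 * Hypothetical helper mirroring the selection made in the patch:
 * return the "least aligned" of start and end.
 */
static inline unsigned long pick_split_addr(unsigned long start,
					    unsigned long end)
{
	return lowest_set_bit(start) < lowest_set_bit(end) ? start : end;
}

/* True if addr sits on a contpte block boundary. */
static inline bool is_contpte_aligned(unsigned long addr)
{
	return (addr & (CONT_PTE_SIZE - 1)) == 0;
}

/*
 * For every single-page range [start, start + PAGE_SIZE) inside the
 * contpte block beginning at 'block' (non-zero, contpte aligned),
 * count the ranges whose chosen split address lands on a contpte
 * boundary. The argument in the commit message implies this is 0.
 */
static inline unsigned long count_bad_choices(unsigned long block)
{
	unsigned long start, addr, bad = 0;

	for (start = block; start < block + CONT_PTE_SIZE; start += PAGE_SIZE) {
		addr = pick_split_addr(start, start + PAGE_SIZE);
		if (is_contpte_aligned(addr))
			bad++;
	}
	return bad;
}
```

For a non-zero, contpte-aligned block address such as 0x40000000,
count_bad_choices() should return 0 under these assumptions. Two
page-aligned addresses that are PAGE_SIZE apart differ in bit 12 (with
4K pages), so the one with that bit set always has the lower __ffs()
value and can never sit on a contpte boundary.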
Re: [PATCH v7 4/6] arm64: mm: Optimize split_kernel_leaf_mapping()
Posted by Catalin Marinas 4 weeks, 1 day ago
On Fri, Aug 29, 2025 at 12:52:45PM +0100, Ryan Roberts wrote:
> The common case for split_kernel_leaf_mapping() is for a single page.
> Let's optimize this by only calling split_kernel_leaf_mapping_locked()
> once.
> 
> Since the start and end address are PAGE_SIZE apart, they must be
> contained within the same contpte block. Further, if start is at the
> beginning of the block or end is at the end of the block, then the other
> address must be in the _middle_ of the block. So if we split on this
> middle-of-the-contpte-block address, it is guaranteed that the
> containing contpte block is split to ptes and both start and end are
> therefore mapped by pte.
> 
> This avoids the second call to split_kernel_leaf_mapping_locked()
> meaning we only have to walk the pgtable once.
> 
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

And I agree with Yang, you can just fold this into the previous patch.
Re: [PATCH v7 4/6] arm64: mm: Optimize split_kernel_leaf_mapping()
Posted by Ryan Roberts 4 weeks, 1 day ago
On 03/09/2025 20:20, Catalin Marinas wrote:
> On Fri, Aug 29, 2025 at 12:52:45PM +0100, Ryan Roberts wrote:
>> The common case for split_kernel_leaf_mapping() is for a single page.
>> Let's optimize this by only calling split_kernel_leaf_mapping_locked()
>> once.
>>
>> Since the start and end address are PAGE_SIZE apart, they must be
>> contained within the same contpte block. Further, if start is at the
>> beginning of the block or end is at the end of the block, then the other
>> address must be in the _middle_ of the block. So if we split on this
>> middle-of-the-contpte-block address, it is guaranteed that the
>> containing contpte block is split to ptes and both start and end are
>> therefore mapped by pte.
>>
>> This avoids the second call to split_kernel_leaf_mapping_locked()
>> meaning we only have to walk the pgtable once.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> 
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> 
> And I agree with Yang, you can just fold this into the previous patch.

Yep, will do. Thanks for the review.
Re: [PATCH v7 4/6] arm64: mm: Optimize split_kernel_leaf_mapping()
Posted by Yang Shi 1 month ago

On 8/29/25 4:52 AM, Ryan Roberts wrote:
> The common case for split_kernel_leaf_mapping() is for a single page.
> Let's optimize this by only calling split_kernel_leaf_mapping_locked()
> once.
>
> Since the start and end address are PAGE_SIZE apart, they must be
> contained within the same contpte block. Further, if start is at the
> beginning of the block or end is at the end of the block, then the other
> address must be in the _middle_ of the block. So if we split on this
> middle-of-the-contpte-block address, it is guaranteed that the
> containing contpte block is split to ptes and both start and end are
> therefore mapped by pte.
>
> This avoids the second call to split_kernel_leaf_mapping_locked()
> meaning we only have to walk the pgtable once.
>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>   arch/arm64/mm/mmu.c | 18 +++++++++++++++---
>   1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 114b88216b0c..8b5b19e1154b 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -740,9 +740,21 @@ int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
>   	mutex_lock(&pgtable_split_lock);
>   	arch_enter_lazy_mmu_mode();
>   
> -	ret = split_kernel_leaf_mapping_locked(start);
> -	if (!ret)
> -		ret = split_kernel_leaf_mapping_locked(end);
> +	/*
> +	 * Optimize for the common case of splitting out a single page from a
> +	 * larger mapping. Here we can just split on the "least aligned" of
> +	 * start and end, and this guarantees that there is also a split on
> +	 * the more aligned address, since both addresses must be in the same
> +	 * contpte block and that block must have been split to ptes.
> +	 */
> +	if (end - start == PAGE_SIZE) {
> +		start = __ffs(start) < __ffs(end) ? start : end;
> +		ret = split_kernel_leaf_mapping_locked(start);

This makes sense to me. I suggested the same thing in the discussion 
with Dev for v5. I'd like to have this patch squashed into patch #3.

Thanks,
Yang

> +	} else {
> +		ret = split_kernel_leaf_mapping_locked(start);
> +		if (!ret)
> +			ret = split_kernel_leaf_mapping_locked(end);
> +	}
>   
>   	arch_leave_lazy_mmu_mode();
>   	mutex_unlock(&pgtable_split_lock);