[v1] multiblock allocator improvements

[PATCH 01/13] Revert "ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in ext4_mb_check_limits"

Posted by Ojaswin Mujoo 2 years, 8 months ago

This reverts commit 32c0869370194ae5ac9f9f501953ef693040f6a1.

The reverted commit was intended to remove a dead check however it was observed
that this check was actually being used to exit early instead of looping
sbi->s_mb_max_to_scan times when we are able to find a free extent bigger than
the goal extent. Due to this, a my performance tests (fsmark, parallel file
writes in a highly fragmented FS) were seeing a 2x-3x regression.

Example, the default value of the following variables is:

sbi->s_mb_max_to_scan = 200
sbi->s_mb_min_to_scan = 10

In ext4_mb_check_limits() if we find an extent smaller than goal, then we return
early and try again. This loop will go on until we have processed
sbi->s_mb_max_to_scan(=200) number of free extents at which point we exit and
just use whatever we have even if it is smaller than goal extent.

Now, the regression comes when we find an extent bigger than goal. Earlier, in
this case we would loop only sbi->s_mb_min_to_scan(=10) times and then just use
the bigger extent. However with commit 32c08693 that check was removed and hence
we would loop sbi->s_mb_max_to_scan(=200) times even though we have a big enough
free extent to satisfy the request. The only time we would exit early would be
when the free extent is *exactly* the size of our goal, which is pretty uncommon
occurrence and so we would almost always end up looping 200 times.

Hence, revert the commit by adding the check back to fix the regression. Also
add a comment to outline this policy.

Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/ext4/mballoc.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 9c7881a4ea75..2e1a5f001883 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2062,7 +2062,7 @@ static void ext4_mb_check_limits(struct ext4_allocation_context *ac,
 	if (bex->fe_len < gex->fe_len)
 		return;
 
-	if (finish_group)
+	if (finish_group || ac->ac_found > sbi->s_mb_min_to_scan)
 		ext4_mb_use_best_found(ac, e4b);
 }
 
@@ -2074,6 +2074,20 @@ static void ext4_mb_check_limits(struct ext4_allocation_context *ac,
  * in the context. Later, the best found extent will be used, if
  * mballoc can't find good enough extent.
  *
+ * The algorithm used is roughly as follows:
+ *
+ * * If free extent found is exactly as big as goal, then
+ *   stop the scan and use it immediately
+ *
+ * * If free extent found is smaller than goal, then keep retrying
+ *   upto a max of sbi->s_mb_max_to_scan times (default 200). After
+ *   that stop scanning and use whatever we have.
+ *
+ * * If free extent found is bigger than goal, then keep retrying
+ *   upto a max of sbi->s_mb_min_to_scan times (default 10) before
+ *   stopping the scan and using the extent.
+ *
+ *
  * FIXME: real allocation policy is to be designed yet!
  */
 static void ext4_mb_measure_extent(struct ext4_allocation_context *ac,
-- 
2.31.1

Re: [PATCH 01/13] Revert "ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in ext4_mb_check_limits"

Posted by Linux regression tracking #adding (Thorsten Leemhuis) 2 years, 8 months ago

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few templates
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 25.05.23 13:32, Ojaswin Mujoo wrote:
> This reverts commit 32c0869370194ae5ac9f9f501953ef693040f6a1.
> 
> The reverted commit was intended to remove a dead check however it was observed
> that this check was actually being used to exit early instead of looping
> sbi->s_mb_max_to_scan times when we are able to find a free extent bigger than
> the goal extent. Due to this, a my performance tests (fsmark, parallel file
> writes in a highly fragmented FS) were seeing a 2x-3x regression.
> 
> Example, the default value of the following variables is:
> 
> sbi->s_mb_max_to_scan = 200
> sbi->s_mb_min_to_scan = 10
> 
> In ext4_mb_check_limits() if we find an extent smaller than goal, then we return
> early and try again. This loop will go on until we have processed
> sbi->s_mb_max_to_scan(=200) number of free extents at which point we exit and
> just use whatever we have even if it is smaller than goal extent.
> 
> Now, the regression comes when we find an extent bigger than goal. Earlier, in
> this case we would loop only sbi->s_mb_min_to_scan(=10) times and then just use
> the bigger extent. However with commit 32c08693 that check was removed and hence
> we would loop sbi->s_mb_max_to_scan(=200) times even though we have a big enough
> free extent to satisfy the request. The only time we would exit early would be
> when the free extent is *exactly* the size of our goal, which is pretty uncommon
> occurrence and so we would almost always end up looping 200 times.
> 
> Hence, revert the commit by adding the check back to fix the regression. Also
> add a comment to outline this policy.
> 
> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> [...]

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced 32c0869370194ae5ac
#regzbot title ext4: 2x-3x regression in performance tests
#regzbot monitor:
https://lore.kernel.org/all/ddcae9658e46880dfec2fb0aa61d01fb3353d202.1685449706.git.ojaswin@linux.ibm.com/
#regzbot fix: Revert "ext4: remove ac->ac_found > sbi->s_mb_min_to_scan
dead check in ext4_mb_check_limits"
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

Re: [PATCH 01/13] Revert "ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in ext4_mb_check_limits"

Posted by Kemeng Shi 2 years, 8 months ago


on 5/25/2023 7:32 PM, Ojaswin Mujoo wrote:
> This reverts commit 32c0869370194ae5ac9f9f501953ef693040f6a1.
> 
> The reverted commit was intended to remove a dead check however it was observed
> that this check was actually being used to exit early instead of looping
> sbi->s_mb_max_to_scan times when we are able to find a free extent bigger than
> the goal extent. Due to this, a my performance tests (fsmark, parallel file
> writes in a highly fragmented FS) were seeing a 2x-3x regression.
> 
> Example, the default value of the following variables is:
> 
> sbi->s_mb_max_to_scan = 200
> sbi->s_mb_min_to_scan = 10
> 
> In ext4_mb_check_limits() if we find an extent smaller than goal, then we return
> early and try again. This loop will go on until we have processed
> sbi->s_mb_max_to_scan(=200) number of free extents at which point we exit and
> just use whatever we have even if it is smaller than goal extent.
> 
> Now, the regression comes when we find an extent bigger than goal. Earlier, in
> this case we would loop only sbi->s_mb_min_to_scan(=10) times and then just use
> the bigger extent. However with commit 32c08693 that check was removed and hence
> we would loop sbi->s_mb_max_to_scan(=200) times even though we have a big enough
> free extent to satisfy the request. The only time we would exit early would be
> when the free extent is *exactly* the size of our goal, which is pretty uncommon
> occurrence and so we would almost always end up looping 200 times.
> 
> Hence, revert the commit by adding the check back to fix the regression. Also
> add a comment to outline this policy.
> 
> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> ---
>  fs/ext4/mballoc.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 9c7881a4ea75..2e1a5f001883 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2062,7 +2062,7 @@ static void ext4_mb_check_limits(struct ext4_allocation_context *ac,
>  	if (bex->fe_len < gex->fe_len)
>  		return;
>  
> -	if (finish_group)
> +	if (finish_group || ac->ac_found > sbi->s_mb_min_to_scan)
>  		ext4_mb_use_best_found(ac, e4b);
>  }
>  
> @@ -2074,6 +2074,20 @@ static void ext4_mb_check_limits(struct ext4_allocation_context *ac,
>   * in the context. Later, the best found extent will be used, if
>   * mballoc can't find good enough extent.
>   *
> + * The algorithm used is roughly as follows:
> + *
> + * * If free extent found is exactly as big as goal, then
> + *   stop the scan and use it immediately
> + *
> + * * If free extent found is smaller than goal, then keep retrying
> + *   upto a max of sbi->s_mb_max_to_scan times (default 200). After
> + *   that stop scanning and use whatever we have.
> + *
> + * * If free extent found is bigger than goal, then keep retrying
> + *   upto a max of sbi->s_mb_min_to_scan times (default 10) before
> + *   stopping the scan and using the extent.
> + *
> + *
>   * FIXME: real allocation policy is to be designed yet!
>   */
>  static void ext4_mb_measure_extent(struct ext4_allocation_context *ac,
> 

My bad, it seems that I mixed up with s_mb_min_to_scan and s_mb_max_to_scan
in previous patch which will make s_mb_min_to_scan not work. Thanks for the
fix. It looks goot to me. Feel free to add my first reviewed-by :)
Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com>

-- 
Best wishes
Kemeng Shi