[PATCH] bcachefs: WARNING in bch2_trans_srcu_unlock

Camila Alvarez posted 1 patch 1 year, 6 months ago
fs/bcachefs/io_misc.c | 2 --
1 file changed, 2 deletions(-)
[PATCH] bcachefs: WARNING in bch2_trans_srcu_unlock
Posted by Camila Alvarez 1 year, 6 months ago
bch2_extent_fallocate should not return -BCH_ERR_transaction_restart_nested when there are no buckets available.
Returning -BCH_ERR_transaction_restart_nested causes the sector allocation to retry even if there's no space available.

Reported-by: syzbot+1e515cab343dbe5aa38a@syzkaller.appspotmail.com
Signed-off-by: Camila Alvarez <cam.alvarez.i@gmail.com>
---
 fs/bcachefs/io_misc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/bcachefs/io_misc.c b/fs/bcachefs/io_misc.c
index 82f9170dab3f..b5b92ef030df 100644
--- a/fs/bcachefs/io_misc.c
+++ b/fs/bcachefs/io_misc.c
@@ -90,8 +90,6 @@ int bch2_extent_fallocate(struct btree_trans *trans,
 				opts.data_replicas,
 				opts.data_replicas,
 				BCH_WATERMARK_normal, 0, &cl, &wp);
-		if (bch2_err_matches(ret, BCH_ERR_operation_blocked))
-			ret = -BCH_ERR_transaction_restart_nested;
 		if (ret)
 			goto err;
 
-- 
2.34.1
Re: [PATCH] bcachefs: WARNING in bch2_trans_srcu_unlock
Posted by Kent Overstreet 1 year, 6 months ago
On Tue, Jul 23, 2024 at 09:08:00PM GMT, Camila Alvarez wrote:
> bch2_extent_fallocate should not return -BCH_ERR_transaction_restart_nested when there are no buckets available.
> Returning -BCH_ERR_transaction_restart_nested causes the sector allocation to retry even if there's no space available.

It seems like you're saying this fixes a livelock, but I don't see that:
if we return operation_blocked, the closure_sync we do before we return
will still wait.

More explanation?

> 
> Reported-by: syzbot+1e515cab343dbe5aa38a@syzkaller.appspotmail.com
> Signed-off-by: Camila Alvarez <cam.alvarez.i@gmail.com>
> ---
>  fs/bcachefs/io_misc.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/fs/bcachefs/io_misc.c b/fs/bcachefs/io_misc.c
> index 82f9170dab3f..b5b92ef030df 100644
> --- a/fs/bcachefs/io_misc.c
> +++ b/fs/bcachefs/io_misc.c
> @@ -90,8 +90,6 @@ int bch2_extent_fallocate(struct btree_trans *trans,
>  				opts.data_replicas,
>  				opts.data_replicas,
>  				BCH_WATERMARK_normal, 0, &cl, &wp);
> -		if (bch2_err_matches(ret, BCH_ERR_operation_blocked))
> -			ret = -BCH_ERR_transaction_restart_nested;
>  		if (ret)
>  			goto err;
>  
> -- 
> 2.34.1
>
Re: [PATCH] bcachefs: WARNING in bch2_trans_srcu_unlock
Posted by Camila Alvarez Inostroza 1 year, 6 months ago

On Fri, 2 Aug 2024, Kent Overstreet wrote:

> On Tue, Jul 23, 2024 at 09:08:00PM GMT, Camila Alvarez wrote:
>> bch2_extent_fallocate should not return -BCH_ERR_transaction_restart_nested when there are no buckets available.
>> Returning -BCH_ERR_transaction_restart_nested causes the sector allocation to retry even if there's no space available.
>
> It seems like you're saying this fixes a livelock, but I don't see that:
> if we return operation_blocked, the closure_sync we do before we return
> will still wait.
>
> More explanation?
>
Yes, sorry for the delay. The closure_sync will still wait, but even after 
waiting there's no space left.
I think the issue is that we're trying to allocate a bucket when none are 
available:
closure_wait(&c->freelist_wait, cl);

journal_write_done is called at some point:
bch2_do_discard(c)
closure_wake_up(&c->freelist_wait)
...

That apparently doesn't free any buckets (should we add some sort of 
assertion to notify us when this happens?), so there is still no space 
left, so bch2_bucket_alloc_trans keeps returning -BCH_ERR_freelist_empty.
With this bch2_alloc_sectors_start_trans return 
-BCH_err_bucket_alloc_blocked, which bch2_extent_fallocate turns into 
-BCH_ERR_transaction_restart_nested.

Finally __bchfs_fallocate sees the error as BCH_ERR_transaction_restart, 
so it tries again.
The issue is that the journal never discards any buckets so it just keeps 
retrying indefinitely.

I've been thinking, and maybe we don't want to just keep __bchfs_fallocate 
from retrying, but would prefer to control how many times it does.

Does all of this make any sense to you?


>>
>> Reported-by: syzbot+1e515cab343dbe5aa38a@syzkaller.appspotmail.com
>> Signed-off-by: Camila Alvarez <cam.alvarez.i@gmail.com>
>> ---
>>  fs/bcachefs/io_misc.c | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/fs/bcachefs/io_misc.c b/fs/bcachefs/io_misc.c
>> index 82f9170dab3f..b5b92ef030df 100644
>> --- a/fs/bcachefs/io_misc.c
>> +++ b/fs/bcachefs/io_misc.c
>> @@ -90,8 +90,6 @@ int bch2_extent_fallocate(struct btree_trans *trans,
>>  				opts.data_replicas,
>>  				opts.data_replicas,
>>  				BCH_WATERMARK_normal, 0, &cl, &wp);
>> -		if (bch2_err_matches(ret, BCH_ERR_operation_blocked))
>> -			ret = -BCH_ERR_transaction_restart_nested;
>>  		if (ret)
>>  			goto err;
>>
>> --
>> 2.34.1
>>
>