[PATCH] bcachefs: don't call sleeping funcs when handling inconsistency errors

Bharadwaj Raju posted 1 patch 10 months, 1 week ago
fs/bcachefs/error.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
[PATCH] bcachefs: don't call sleeping funcs when handling inconsistency errors
Posted by Bharadwaj Raju 10 months, 1 week ago
In bch2_bkey_pick_read_device, we're in an RCU lock. So, we can't call
any potentially-sleeping functions. However, we call bch2_dev_rcu,
which calls bch2_fs_inconsistent in its error case. That then calls
bch2_prt_print on a non-atomic printbuf, as well as uses the blocking
variant of bch2_print_string_as_lines, both of which lead to calls to
potentially-sleeping functions, namely krealloc with GFP_KERNEL
and console_lock respectively.

Give a nonzero atomic to the printbuf, and use the nonblocking variant
of bch2_print_string_as_lines.

Reported-by: syzbot+c82cd2906e2f192410bb@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c82cd2906e2f192410bb
Signed-off-by: Bharadwaj Raju <bharadwaj.raju777@gmail.com>
---
 fs/bcachefs/error.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/bcachefs/error.c b/fs/bcachefs/error.c
index d4dfd13a8076..6f0f2f12c004 100644
--- a/fs/bcachefs/error.c
+++ b/fs/bcachefs/error.c
@@ -60,6 +60,8 @@ static bool bch2_fs_trans_inconsistent(struct bch_fs *c, struct btree_trans *tra
 {
 	struct printbuf buf = PRINTBUF;
 
+	buf.atomic++;
+
 	bch2_log_msg_start(c, &buf);
 
 	prt_vprintf(&buf, fmt, args);
@@ -68,7 +70,9 @@ static bool bch2_fs_trans_inconsistent(struct bch_fs *c, struct btree_trans *tra
 	if (trans)
 		bch2_trans_updates_to_text(&buf, trans);
 	bool ret = __bch2_inconsistent_error(c, &buf);
-	bch2_print_string_as_lines(KERN_ERR, buf.buf);
+	bch2_print_string_as_lines_nonblocking(KERN_ERR, buf.buf);
+
+	buf.atomic--;
 
 	printbuf_exit(&buf);
 	return ret;
-- 
2.49.0
Re: [PATCH] bcachefs: don't call sleeping funcs when handling inconsistency errors
Posted by Kent Overstreet 10 months, 1 week ago
On Wed, Apr 02, 2025 at 09:40:40PM +0530, Bharadwaj Raju wrote:
> In bch2_bkey_pick_read_device, we're in an RCU lock. So, we can't call
> any potentially-sleeping functions. However, we call bch2_dev_rcu,
> which calls bch2_fs_inconsistent in its error case. That then calls
> bch2_prt_print on a non-atomic printbuf, as well as uses the blocking
> variant of bch2_print_string_as_lines, both of which lead to calls to
> potentially-sleeping functions, namely krealloc with GFP_KERNEL
> and console_lock respectively.
> 
> Give a nonzero atomic to the printbuf, and use the nonblocking variant
> of bch2_print_string_as_lines.

Sorry, beat you to it :)

You also missed the one the syzbot report actually hit -
bch2_inconsistent_error().

commit fef0ac7dbdd3c2166462720a2c0c9b16ad0680a5
Author: Kent Overstreet <kent.overstreet@linux.dev>
Date:   Wed Apr 2 11:02:12 2025 -0400

    bcachefs: Fix scheduling while atomic
    
    bch2_inconsistent(), bch2_fs_inconsistent() be called from interrupt
    context, or with rcu_read_lock() held.
    
    The one syzbot found is in
      bch2_bkey_pick_read_device
      bch2_dev_rcu
      bch2_fs_inconsistent
    
    We're starting to switch to lift the printbufs up to higher levels so we
    can emit better log messages and print them all in one go (avoid
    garbling), so that conversion will help with spotting these in the
    future; when we declare a printbuf it must be flagged if we're in an
    atomic context.
    
    Reported-by: syzbot+c82cd2906e2f192410bb@syzkaller.appspotmail.com
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

diff --git a/fs/bcachefs/error.c b/fs/bcachefs/error.c
index d4dfd13a8076..b885bd92834c 100644
--- a/fs/bcachefs/error.c
+++ b/fs/bcachefs/error.c
@@ -45,6 +45,8 @@ bool __bch2_inconsistent_error(struct bch_fs *c, struct printbuf *out)
 bool bch2_inconsistent_error(struct bch_fs *c)
 {
 	struct printbuf buf = PRINTBUF;
+	buf.atomic++;
+
 	printbuf_indent_add_nextline(&buf, 2);
 
 	bool ret = __bch2_inconsistent_error(c, &buf);
@@ -59,6 +61,7 @@ static bool bch2_fs_trans_inconsistent(struct bch_fs *c, struct btree_trans *tra
 				       const char *fmt, va_list args)
 {
 	struct printbuf buf = PRINTBUF;
+	buf.atomic++;
 
 	bch2_log_msg_start(c, &buf);
 

> 
> Reported-by: syzbot+c82cd2906e2f192410bb@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=c82cd2906e2f192410bb
> Signed-off-by: Bharadwaj Raju <bharadwaj.raju777@gmail.com>
> ---
>  fs/bcachefs/error.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/bcachefs/error.c b/fs/bcachefs/error.c
> index d4dfd13a8076..6f0f2f12c004 100644
> --- a/fs/bcachefs/error.c
> +++ b/fs/bcachefs/error.c
> @@ -60,6 +60,8 @@ static bool bch2_fs_trans_inconsistent(struct bch_fs *c, struct btree_trans *tra
>  {
>  	struct printbuf buf = PRINTBUF;
>  
> +	buf.atomic++;
> +
>  	bch2_log_msg_start(c, &buf);
>  
>  	prt_vprintf(&buf, fmt, args);
> @@ -68,7 +70,9 @@ static bool bch2_fs_trans_inconsistent(struct bch_fs *c, struct btree_trans *tra
>  	if (trans)
>  		bch2_trans_updates_to_text(&buf, trans);
>  	bool ret = __bch2_inconsistent_error(c, &buf);
> -	bch2_print_string_as_lines(KERN_ERR, buf.buf);
> +	bch2_print_string_as_lines_nonblocking(KERN_ERR, buf.buf);
> +
> +	buf.atomic--;
>  
>  	printbuf_exit(&buf);
>  	return ret;
> -- 
> 2.49.0
>
Re: [PATCH] bcachefs: don't call sleeping funcs when handling inconsistency errors
Posted by Bharadwaj Raju 10 months, 1 week ago
On Wed, Apr 2, 2025 at 9:47 PM Kent Overstreet
<kent.overstreet@linux.dev> wrote:
>
> On Wed, Apr 02, 2025 at 09:40:40PM +0530, Bharadwaj Raju wrote:
> > In bch2_bkey_pick_read_device, we're in an RCU lock. So, we can't call
> > any potentially-sleeping functions. However, we call bch2_dev_rcu,
> > which calls bch2_fs_inconsistent in its error case. That then calls
> > bch2_prt_print on a non-atomic printbuf, as well as uses the blocking
> > variant of bch2_print_string_as_lines, both of which lead to calls to
> > potentially-sleeping functions, namely krealloc with GFP_KERNEL
> > and console_lock respectively.
> >
> > Give a nonzero atomic to the printbuf, and use the nonblocking variant
> > of bch2_print_string_as_lines.
>
> Sorry, beat you to it :)
>
> You also missed the one the syzbot report actually hit -
> bch2_inconsistent_error().

Oops, thank you.

If I'm not wrong, though, the bch2_print_string_as_lines
still needs to be changed to bch2_print_string_as_lines_nonblocking?

In my testing that also produces the same BUG warning.

Should I make a patch for that?
Re: [PATCH] bcachefs: don't call sleeping funcs when handling inconsistency errors
Posted by Kent Overstreet 10 months, 1 week ago
On Wed, Apr 02, 2025 at 10:03:10PM +0530, Bharadwaj Raju wrote:
> On Wed, Apr 2, 2025 at 9:47 PM Kent Overstreet
> <kent.overstreet@linux.dev> wrote:
> >
> > On Wed, Apr 02, 2025 at 09:40:40PM +0530, Bharadwaj Raju wrote:
> > > In bch2_bkey_pick_read_device, we're in an RCU lock. So, we can't call
> > > any potentially-sleeping functions. However, we call bch2_dev_rcu,
> > > which calls bch2_fs_inconsistent in its error case. That then calls
> > > bch2_prt_print on a non-atomic printbuf, as well as uses the blocking
> > > variant of bch2_print_string_as_lines, both of which lead to calls to
> > > potentially-sleeping functions, namely krealloc with GFP_KERNEL
> > > and console_lock respectively.
> > >
> > > Give a nonzero atomic to the printbuf, and use the nonblocking variant
> > > of bch2_print_string_as_lines.
> >
> > Sorry, beat you to it :)
> >
> > You also missed the one the syzbot report actually hit -
> > bch2_inconsistent_error().
> 
> Oops, thank you.
> 
> If I'm not wrong, though, the bch2_print_string_as_lines
> still needs to be changed to bch2_print_string_as_lines_nonblocking?
> 
> In my testing that also produces the same BUG warning.
> 
> Should I make a patch for that?

Yeah, you're right - please do.

If you're feeling particularly adventurous - print_string_as_lines() is
a hack, I think we should be able to do something more robust by
skipping printk (that's where the 1k limit comes from) and calling
something lower level - that will require digging into the printk
codepath and finding lower level we can call.

I also just noticed that print_string_as_lines() needs to check for
being passed a NULL pointer - in case the printbuf memory allocation
fails. Want to get that one too?
Re: [PATCH] bcachefs: don't call sleeping funcs when handling inconsistency errors
Posted by Bharadwaj Raju 10 months, 1 week ago
On Wed, Apr 2, 2025 at 10:08 PM Kent Overstreet
<kent.overstreet@linux.dev> wrote:
>
> On Wed, Apr 02, 2025 at 10:03:10PM +0530, Bharadwaj Raju wrote:
> > On Wed, Apr 2, 2025 at 9:47 PM Kent Overstreet
> > <kent.overstreet@linux.dev> wrote:
>
> If you're feeling particularly adventurous - print_string_as_lines() is
> a hack, I think we should be able to do something more robust by
> skipping printk (that's where the 1k limit comes from) and calling
> something lower level - that will require digging into the printk
> codepath and finding lower level we can call.

I tried looking into the printk codepath, namely vprintk_emit -> vprintk_store.
It doesn't seem like there's a convenient single lower-level
entrypoint we could
call which just avoids the 1k limit, rather there's a lot of internal
code mixed with
the truncation that we'd have to just copy if we want printk behavior.
I don't think that's a reasonable option for us.

> I also just noticed that print_string_as_lines() needs to check for
> being passed a NULL pointer - in case the printbuf memory allocation
> fails. Want to get that one too?

From what I'm seeing __bch2_print_string_as_lines already checks
for the lines pointer being NULL. The only unchecked pointer is prefix,
which I don't think needs to be checked since it will be something constant
from kern_levels.h, not something which could be NULL.
Re: [PATCH] bcachefs: don't call sleeping funcs when handling inconsistency errors
Posted by Kent Overstreet 10 months, 1 week ago
On Fri, Apr 04, 2025 at 12:22:45PM +0530, Bharadwaj Raju wrote:
> On Wed, Apr 2, 2025 at 10:08 PM Kent Overstreet
> <kent.overstreet@linux.dev> wrote:
> >
> > On Wed, Apr 02, 2025 at 10:03:10PM +0530, Bharadwaj Raju wrote:
> > > On Wed, Apr 2, 2025 at 9:47 PM Kent Overstreet
> > > <kent.overstreet@linux.dev> wrote:
> >
> > If you're feeling particularly adventurous - print_string_as_lines() is
> > a hack, I think we should be able to do something more robust by
> > skipping printk (that's where the 1k limit comes from) and calling
> > something lower level - that will require digging into the printk
> > codepath and finding lower level we can call.
> 
> I tried looking into the printk codepath, namely vprintk_emit -> vprintk_store.
> It doesn't seem like there's a convenient single lower-level
> entrypoint we could
> call which just avoids the 1k limit, rather there's a lot of internal
> code mixed with
> the truncation that we'd have to just copy if we want printk behavior.
> I don't think that's a reasonable option for us.

Yeah that does look a bit messy - and it doesn't exactly explain where
the 1k limit comes in.


> > I also just noticed that print_string_as_lines() needs to check for
> > being passed a NULL pointer - in case the printbuf memory allocation
> > fails. Want to get that one too?
> 
> From what I'm seeing __bch2_print_string_as_lines already checks
> for the lines pointer being NULL. The only unchecked pointer is prefix,
> which I don't think needs to be checked since it will be something constant
> from kern_levels.h, not something which could be NULL.

Yep, I should've checked the code before I said that :)