syzkaller reported a null-pointer dereference bug
(https://syzkaller.appspot.com/bug?extid=af5c06208fa71bf31b16) in
netfs_writeback_unlock_folios caused by passing a NULL folioq to
folioq_folio. Fix by adding a check before entering the loop.
Signed-off-by: Chang Yu <marcus.yu.56@gmail.com>
Reported-by: syzbot+af5c06208fa71bf31b16@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=af5c06208fa71bf31b16
Fixes: cd0277ed0c18 ("netfs: Use new folio_queue data type and iterator instead of xarray iter")
---
fs/netfs/write_collect.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c
index 1d438be2e1b4..23d46a409ff2 100644
--- a/fs/netfs/write_collect.c
+++ b/fs/netfs/write_collect.c
@@ -98,7 +98,7 @@ static void netfs_writeback_unlock_folios(struct netfs_io_request *wreq,
 		slot = 0;
 	}
 
-	for (;;) {
+	while (folioq) {
 		struct folio *folio;
 		struct netfs_folio *finfo;
 		unsigned long long fpos, fend;
--
2.47.0
Chang Yu <marcus.yu.56@gmail.com> wrote:

> syzkaller reported a null-pointer dereference bug
> (https://syzkaller.appspot.com/bug?extid=af5c06208fa71bf31b16) in
> netfs_writeback_unlock_folios caused by passing a NULL folioq to
> folioq_folio. Fix by adding a check before entering the loop.

And, of course, the preceding:

	if (slot >= folioq_nr_slots(folioq)) {

doesn't oops because it doesn't actually dereference folioq.

However... if we get into this function, there absolutely *should* be at
least one folioq in the rolling buffer.  Part of the rolling buffer's method
of operation involves keeping at least one folioq around at all times so
that we don't need to use locks to add/remove from the queue.

Either the rolling buffer wasn't initialised yet (and it should be
initialised for all write requests by netfs_create_write_req()) or it has
been destroyed already.

Either way, your patch is, unfortunately, just covering up the symptoms
rather than fixing the root cause.  I suggest instead that we patch the
function to detect the empty rolling buffer up front, dump some information
about the bad request and return.

David
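For illustration, the kind of up-front check being suggested might look
roughly like the sketch below. This is only a sketch, not the maintainer's
patch: the use of WARN_ON_ONCE()/pr_warn(), the message text, and the exact
wreq->buffer/wreq->debug_id details are assumptions made for the example.

	static void netfs_writeback_unlock_folios(struct netfs_io_request *wreq,
						  unsigned long long collected_to,
						  unsigned int *notes)
	{
		struct folio_queue *folioq = wreq->buffer; /* field name assumed */
		...

		/* The rolling buffer should always hold at least one folioq here;
		 * if it doesn't, the request is already in a bad state, so report
		 * the bad request and return rather than oopsing in the loop below.
		 */
		if (WARN_ON_ONCE(!folioq)) {
			pr_warn("netfs: R=%08x: empty rolling buffer in writeback unlock\n",
				wreq->debug_id);
			return;
		}
		...
	}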
On Fri, Oct 25, 2024 at 09:05:53AM +0100, David Howells wrote:
> Chang Yu <marcus.yu.56@gmail.com> wrote:
>
> > syzkaller reported a null-pointer dereference bug
> > (https://syzkaller.appspot.com/bug?extid=af5c06208fa71bf31b16) in
> > netfs_writeback_unlock_folios caused by passing a NULL folioq to
> > folioq_folio. Fix by adding a check before entering the loop.
>
> And, of course, the preceding:
>
> 	if (slot >= folioq_nr_slots(folioq)) {
>
> doesn't oops because it doesn't actually dereference folioq.
>
> However... if we get into this function, there absolutely *should* be at least
> one folioq in the rolling buffer. Part of the rolling buffer's method of
> operation involves keeping at least one folioq around at all times so that we
> don't need to use locks to add/remove from the queue.
>
> Either the rolling buffer wasn't initialised yet (and it should be initialised
> for all write requests by netfs_create_write_req()) or it has been destroyed
> already.
>
> Either way, your patch is, unfortunately, just covering up the symptoms rather
> than fixing the root cause. I suggest instead that we patch the function to
> detect the empty rolling buffer up front, dump some information about the bad
> request and return.
>
> David

I see. This might be a stupid question, but is it ever possible that we
have exactly one folioq and at the same time

	slot >= folioq_nr_slots(folioq)

is true? Then I imagine netfs_delete_buffer_head would return NULL and
cause the bug to trigger as well?
Chang Yu <marcus.yu.56@gmail.com> wrote:

> I see. This might be a stupid question, but is it ever possible that we
> have exactly one folioq and at the same time
>
> 	slot >= folioq_nr_slots(folioq)
>
> is true? Then I imagine netfs_delete_buffer_head would return NULL and
> cause the bug to trigger as well?

Whilst it is possible for "slot >= folioq_nr_slots(folioq)" to be true on
what is currently the last folioq, wreq->cleaned_to suggests that there must
be still-locked folios in the queue:

	unsigned long long clean_to = min(wreq->collected_to, wreq->contiguity);

	if (wreq->cleaned_to < clean_to)
		netfs_writeback_unlock_folios(wreq, clean_to, &notes);

David