[PATCH v6.13] fs/netfs/read_pgpriv2: skip folio queues without `marks3`

Posted by Max Kellermann 12 months ago
At the beginning of the function, folio queues with marks3==0 are
skipped, but after that, the `marks3` field is ignored.  If one such
queue is found, `slot` is set to 64 (because `__ffs(0)==64`), leading
to a buffer overflow in the folioq_folio() call.  The resulting crash
may look like this:

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] SMP PTI
 CPU: 11 UID: 0 PID: 2909 Comm: kworker/u262:1 Not tainted 6.13.1-cm4all2-vm #415
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
 Workqueue: events_unbound netfs_read_termination_worker
 RIP: 0010:netfs_pgpriv2_write_to_the_cache+0x15a/0x3f0
 Code: 48 85 c0 48 89 44 24 08 0f 84 24 01 00 00 48 8b 80 40 01 00 00 48 8b 7c 24 08 f3 48 0f bc c0 89 44 24 18 89 c0 48 8b 74 c7 08 <48> 8b 06 48 c7 04 24 00 10 00 00 a8 40 74 10 0f b6 4e 40 b8 00 10
 RSP: 0018:ffffbbc440effe18 EFLAGS: 00010203
 RAX: 0000000000000040 RBX: ffff96f8fc034000 RCX: 0000000000000000
 RDX: 0000000000000040 RSI: 0000000000000000 RDI: ffff96f8fc036400
 RBP: 0000000000001000 R08: ffff96f9132bb400 R09: 0000000000001000
 R10: ffff96f8c1263c80 R11: 0000000000000003 R12: 0000000000001000
 R13: ffff96f8fb75ade8 R14: fffffaaf5ca90000 R15: ffff96f8fb75ad00
 FS:  0000000000000000(0000) GS:ffff9703cf0c0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000010c9ca003 CR4: 00000000001706b0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  <TASK>
  ? __die+0x1f/0x60
  ? page_fault_oops+0x158/0x450
  ? search_extable+0x22/0x30
  ? netfs_pgpriv2_write_to_the_cache+0x15a/0x3f0
  ? search_module_extables+0xe/0x40
  ? exc_page_fault+0x62/0x120
  ? asm_exc_page_fault+0x22/0x30
  ? netfs_pgpriv2_write_to_the_cache+0x15a/0x3f0
  ? netfs_pgpriv2_write_to_the_cache+0xf6/0x3f0
  netfs_read_termination_worker+0x1f/0x60
  process_one_work+0x138/0x2d0
  worker_thread+0x2a5/0x3b0
  ? __pfx_worker_thread+0x10/0x10
  kthread+0xba/0xe0
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x30/0x50
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1a/0x30
  </TASK>

Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Cc: stable@vger.kernel.org
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
Note this patch doesn't apply to v6.14 as it was obsoleted by commit
e2d46f2ec332 ("netfs: Change the read result collector to only use one
work item").
---
 fs/netfs/read_pgpriv2.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/netfs/read_pgpriv2.c b/fs/netfs/read_pgpriv2.c
index 54d5004fec18..e72f5e674834 100644
--- a/fs/netfs/read_pgpriv2.c
+++ b/fs/netfs/read_pgpriv2.c
@@ -181,16 +181,17 @@ void netfs_pgpriv2_write_to_the_cache(struct netfs_io_request *rreq)
 			break;
 
 		folioq_unmark3(folioq, slot);
-		if (!folioq->marks3) {
+		while (!folioq->marks3) {
 			folioq = folioq->next;
 			if (!folioq)
-				break;
+				goto end_of_queue;
 		}
 
 		slot = __ffs(folioq->marks3);
 		folio = folioq_folio(folioq, slot);
 	}
 
+end_of_queue:
 	netfs_issue_write(wreq, &wreq->io_streams[1]);
 	smp_wmb(); /* Write lists before ALL_QUEUED. */
 	set_bit(NETFS_RREQ_ALL_QUEUED, &wreq->flags);
-- 
2.47.2
Re: [PATCH v6.13] fs/netfs/read_pgpriv2: skip folio queues without `marks3`
Posted by David Howells 11 months, 4 weeks ago
Max Kellermann <max.kellermann@ionos.com> wrote:

> At the beginning of the function, folio queues with marks3==0 are
> skipped, but after that, the `marks3` field is ignored.  If one such
> queue is found, `slot` is set to 64 (because `__ffs(0)==64`), leading
> to a buffer overflow in the folioq_folio() call.  The resulting crash
> may look like this:
> 
>  BUG: kernel NULL pointer dereference, address: 0000000000000000
> ...
> 
> Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
> Cc: stable@vger.kernel.org
> Signed-off-by: Max Kellermann <max.kellermann@ionos.com>

Signed-off-by: David Howells <dhowells@redhat.com>
Re: [PATCH v6.13] fs/netfs/read_pgpriv2: skip folio queues without `marks3`
Posted by Greg KH 12 months ago
On Mon, Feb 10, 2025 at 11:31:44PM +0100, Max Kellermann wrote:
> At the beginning of the function, folio queues with marks3==0 are
> skipped, but after that, the `marks3` field is ignored.  If one such
> queue is found, `slot` is set to 64 (because `__ffs(0)==64`), leading
> to a buffer overflow in the folioq_folio() call.  The resulting crash
> may look like this:
> 
>  BUG: kernel NULL pointer dereference, address: 0000000000000000
> ...
> 
> Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
> Cc: stable@vger.kernel.org
> Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
> ---
> Note this patch doesn't apply to v6.14 as it was obsoleted by commit
> e2d46f2ec332 ("netfs: Change the read result collector to only use one
> work item").

Why can't we just take what is upstream instead?

Diverging from that ALWAYS ends up being more work and problems in the
end.  Only do so if you have no other choice.

thanks,

greg k-h
Re: [PATCH v6.13] fs/netfs/read_pgpriv2: skip folio queues without `marks3`
Posted by Max Kellermann 12 months ago
On Tue, Feb 11, 2025 at 7:30 AM Greg KH <gregkh@linuxfoundation.org> wrote:

> > Note this patch doesn't apply to v6.14 as it was obsoleted by commit
> > e2d46f2ec332 ("netfs: Change the read result collector to only use one
> > work item").
>
> Why can't we just take what is upstream instead?
>
> Diverging from that ALWAYS ends up being more work and problems in the
> end.  Only do so if you have no other choice.

Usually I agree with that, and I trust that you will make the right decision.

Before you decide, let me point out that netfs has been extremely
unstable since 6.10 (July 2024), ever since commit 2e9d7e4b984a ("mm:
Remove the PG_fscache alias for PG_private_2"). All of our web servers
have been crashing constantly since 6.10 (see
https://lore.kernel.org/netfs/CAKPOu+_DA8XiMAA2ApMj7Pyshve_YWknw8Hdt1=zCy9Y87R1qw@mail.gmail.com/
for one of several bug reports I posted), and I went to considerable
trouble resisting the pressure from people asking me to downgrade to
6.6. (I want the bugs fixed; I don't want to go back.) For several
months, 6.12 had been crashing instantly on boot due to yet another
netfs regression
(https://lore.kernel.org/netfs/CAKPOu+_4m80thNy5_fvROoxBm689YtA0dZ-=gcmkzwYSY4syqw@mail.gmail.com/),
which wasn't fixed until 6.12.11, so our production servers are still
on 6.11.11 today.

Before these bugs got ironed out, v6.11 commit ee4cdf7ba857 ("netfs:
Speed up buffered reading") wreaked more havoc, leading to 8 "Fixes"
commits so far, plus the 2 I posted yesterday.

I wrote that e2d46f2ec332 has obsoleted my patch (actually both
patches), but I don't know whether it really fixes both bugs. The code
my patch addresses no longer exists in that form, but I can't tell
whether it was merely refactored with the bugs kept (or whether yet
more bugs sneaked in).


Max
Re: [PATCH v6.13] fs/netfs/read_pgpriv2: skip folio queues without `marks3`
Posted by Greg KH 12 months ago
On Tue, Feb 11, 2025 at 09:05:44AM +0100, Max Kellermann wrote:
> On Tue, Feb 11, 2025 at 7:30 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> 
> > > Note this patch doesn't apply to v6.14 as it was obsoleted by commit
> > > e2d46f2ec332 ("netfs: Change the read result collector to only use one
> > > work item").
> >
> > Why can't we just take what is upstream instead?
> >
> > Diverging from that ALWAYS ends up being more work and problems in the
> > end.  Only do so if you have no other choice.
> 
> Usually I agree with that, and I trust that you will make the right decision.

I defer to the maintainers of the subsystem here to make that decision,
as they are the ones that will be getting my "FAILED" backport emails
when things fail to backport due to this being out-of-tree :)

thanks,

greg k-h