[PATCH v2] aoe: fix the potential use-after-free problem in more places

Chun-Yi Lee posted 1 patch 2 months, 2 weeks ago
There is a newer version of this series
[PATCH v2] aoe: fix the potential use-after-free problem in more places
Posted by Chun-Yi Lee 2 months, 2 weeks ago
To fix CVE-2023-6270, commit f98364e92662 ("aoe: fix the potential
use-after-free problem in aoecmd_cfg_pkts") makes tx() call dev_put()
instead of calling it in aoecmd_cfg_pkts(). This prevents tx() from
running into a use-after-free.

Nicolai Stange then found more places in aoe with a potential
use-after-free problem in tx(), e.g. revalidate(), aoecmd_ata_rw(),
resend(), probe() and aoecmd_cfg_rsp(). Those functions also use
aoenet_xmit() to push packets to the tx queue, so they should also use
dev_hold() to increase the refcnt of skb->dev.

Link: https://nvd.nist.gov/vuln/detail/CVE-2023-6270
Fixes: f98364e92662 ("aoe: fix the potential use-after-free problem in aoecmd_cfg_pkts")
Reported-by: Nicolai Stange <nstange@suse.com>
Signed-off-by: Chun-Yi Lee <jlee@suse.com>
---

v2:
- Improve the patch description:
    - Improve the wording
    - Add a one-line summary of commit f98364e92662
- Use curly brackets in the if-else blocks.

 drivers/block/aoe/aoecmd.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index cc9077b588d7..d1f4ddc57645 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -361,6 +361,7 @@ ata_rw_frameinit(struct frame *f)
 	}
 
 	ah->cmdstat = ATA_CMD_PIO_READ | writebit | extbit;
+	dev_hold(t->ifp->nd);
 	skb->dev = t->ifp->nd;
 }
 
@@ -401,6 +402,8 @@ aoecmd_ata_rw(struct aoedev *d)
 		__skb_queue_head_init(&queue);
 		__skb_queue_tail(&queue, skb);
 		aoenet_xmit(&queue);
+	} else {
+		dev_put(f->t->ifp->nd);
 	}
 	return 1;
 }
@@ -483,10 +486,13 @@ resend(struct aoedev *d, struct frame *f)
 	memcpy(h->dst, t->addr, sizeof h->dst);
 	memcpy(h->src, t->ifp->nd->dev_addr, sizeof h->src);
 
+	dev_hold(t->ifp->nd);
 	skb->dev = t->ifp->nd;
 	skb = skb_clone(skb, GFP_ATOMIC);
-	if (skb == NULL)
+	if (skb == NULL) {
+		dev_put(t->ifp->nd);
 		return;
+	}
 	f->sent = ktime_get();
 	__skb_queue_head_init(&queue);
 	__skb_queue_tail(&queue, skb);
@@ -617,6 +623,8 @@ probe(struct aoetgt *t)
 		__skb_queue_head_init(&queue);
 		__skb_queue_tail(&queue, skb);
 		aoenet_xmit(&queue);
+	} else {
+		dev_put(f->t->ifp->nd);
 	}
 }
 
@@ -1395,6 +1403,7 @@ aoecmd_ata_id(struct aoedev *d)
 	ah->cmdstat = ATA_CMD_ID_ATA;
 	ah->lba3 = 0xa0;
 
+	dev_hold(t->ifp->nd);
 	skb->dev = t->ifp->nd;
 
 	d->rttavg = RTTAVG_INIT;
@@ -1404,6 +1413,8 @@ aoecmd_ata_id(struct aoedev *d)
 	skb = skb_clone(skb, GFP_ATOMIC);
 	if (skb)
 		f->sent = ktime_get();
+	else
+		dev_put(t->ifp->nd);
 
 	return skb;
 }
-- 
2.35.3
Re: [PATCH v2] aoe: fix the potential use-after-free problem in more places
Posted by Valentin Kleibel 2 months, 2 weeks ago
> Nicolai Stange then found more places in aoe with a potential
> use-after-free problem in tx(), e.g. revalidate(), aoecmd_ata_rw(),
> resend(), probe() and aoecmd_cfg_rsp(). Those functions also use
> aoenet_xmit() to push packets to the tx queue, so they should also use
> dev_hold() to increase the refcnt of skb->dev.

We've tested your patch on our servers and ran into an issue.
With heavy I/O load the aoe device had stale I/Os (e.g. rsync waiting
indefinitely on one core) that can be "fixed" by running aoe-revalidate
on that device.

Additionally, when trying to shut down the system we see the message:
unregister_netdevice: waiting for XXX to become free. Usage Count = XXXXX
on aoe devices, with a usage count somewhere in the millions.
This is the same as without the patch, so I assume the fix is still
incomplete.

Thanks for your work,
Valentin
Re: [PATCH v2] aoe: fix the potential use-after-free problem in more places
Posted by joeyli 1 month, 4 weeks ago
Hi Valentin,

On Thu, Sep 12, 2024 at 12:58:46PM +0200, Valentin Kleibel wrote:
> > Nicolai Stange then found more places in aoe with a potential
> > use-after-free problem in tx(), e.g. revalidate(), aoecmd_ata_rw(),
> > resend(), probe() and aoecmd_cfg_rsp(). Those functions also use
> > aoenet_xmit() to push packets to the tx queue, so they should also use
> > dev_hold() to increase the refcnt of skb->dev.
> 
> We've tested your patch on our servers and ran into an issue.
> With heavy I/O load the aoe device had stale I/Os (e.g. rsync waiting
> indefinitely on one core) that can be "fixed" by running aoe-revalidate on
> that device.
> 
> Additionally, when trying to shut down the system we see the message:
> unregister_netdevice: waiting for XXX to become free. Usage Count = XXXXX
> on aoe devices, with a usage count somewhere in the millions.
> This is the same as without the patch, so I assume the fix is still
> incomplete.
>

For the reference count debugging, I have sent a patch series here:

[RFC PATCH 0/2] tracking the references of net_device in aoe
https://lore.kernel.org/lkml/20241002040616.25193-1-jlee@suse.com/T/#t

Based on my testing, the numbers of dev_hold(nd) and dev_put(nd) calls in
aoe are balanced after the 'aoe: fix the potential use-after-free problem
in more places' patch is applied to the v6.11 kernel. I tested adding,
modifying and deleting files on a remote target through aoe. My testing
was not heavy I/O, but the result is balanced.

Could you please try the above debug patch series to look at the refcnt
value in aoe on your side?

Thanks a lot!
Joey Lee
Re: [PATCH v2] aoe: fix the potential use-after-free problem in more places
Posted by Valentin Kleibel 3 weeks, 4 days ago
Hi Joey,

>> We've tested your patch on our servers and ran into an issue.
>> With heavy I/O load the aoe device had stale I/Os (e.g. rsync waiting
>> indefinitely on one core) that can be "fixed" by running aoe-revalidate on
>> that device.
[...]
> For the reference count debugging, I have sent a patch series here:
> 
> [RFC PATCH 0/2] tracking the references of net_device in aoe
> https://lore.kernel.org/lkml/20241002040616.25193-1-jlee@suse.com/T/#t
> 
> Based on my testing, the numbers of dev_hold(nd) and dev_put(nd) calls in
> aoe are balanced after the 'aoe: fix the potential use-after-free problem
> in more places' patch is applied to the v6.11 kernel. I tested adding,
> modifying and deleting files on a remote target through aoe. My testing
> was not heavy I/O, but the result is balanced.
> 
> Could you please try the above debug patch series to look at the refcnt
> value in aoe on your side?

Thanks for your work, I can confirm the refcnt value is balanced and that
issue is fixed now.

However, the I/O waiting issue reported before is still there, and
occurs more often now.
This problem started with the first CVE-2023-6270 fix, commit
f98364e92662.
This only happens under heavy I/O on our "older" storage systems with
spinning disks. Unfortunately we do not know how we could debug this;
do you have any hints on what we could do?

Thanks,
Valentin

PS: Sorry for the delay, I'm now back from a long vacation.
Re: [PATCH v2] aoe: fix the potential use-after-free problem in more places
Posted by joeyli 2 weeks, 4 days ago
Hi Valentin,

Sorry for the delay!

On Mon, Nov 04, 2024 at 02:38:20PM +0100, Valentin Kleibel wrote:
> Hi Joey,
> 
> > > We've tested your patch on our servers and ran into an issue.
> > > With heavy I/O load the aoe device had stale I/Os (e.g. rsync waiting
> > > indefinitely on one core) that can be "fixed" by running aoe-revalidate on
> > > that device.
> [...]
> > For the reference count debugging, I have sent a patch series here:
> > 
> > [RFC PATCH 0/2] tracking the references of net_device in aoe
> > https://lore.kernel.org/lkml/20241002040616.25193-1-jlee@suse.com/T/#t
> > 
> > Based on my testing, the numbers of dev_hold(nd) and dev_put(nd) calls in
> > aoe are balanced after the 'aoe: fix the potential use-after-free problem
> > in more places' patch is applied to the v6.11 kernel. I tested adding,
> > modifying and deleting files on a remote target through aoe. My testing
> > was not heavy I/O, but the result is balanced.
> > 
> > Could you please try the above debug patch series to look at the refcnt
> > value in aoe on your side?
> 
> Thanks for your work, I can confirm the refcnt value is balanced and that
> issue is fixed now.
>

Great! Thanks for your testing!

> However, the I/O waiting issue reported before is still there, and occurs
> more often now.
> This problem started with the first CVE-2023-6270 fix, commit
> f98364e92662.
> This only happens under heavy I/O on our "older" storage systems with
> spinning disks. Unfortunately we do not know how we could debug this;
> do you have any hints on what we could do?

OK, spinning disks are useful information. Could you please give more
details about your environment, e.g. the number of CPUs, the size of the
storage shared over aoe, and how heavy your I/O load is?

If the situation can be reproduced, then I think perf can be used to
analyze the bottleneck.

Regards
Joey Lee
Re: [PATCH v2] aoe: fix the potential use-after-free problem in more places
Posted by joeyli 2 months, 2 weeks ago
Hi Valentin,

On Thu, Sep 12, 2024 at 12:58:46PM +0200, Valentin Kleibel wrote:
> > Nicolai Stange then found more places in aoe with a potential
> > use-after-free problem in tx(), e.g. revalidate(), aoecmd_ata_rw(),
> > resend(), probe() and aoecmd_cfg_rsp(). Those functions also use
> > aoenet_xmit() to push packets to the tx queue, so they should also use
> > dev_hold() to increase the refcnt of skb->dev.
> 
> We've tested your patch on our servers and ran into an issue.
> With heavy I/O load the aoe device had stale I/Os (e.g. rsync waiting
> indefinitely on one core) that can be "fixed" by running aoe-revalidate on
> that device.
> 
> Additionally, when trying to shut down the system we see the message:
> unregister_netdevice: waiting for XXX to become free. Usage Count = XXXXX
> on aoe devices, with a usage count somewhere in the millions.
> This is the same as without the patch, so I assume the fix is still
> incomplete.
>

Thanks for your testing! I will look into it and try to reproduce the
issue again so I can improve the fix.

Joey Lee
Re: [PATCH v2] aoe: fix the potential use-after-free problem in more places
Posted by Greg KH 2 months, 2 weeks ago
On Thu, Sep 12, 2024 at 06:29:35PM +0800, Chun-Yi Lee wrote:
> To fix CVE-2023-6270, commit f98364e92662 ("aoe: fix the potential
> use-after-free problem in aoecmd_cfg_pkts") makes tx() call dev_put()
> instead of calling it in aoecmd_cfg_pkts(). This prevents tx() from
> running into a use-after-free.
> 
> Nicolai Stange then found more places in aoe with a potential
> use-after-free problem in tx(), e.g. revalidate(), aoecmd_ata_rw(),
> resend(), probe() and aoecmd_cfg_rsp(). Those functions also use
> aoenet_xmit() to push packets to the tx queue, so they should also use
> dev_hold() to increase the refcnt of skb->dev.
> 
> Link: https://nvd.nist.gov/vuln/detail/CVE-2023-6270
> Fixes: f98364e92662 ("aoe: fix the potential use-after-free problem in aoecmd_cfg_pkts")
> Reported-by: Nicolai Stange <nstange@suse.com>
> Signed-off-by: Chun-Yi Lee <jlee@suse.com>
> ---
> 

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- You have marked a patch with a "Fixes:" tag for a commit that is in an
  older released kernel, yet you do not have a cc: stable line in the
  signed-off-by area at all, which means that the patch will not be
  applied to any older kernel releases.  To properly fix this, please
  follow the documented rules in the
  Documentation/process/stable-kernel-rules.rst file for how to resolve
  this.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot