[PATCH] rfkill: fix deadlock in rfkill_send_events

Edward AD posted 1 patch 2 years, 2 months ago
net/rfkill/core.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
[PATCH] rfkill: fix deadlock in rfkill_send_events
Posted by Edward AD 2 years, 2 months ago
syzbot report:
syz-executor675/5132 is trying to acquire lock:
ffff8880297ee088 (&data->mtx){+.+.}-{3:3}, at: rfkill_send_events+0x226/0x3f0 net/rfkill/core.c:286

but task is already holding lock:
ffff88801bfc0088 (&data->mtx){+.+.}-{3:3}, at: rfkill_fop_open+0x146/0x750 net/rfkill/core.c:1183

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&data->mtx);
  lock(&data->mtx);

 *** DEADLOCK ***

In 2c3dfba4cf84 insert rfkill_sync() to rfkill_fop_open(), it will call
rfkill_send_events() and then triger this issue.

Fixes: 2c3dfba4cf84 ("rfkill: sync before userspace visibility/changes")
Reported-and-tested-by: syzbot+509238e523e032442b80@syzkaller.appspotmail.com
Signed-off-by: Edward AD <twuufnxlz@gmail.com>
---
 net/rfkill/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index 08630896b6c8..a14e0d4a0b00 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -1180,7 +1180,6 @@ static int rfkill_fop_open(struct inode *inode, struct file *file)
 	init_waitqueue_head(&data->read_wait);
 
 	mutex_lock(&rfkill_global_mutex);
-	mutex_lock(&data->mtx);
 	/*
 	 * start getting events from elsewhere but hold mtx to get
 	 * startup events added first
@@ -1191,9 +1190,12 @@ static int rfkill_fop_open(struct inode *inode, struct file *file)
 		if (!ev)
 			goto free;
 		rfkill_sync(rfkill);
+		mutex_lock(&data->mtx);
 		rfkill_fill_event(&ev->ev, rfkill, RFKILL_OP_ADD);
 		list_add_tail(&ev->list, &data->events);
+		mutex_unlock(&data->mtx);
 	}
+	mutex_lock(&data->mtx);
 	list_add(&data->list, &rfkill_fds);
 	mutex_unlock(&data->mtx);
 	mutex_unlock(&rfkill_global_mutex);
-- 
2.25.1
Re: [PATCH] rfkill: fix deadlock in rfkill_send_events
Posted by Simon Horman 2 years, 2 months ago
On Tue, Oct 10, 2023 at 09:08:15AM +0800, Edward AD wrote:
> syzbot report:
> syz-executor675/5132 is trying to acquire lock:
> ffff8880297ee088 (&data->mtx){+.+.}-{3:3}, at: rfkill_send_events+0x226/0x3f0 net/rfkill/core.c:286
> 
> but task is already holding lock:
> ffff88801bfc0088 (&data->mtx){+.+.}-{3:3}, at: rfkill_fop_open+0x146/0x750 net/rfkill/core.c:1183
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>        CPU0
>        ----
>   lock(&data->mtx);
>   lock(&data->mtx);
> 
>  *** DEADLOCK ***
> 
> In 2c3dfba4cf84 insert rfkill_sync() to rfkill_fop_open(), it will call
> rfkill_send_events() and then triger this issue.
> 
> Fixes: 2c3dfba4cf84 ("rfkill: sync before userspace visibility/changes")
> Reported-and-tested-by: syzbot+509238e523e032442b80@syzkaller.appspotmail.com
> Signed-off-by: Edward AD <twuufnxlz@gmail.com>

Hi Edward,

I am wondering if you considered moving the rfkill_sync() calls
to before &data->mtx is taken, to avoid the need to drop and
retake it?

Perhaps it doesn't work for some reason (compile tested only!).
But this does seem somehow cleaner for me.
Re: [PATCH] rfkill: fix deadlock in rfkill_send_events
Posted by Edward AD 2 years, 2 months ago
Hi Simon Horman,
On Fri, 13 Oct 2023 13:06:38 +0200, Simon Horman wrote:
> I am wondering if you considered moving the rfkill_sync() calls
> to before &data->mtx is taken, to avoid the need to drop and
> retake it?
If you move rfkill_sync() before calling &data->mtx, more code will be added 
because rfkill_sync() is in the loop body.
> 
> Perhaps it doesn't work for some reason (compile tested only!).
> But this does seem somehow cleaner for me.
BR,
edward
Re: [PATCH] rfkill: fix deadlock in rfkill_send_events
Posted by Simon Horman 2 years, 2 months ago
On Sat, Oct 14, 2023 at 10:43:22AM +0800, Edward AD wrote:
> Hi Simon Horman,
> On Fri, 13 Oct 2023 13:06:38 +0200, Simon Horman wrote:
> > I am wondering if you considered moving the rfkill_sync() calls
> > to before &data->mtx is taken, to avoid the need to drop and
> > retake it?
> If you move rfkill_sync() before calling &data->mtx, more code will be added 
> because rfkill_sync() is in the loop body.

Maybe that is true. And maybe that is a good argument for
not taking the approach that I suggested. But I do think it
is simpler from a locking perspective, and that has some merit.

> > 
> > Perhaps it doesn't work for some reason (compile tested only!).
> > But this does seem somehow cleaner for me.
> BR,
> edward
>
Re: [PATCH] rfkill: fix deadlock in rfkill_send_events
Posted by Johannes Berg 2 years, 2 months ago
On Sat, 2023-10-14 at 09:29 +0200, Simon Horman wrote:
> On Sat, Oct 14, 2023 at 10:43:22AM +0800, Edward AD wrote:
> > Hi Simon Horman,
> > On Fri, 13 Oct 2023 13:06:38 +0200, Simon Horman wrote:
> > > I am wondering if you considered moving the rfkill_sync() calls
> > > to before &data->mtx is taken, to avoid the need to drop and
> > > retake it?
> > If you move rfkill_sync() before calling &data->mtx, more code will be added 
> > because rfkill_sync() is in the loop body.
> 
> Maybe that is true. And maybe that is a good argument for
> not taking the approach that I suggested. But I do think it
> is simpler from a locking perspective, and that has some merit.
> 

FWIW, I missed this patch and discussion until now, but I already fixed
the issue differently:

https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless.git/commit/?id=f2ac54ebf85615a6d78f5eb213a8bbeeb17ebe5d

There was never any need to hold the data->mtx for anything but the list
manipulation, and even that isn't _really_ needed since the 'data' is
completely fresh and not seen anywhere else yet.

(I'll also note that the subject of this thread is wrong since this was
never an *actual* deadlock, just a *possible* one reported by lockdep.)

johannes