[PATCH] scsi: sd: call device_del() if device_add_disk() fails

Fabio M. De Francesco posted 1 patch 4 years, 2 months ago
drivers/scsi/sd.c | 1 +
1 file changed, 1 insertion(+)
[PATCH] scsi: sd: call device_del() if device_add_disk() fails
Posted by Fabio M. De Francesco 4 years, 2 months ago
In sd_probe(), if device_add_disk() fails it simply calls put_device()
and jumps to the "out" label but the device is never deleted from system.
This leads to a memory leak as reported by Syzbot.[1]

Fix this bug by calling device_del() soon before put_device() when 
device_add_disk() fails.

[1] [syzbot] memory leak in blk_mq_init_tags
https://lore.kernel.org/lkml/000000000000c341cc05db38c1b0@google.com/

Reported-by: syzbot+f08c77040fa163a75a46@syzkaller.appspotmail.com
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 2a7a891f4c40 ("scsi: sd: Add error handling support for add_disk()")
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
---

This patch replace the previous attempt to fix the bug reported by
Syzbot. Therefore, the previous wrong patch at 
https://lore.kernel.org/lkml/20220328084452.11479-1-fmdefrancesco@gmail.com/
must be discarded.

Many thanks to Dan Carpenter.

 drivers/scsi/sd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index a390679cf458..13d96d0f9dde 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3474,6 +3474,7 @@ static int sd_probe(struct device *dev)
 
 	error = device_add_disk(dev, gd, NULL);
 	if (error) {
+		device_del(&sdkp->disk_dev);
 		put_device(&sdkp->disk_dev);
 		goto out;
 	}
-- 
2.34.1
Re: [PATCH] scsi: sd: call device_del() if device_add_disk() fails
Posted by Dan Carpenter 4 years, 2 months ago
On Tue, Mar 29, 2022 at 05:49:48PM +0200, Fabio M. De Francesco wrote:
> In sd_probe(), if device_add_disk() fails it simply calls put_device()
> and jumps to the "out" label but the device is never deleted from system.
> This leads to a memory leak as reported by Syzbot.[1]
> 
> Fix this bug by calling device_del() soon before put_device() when 
> device_add_disk() fails.
> 
> [1] [syzbot] memory leak in blk_mq_init_tags
> https://lore.kernel.org/lkml/000000000000c341cc05db38c1b0@google.com/
> 
> Reported-by: syzbot+f08c77040fa163a75a46@syzkaller.appspotmail.com
> Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
> Fixes: 2a7a891f4c40 ("scsi: sd: Add error handling support for add_disk()")
> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>

Thanks!

regards,
dan carpenter
Re: [PATCH] scsi: sd: call device_del() if device_add_disk() fails
Posted by Wenchao Hao 4 years, 2 months ago
I do not think it's necessary to call device_del() on this path. If the device
has been added, put_device() would delete it from sysfs. So the origin error
handle is ok with me.
Re: [PATCH] scsi: sd: call device_del() if device_add_disk() fails
Posted by Dan Carpenter 4 years, 2 months ago
On Thu, Mar 31, 2022 at 11:26:22AM -0400, 'Wenchao Hao' via syzkaller-bugs wrote:
> I do not think it's necessary to call device_del() on this path. If the device
> has been added, put_device() would delete it from sysfs. So the origin error
> handle is ok with me.
> 

No.  The original is buggy and it was detected at runtime by syzbot.
It's not static analysis, it is an actual bug found in testing.

The device_put() unwinds device_initialize().  The device_del() unwinds
device_add().  Take a look at the comments to device_add() or take a
look at how device_register/unregister() work.

The temptation was to call device_unregister() which is a combined
device_del(); device_put(); but when the device_initialize() and
device_add() are called separately, then I think it is more readable to
call del and put separately as well.

regards,
dan carpenter
Re: [PATCH] scsi: sd: call device_del() if device_add_disk() fails
Posted by Wenchao Hao 4 years, 2 months ago
On 2022/3/31 13:41, Dan Carpenter wrote:
> On Thu, Mar 31, 2022 at 11:26:22AM -0400, 'Wenchao Hao' via syzkaller-bugs wrote:
>> I do not think it's necessary to call device_del() on this path. If the device
>> has been added, put_device() would delete it from sysfs. So the origin error
>> handle is ok with me.
>>
> 
> No.  The original is buggy and it was detected at runtime by syzbot.
> It's not static analysis, it is an actual bug found in testing.
> 
Yes, it's a bug, but the root reason is not we forget to call 
device_del(sdkp->disk_dev). It's because we did not cleanup gendisk.
The leak memory is allocated in elevator_init_mq(), we should clean
this memory via blk_cleanup_queue().

I summit a patch which would fix this memory leak:

https://lore.kernel.org/linux-scsi/20220401011018.1026553-1-haowenchao@huawei.com/T/#u

> The device_put() unwinds device_initialize().  The device_del() unwinds
> device_add().  Take a look at the comments to device_add() or take a
> look at how device_register/unregister() work.
> 

You may read the implement of put_device(), it is based on kobj_xxx.
If the kobj is still in sysfs, a cleanup would be performed.
And device_del() seems would not decrease the reference count of kobj,
the main aim is to make it invisibleto sysfs.
Re: [PATCH] scsi: sd: call device_del() if device_add_disk() fails
Posted by Dan Carpenter 4 years, 2 months ago
Wenchao Hao, what you're saying makes a lot of sense but it raises a lot
of questions in turn.

Fabio, did you test your patch?

This is another reason why syzbot should display the whole dmesg because
otherwise we can't ask people link to their test results if it just says
"PASSED" with no additional information.  If syzbot provided a dmesg
at the end then I would require a link to it under the --- cut off
line for patches that I review.

In a way this gets back to the original testing that syzbot did do.  If
Wenchao's reading of the code is correct the Fabio's patch caused a
series of use after free bugs but because the test results just said
"PASSED" with no additional information.

Either way, failing to call device_del() is still a bug.

Also, I don't really understand why we don't have to call
put_device(&sdkp->disk_dev) at the end of sd_remove().

regards,
dan carpenter
Re: [PATCH] scsi: sd: call device_del() if device_add_disk() fails
Posted by James Bottomley 4 years, 2 months ago
On Thu, 2022-03-31 at 16:42 +0300, Dan Carpenter wrote:
[...]
> Also, I don't really understand why we don't have to call
> put_device(&sdkp->disk_dev) at the end of sd_remove().

That's because the final put is done by the gendisk ->free_disk()
function which is scsi_disk_free_disk().  Most of the gendisk functions
we provide convert a gendisk to a scsi_disk (via the gendisk
private_data), so the sdkp has to live as long as the gendisk.

James
Re: [PATCH] scsi: sd: call device_del() if device_add_disk() fails
Posted by Dan Carpenter 4 years, 2 months ago
On Thu, Mar 31, 2022 at 10:19:57AM -0400, James Bottomley wrote:
> On Thu, 2022-03-31 at 16:42 +0300, Dan Carpenter wrote:
> [...]
> > Also, I don't really understand why we don't have to call
> > put_device(&sdkp->disk_dev) at the end of sd_remove().
> 
> That's because the final put is done by the gendisk ->free_disk()
> function which is scsi_disk_free_disk().  Most of the gendisk functions
> we provide convert a gendisk to a scsi_disk (via the gendisk
> private_data), so the sdkp has to live as long as the gendisk.
> 
> James

Thanks James.

Ah...  And scsi_disk_free_disk() will not be called unless
device_add_disk() succeeds.  So Wenchao Hao's patch is correct.

And after that there is just a missing device_del() and also I see
another problem where if device_add() fails then we need to call
put_disk(gd); on that error path.

regards,
dan carpenter
Re: [PATCH] scsi: sd: call device_del() if device_add_disk() fails
Posted by Fabio M. De Francesco 4 years, 2 months ago
On gioved? 31 marzo 2022 15:42:10 CEST Dan Carpenter wrote:
> Wenchao Hao, what you're saying makes a lot of sense but it raises a lot
> of questions in turn.
> 
> Fabio, did you test your patch?

Yes, I did, Dan. I tested it the usual way with the "#syz test:" command.
Obviously I have not the hardware to test code on it.

Therefore, the messages that say "Syzbot tested the patch and it didn't
trigger any issue" is all that I can know about the code being good or not.
This is the criterion I've always used before sending patches for Syzbot's
reports.

However, my knowledge of these subsystems and the API that are related to 
this bug were very little, but now I can say that during the last couple of 
days it has improved to the point where I can affirm that Wenchao's patch
seems to me to be the only correct solution.

Thanks for all the help you and the the other developers provided. It was
invaluable for a better understanding of this matter.

Regards,

Fabio M. De Francesco
Re: [PATCH] scsi: sd: call device_del() if device_add_disk() fails
Posted by Dan Carpenter 4 years, 2 months ago
On Thu, Mar 31, 2022 at 06:14:27PM +0200, Fabio M. De Francesco wrote:
> On gioved? 31 marzo 2022 15:42:10 CEST Dan Carpenter wrote:
> > Wenchao Hao, what you're saying makes a lot of sense but it raises a lot
> > of questions in turn.
> > 
> > Fabio, did you test your patch?
> 
> Yes, I did, Dan. I tested it the usual way with the "#syz test:" command.
> Obviously I have not the hardware to test code on it.
> 

Yeah.  What a nightmare.  You posted a link to the first test.  It said
passed but definitely introduced some use after frees but how was anyone
supposed to know?

No way we would have figured this out.  I'm working to make Smatch
understand device_put() better but this one is way difficult.

Sorry that you went through this.

regards,
dan carpenter
Re: [PATCH] scsi: sd: call device_del() if device_add_disk() fails
Posted by Fabio M. De Francesco 4 years, 2 months ago
On gioved? 31 marzo 2022 18:24:16 CEST Dan Carpenter wrote:
> On Thu, Mar 31, 2022 at 06:14:27PM +0200, Fabio M. De Francesco wrote:
> > On gioved? 31 marzo 2022 15:42:10 CEST Dan Carpenter wrote:
> > > Wenchao Hao, what you're saying makes a lot of sense but it raises a lot
> > > of questions in turn.
> > > 
> > > Fabio, did you test your patch?
> > 
> > Yes, I did, Dan. I tested it the usual way with the "#syz test:" command.
> > Obviously I have not the hardware to test code on it.
> > 
> 
> Yeah.  What a nightmare.  You posted a link to the first test.  It said
> passed but definitely introduced some use after frees but how was anyone
> supposed to know?

Maybe that a "spare-time Linux developer" like me should leave these 
kinds of bug fixes to more experienced people. But we should also note 
that I tried two or three different patches and _all_ of them passed
the tests. 

> 
> No way we would have figured this out.

I think that something should change about the way Syzbot tests patches 
and about how it provides the results. The other four or five bugs that 
I have fixed were based mainly to the fact that they passed the Syzbot 
tests. 

Perhaps I've been lucky but my patches were good and they were merged. 

However, I began to trust Syzbot too much. This is not how I should 
approach and try to solve bugs.

> I'm working to make Smatch
> understand device_put() better but this one is way difficult.
> 
> Sorry that you went through this.

Please don't be sorry :)

Believe me when I say that I cannot explain how many things I have 
learned during these days while working on this issue. I see no 
problems at all but only opportunities for learning.

Thank you very much!

Fabio M. De Francesco

> 
> regards,
> dan carpenter
> 
>