[v1] scsi: ppa: Fix use-after-free caused by unfinished delayed work

[PATCH RESEND] scsi: ppa: Fix use-after-free caused by unfinished delayed work

Posted by Duoming Zhou 1 month, 1 week ago

The delayed work item 'ppa_tq' is initialized in __ppa_attach() and
scheduled via ppa_queuecommand() for processing SCSI commands. When
the parallel port SCSI host adapter is detached through ppa_detach(),
the ppa_struct device instance is deallocated.

However, the delayed work might still be pending or executing when
ppa_detach() is called, leading to use-after-free bugs when the
work function ppa_interrupt() accesses the already freed ppa_struct
memory.

The race condition can occur as follows:

CPU 0(detach thread)   | CPU 1(delayed work)
                       | ppa_queuecommand()
                       |   ppa_queuecommand_lck()
ppa_detach()           |     schedule_delayed_work()
  kfree(dev) //FREE    | ppa_interrupt()
                       |   dev = container_of(...) //USE
                           dev-> //USE

Add disable_delayed_work_sync() in ppa_detach() to guarantee proper
cancellation of the delayed work item before ppa_struct is deallocated.
This is similar to commit ab58153ec64f ("scsi: imm: Fix use-after-free
bug caused by unfinished delayed work").

This bug was identified through static analysis.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@kernel.org
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
 drivers/scsi/ppa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/ppa.c b/drivers/scsi/ppa.c
index ea682f3044b..8da2a78ebac 100644
--- a/drivers/scsi/ppa.c
+++ b/drivers/scsi/ppa.c
@@ -1136,6 +1136,7 @@ static void ppa_detach(struct parport *pb)
 	ppa_struct *dev;
 	list_for_each_entry(dev, &ppa_hosts, list) {
 		if (dev->dev->port == pb) {
+			disable_delayed_work_sync(&dev->ppa_tq);
 			list_del_init(&dev->list);
 			scsi_remove_host(dev->host);
 			scsi_host_put(dev->host);
-- 
2.34.1

Re: [PATCH RESEND] scsi: ppa: Fix use-after-free caused by unfinished delayed work

Posted by James Bottomley 1 month, 1 week ago

On Thu, 2026-01-01 at 21:55 +0800, Duoming Zhou wrote:
> diff --git a/drivers/scsi/ppa.c b/drivers/scsi/ppa.c
> index ea682f3044b..8da2a78ebac 100644
> --- a/drivers/scsi/ppa.c
> +++ b/drivers/scsi/ppa.c
> @@ -1136,6 +1136,7 @@ static void ppa_detach(struct parport *pb)
>  	ppa_struct *dev;
>  	list_for_each_entry(dev, &ppa_hosts, list) {
>  		if (dev->dev->port == pb) {
> +			disable_delayed_work_sync(&dev->ppa_tq);
>  			list_del_init(&dev->list);
>  			scsi_remove_host(dev->host);
>  			scsi_host_put(dev->host);

This fix looks bogus: if there's an active workqueue on ppa it's
because there's an outstanding command and it's emulating polling.  If
you stop the polling by disabling the workqueue, the command will never
return and the host will never get freed, so this will leak resources,
won't it?

Also the race condition you identify is one of many tied to an
incorrect ppa_struct lifetime: it should never be free'd before the
host itself is gone because a live host may do a callback which will
get the ppa_struct from hostdata, so if the host is still alive for any
reason when ppa_detach() is called, we'll get the same problem.

Regards,

James

Re: [PATCH RESEND] scsi: ppa: Fix use-after-free caused by unfinished delayed work

Posted by duoming@zju.edu.cn 1 month ago

On Thu, 01 Jan 2026 10:21:28 -0500, James Bottomley wrote:
> > diff --git a/drivers/scsi/ppa.c b/drivers/scsi/ppa.c
> > index ea682f3044b..8da2a78ebac 100644
> > --- a/drivers/scsi/ppa.c
> > +++ b/drivers/scsi/ppa.c
> > @@ -1136,6 +1136,7 @@ static void ppa_detach(struct parport *pb)
> >  	ppa_struct *dev;
> >  	list_for_each_entry(dev, &ppa_hosts, list) {
> >  		if (dev->dev->port == pb) {
> > +			disable_delayed_work_sync(&dev->ppa_tq);
> >  			list_del_init(&dev->list);
> >  			scsi_remove_host(dev->host);
> >  			scsi_host_put(dev->host);
> 
> This fix looks bogus: if there's an active workqueue on ppa it's
> because there's an outstanding command and it's emulating polling.  If
> you stop the polling by disabling the workqueue, the command will never
> return and the host will never get freed, so this will leak resources,
> won't it?

I believe that disabling the ppa_tq delayed work will not affect the Scsi_Host
release process. The lifetime of the Scsi_Host is managed by tagset_refcnt.
The tagset_refcnt is initialized to 1 in scsi_add_host_with_dma() and decreased
to 0 in scsi_remove_host(). since the delayed work callback ppa_interrupt() 
does not modify tagset_refcnt in any way, the host could be freed as expected.

> Also the race condition you identify is one of many tied to an
> incorrect ppa_struct lifetime: it should never be free'd before the
> host itself is gone because a live host may do a callback which will
> get the ppa_struct from hostdata, so if the host is still alive for any
> reason when ppa_detach() is called, we'll get the same problem.

The ppa_struct is properly freed only after ensuring the complete removal
of the associated Scsi_Host in ppa_detach(). The detailed sequence is as
follows:

...
   scsi_remove_host(dev->host);
   scsi_host_put(dev->host); //the host is gone
   parport_unregister_device(dev->dev);
   kfree(dev); //free ppa_struct
...

The scsi_remove_host() initiates the host removal process, transitioning
it through appropriate states(SHOST_CANCEL, SHOST_DEL) and ensuring all
pending I/O operations are properly handled. This sequence ensures that
all resources associated with the Scsi_Host are properly cleaned up before
the ppa_struct is freed.

Best regards,
Duoming Zhou

Re: [PATCH RESEND] scsi: ppa: Fix use-after-free caused by unfinished delayed work

Posted by James Bottomley 1 month ago

On Sat, 2026-01-03 at 10:24 +0800, duoming@zju.edu.cn wrote:
> On Thu, 01 Jan 2026 10:21:28 -0500, James Bottomley wrote:
> > > diff --git a/drivers/scsi/ppa.c b/drivers/scsi/ppa.c
> > > index ea682f3044b..8da2a78ebac 100644
> > > --- a/drivers/scsi/ppa.c
> > > +++ b/drivers/scsi/ppa.c
> > > @@ -1136,6 +1136,7 @@ static void ppa_detach(struct parport *pb)
> > >  	ppa_struct *dev;
> > >  	list_for_each_entry(dev, &ppa_hosts, list) {
> > >  		if (dev->dev->port == pb) {
> > > +			disable_delayed_work_sync(&dev->ppa_tq);
> > >  			list_del_init(&dev->list);
> > >  			scsi_remove_host(dev->host);
> > >  			scsi_host_put(dev->host);
> > 
> > This fix looks bogus: if there's an active workqueue on ppa it's
> > because there's an outstanding command and it's emulating polling. 
> > If you stop the polling by disabling the workqueue, the command
> > will never return and the host will never get freed, so this will
> > leak resources, won't it?
> 
> I believe that disabling the ppa_tq delayed work will not affect the
> Scsi_Host release process. The lifetime of the Scsi_Host is managed
> by tagset_refcnt. The tagset_refcnt is initialized to 1 in
> scsi_add_host_with_dma() and decreased to 0 in scsi_remove_host().
> since the delayed work callback ppa_interrupt() does not modify
> tagset_refcnt in any way, the host could be freed as expected.

Not if something else holds a reference to the host, which an
outstanding command does.  That's the point I was making above: as long
as the command doesn't return, everything is pinned and never gets
freed (well, possibly until timeout).  You cause that because the work
queue is only active if a command is outstanding, so when you disable
the queue in that situation the command remains outstanding and can
never complete normally.

> > Also the race condition you identify is one of many tied to an
> > incorrect ppa_struct lifetime: it should never be free'd before the
> > host itself is gone because a live host may do a callback which
> > will get the ppa_struct from hostdata, so if the host is still
> > alive for any reason when ppa_detach() is called, we'll get the
> > same problem.
> 
> The ppa_struct is properly freed only after ensuring the complete
> removal of the associated Scsi_Host in ppa_detach(). The detailed
> sequence is as follows:
> 
> ...
>    scsi_remove_host(dev->host);
>    scsi_host_put(dev->host); //the host is gone

The host is not gone if that put is not the final one as it won't be if
there's an outstanding command pinning the device.

Regards,

James

Re: [PATCH RESEND] scsi: ppa: Fix use-after-free caused by unfinished delayed work

Posted by duoming@zju.edu.cn 1 month ago

On Sat, 03 Jan 2026 14:55:54 -0500, James Bottomley wrote:
> > > > diff --git a/drivers/scsi/ppa.c b/drivers/scsi/ppa.c
> > > > index ea682f3044b..8da2a78ebac 100644
> > > > --- a/drivers/scsi/ppa.c
> > > > +++ b/drivers/scsi/ppa.c
> > > > @@ -1136,6 +1136,7 @@ static void ppa_detach(struct parport *pb)
> > > >  	ppa_struct *dev;
> > > >  	list_for_each_entry(dev, &ppa_hosts, list) {
> > > >  		if (dev->dev->port == pb) {
> > > > +			disable_delayed_work_sync(&dev->ppa_tq);
> > > >  			list_del_init(&dev->list);
> > > >  			scsi_remove_host(dev->host);
> > > >  			scsi_host_put(dev->host);
> > > 
> > > This fix looks bogus: if there's an active workqueue on ppa it's
> > > because there's an outstanding command and it's emulating polling. 
> > > If you stop the polling by disabling the workqueue, the command
> > > will never return and the host will never get freed, so this will
> > > leak resources, won't it?
> > 
> > I believe that disabling the ppa_tq delayed work will not affect the
> > Scsi_Host release process. The lifetime of the Scsi_Host is managed
> > by tagset_refcnt. The tagset_refcnt is initialized to 1 in
> > scsi_add_host_with_dma() and decreased to 0 in scsi_remove_host().
> > since the delayed work callback ppa_interrupt() does not modify
> > tagset_refcnt in any way, the host could be freed as expected.
> 
> Not if something else holds a reference to the host, which an
> outstanding command does.  That's the point I was making above: as long
> as the command doesn't return, everything is pinned and never gets
> freed (well, possibly until timeout).  You cause that because the work
> queue is only active if a command is outstanding, so when you disable
> the queue in that situation the command remains outstanding and can
> never complete normally.

When a request is sent to a SCSI device, a timer is set for the request.
If it times out, the request is directly terminated and removed from the
queue. The specific function call chain is as follows: 

scsi_queue_rq() -> blk_mq_start_request() -> blk_add_timer() 
-> blk_mq_timeout_work()

/**
 * blk_add_timer - Start timeout timer for a single request
 * @req:	request that is about to start running.
 *
 * Notes:
 *    Each request has its own timer, and as it is added to the queue, we
 *    set up the timer. When the request completes, we cancel the timer.
 */
void blk_add_timer(struct request *req)
{
    ...
	/*
	 * Some LLDs, like scsi, peek at the timeout to prevent a
	 * command from being retried forever.
	 */
    ...
}

static void blk_mq_timeout_work(struct work_struct *work)
{
    ...
    blk_mq_wait_quiesce_done();
    blk_mq_handle_expired();
    blk_queue_exit();
    ...
}

Therefore, this patch will not cause memory leaks or issues where
commands never return.

> > > Also the race condition you identify is one of many tied to an
> > > incorrect ppa_struct lifetime: it should never be free'd before the
> > > host itself is gone because a live host may do a callback which
> > > will get the ppa_struct from hostdata, so if the host is still
> > > alive for any reason when ppa_detach() is called, we'll get the
> > > same problem.
> > 
> > The ppa_struct is properly freed only after ensuring the complete
> > removal of the associated Scsi_Host in ppa_detach(). The detailed
> > sequence is as follows:
> > 
> > ...
> >    scsi_remove_host(dev->host);
> >    scsi_host_put(dev->host); //the host is gone
> 
> The host is not gone if that put is not the final one as it won't be if
> there's an outstanding command pinning the device.

The commands have already been handled by the timeout handler, so it will
not cause issues with the Scsi_Host failing to release properly.

Best regards,
Duoming Zhou

Re: [PATCH RESEND] scsi: ppa: Fix use-after-free caused by unfinished delayed work

Posted by James Bottomley 1 month ago

On Sun, 2026-01-04 at 22:35 +0800, duoming@zju.edu.cn wrote:
> On Sat, 03 Jan 2026 14:55:54 -0500, James Bottomley wrote:
> > > > > diff --git a/drivers/scsi/ppa.c b/drivers/scsi/ppa.c
> > > > > index ea682f3044b..8da2a78ebac 100644
> > > > > --- a/drivers/scsi/ppa.c
> > > > > +++ b/drivers/scsi/ppa.c
> > > > > @@ -1136,6 +1136,7 @@ static void ppa_detach(struct parport
> > > > > *pb)
> > > > >  	ppa_struct *dev;
> > > > >  	list_for_each_entry(dev, &ppa_hosts, list) {
> > > > >  		if (dev->dev->port == pb) {
> > > > > +			disable_delayed_work_sync(&dev-
> > > > > >ppa_tq);
> > > > >  			list_del_init(&dev->list);
> > > > >  			scsi_remove_host(dev->host);
> > > > >  			scsi_host_put(dev->host);
> > > > 
> > > > This fix looks bogus: if there's an active workqueue on ppa
> > > > it's because there's an outstanding command and it's emulating
> > > > polling. If you stop the polling by disabling the workqueue,
> > > > the command will never return and the host will never get
> > > > freed, so this will leak resources, won't it?
> > > 
> > > I believe that disabling the ppa_tq delayed work will not affect
> > > the Scsi_Host release process. The lifetime of the Scsi_Host is
> > > managed by tagset_refcnt. The tagset_refcnt is initialized to 1
> > > in scsi_add_host_with_dma() and decreased to 0 in
> > > scsi_remove_host(). since the delayed work callback
> > > ppa_interrupt() does not modify tagset_refcnt in any way, the
> > > host could be freed as expected.
> > 
> > Not if something else holds a reference to the host, which an
> > outstanding command does.  That's the point I was making above: as
> > long as the command doesn't return, everything is pinned and never
> > gets freed (well, possibly until timeout).  You cause that because
> > the work queue is only active if a command is outstanding, so when
> > you disable the queue in that situation the command remains
> > outstanding and can never complete normally.
> 
> When a request is sent to a SCSI device, a timer is set for the
> request. If it times out, the request is directly terminated and
> removed from the queue. The specific function call chain is as
> follows: 

I know how timeouts work ... I don't need the AI summary.  if there is
an outstanding command, the timeout will fire long after you've
disabled the queue and run through ppa_detach and the next thing that
will happen is the error handler will try to abort the command,
eventually causing ppa_abort to be called, which is going to
dereference the ppa_struct that ppa_detach freed.

Regards,

James

Re: [PATCH RESEND] scsi: ppa: Fix use-after-free caused by unfinished delayed work

Posted by duoming@zju.edu.cn 1 month ago

On Sun, 04 Jan 2026 18:30:48 -0500 James Bottomley wrote:
> > > > > > diff --git a/drivers/scsi/ppa.c b/drivers/scsi/ppa.c
> > > > > > index ea682f3044b..8da2a78ebac 100644
> > > > > > --- a/drivers/scsi/ppa.c
> > > > > > +++ b/drivers/scsi/ppa.c
> > > > > > @@ -1136,6 +1136,7 @@ static void ppa_detach(struct parport
> > > > > > *pb)
> > > > > >  	ppa_struct *dev;
> > > > > >  	list_for_each_entry(dev, &ppa_hosts, list) {
> > > > > >  		if (dev->dev->port == pb) {
> > > > > > +			disable_delayed_work_sync(&dev-
> > > > > > >ppa_tq);
> > > > > >  			list_del_init(&dev->list);
> > > > > >  			scsi_remove_host(dev->host);
> > > > > >  			scsi_host_put(dev->host);
> > > > > 
> > > > > This fix looks bogus: if there's an active workqueue on ppa
> > > > > it's because there's an outstanding command and it's emulating
> > > > > polling. If you stop the polling by disabling the workqueue,
> > > > > the command will never return and the host will never get
> > > > > freed, so this will leak resources, won't it?
> > > > 
> > > > I believe that disabling the ppa_tq delayed work will not affect
> > > > the Scsi_Host release process. The lifetime of the Scsi_Host is
> > > > managed by tagset_refcnt. The tagset_refcnt is initialized to 1
> > > > in scsi_add_host_with_dma() and decreased to 0 in
> > > > scsi_remove_host(). since the delayed work callback
> > > > ppa_interrupt() does not modify tagset_refcnt in any way, the
> > > > host could be freed as expected.
> > > 
> > > Not if something else holds a reference to the host, which an
> > > outstanding command does.  That's the point I was making above: as
> > > long as the command doesn't return, everything is pinned and never
> > > gets freed (well, possibly until timeout).  You cause that because
> > > the work queue is only active if a command is outstanding, so when
> > > you disable the queue in that situation the command remains
> > > outstanding and can never complete normally.
> > 
> > When a request is sent to a SCSI device, a timer is set for the
> > request. If it times out, the request is directly terminated and
> > removed from the queue. The specific function call chain is as
> > follows: 
> 
> I know how timeouts work ... I don't need the AI summary.  if there is
> an outstanding command, the timeout will fire long after you've
> disabled the queue and run through ppa_detach and the next thing that
> will happen is the error handler will try to abort the command,
> eventually causing ppa_abort to be called, which is going to
> dereference the ppa_struct that ppa_detach freed.

What do you think of removing the ppa_abort() callback function? 
This callback function is not necessary. The eh_abort_handler 
function pointer is checked in scsi_abort_command() function, 
and if the function pointer does not exist, it directly returns 
FAILED. The subsequent process will be handled by SCSI error 
handler thread - scsi_error_handler(). Therefore the issue you
mentioned could be avoided.

Best regards,
Duoming Zhou

Re: [PATCH RESEND] scsi: ppa: Fix use-after-free caused by unfinished delayed work

Posted by James Bottomley 1 month ago

On Tue, 2026-01-06 at 13:35 +0800, duoming@zju.edu.cn wrote:
> On Sun, 04 Jan 2026 18:30:48 -0500 James Bottomley wrote:
[...]
> > I know how timeouts work ... I don't need the AI summary.  if there
> > is an outstanding command, the timeout will fire long after you've
> > disabled the queue and run through ppa_detach and the next thing
> > that will happen is the error handler will try to abort the
> > command, eventually causing ppa_abort to be called, which is going
> > to dereference the ppa_struct that ppa_detach freed.
> 
> What do you think of removing the ppa_abort() callback function? 
> This callback function is not necessary. The eh_abort_handler 
> function pointer is checked in scsi_abort_command() function, 
> and if the function pointer does not exist, it directly returns 
> FAILED. The subsequent process will be handled by SCSI error 
> handler thread - scsi_error_handler(). Therefore the issue you
> mentioned could be avoided.

Well, no, because that would lose us runtime error handling which is
pretty essential for a flakey device like parport.  Also, if you remove
that callback, it will escalate to the reset handler, which does
exactly the same thing with ppa_struct.

I was originally looking at this as a reference counting problem, but
there is another way of solving it and that's to stop requests and
drain the queue before freeing the resources.  It turns out that's
exactly what scsi_remove_host() does (via __scsi_remove_device which
calls blk_mq_destroy_queue) so, since there can't be any outstanding
commands, it's actually impossible the delayed work queue is active
after scsi_remove_host() is called and thus the original race you
identified can't actually happen.

Regards,

James