[v1] scsi: pm8001: Fix potential TOCTOU race in pm8001_find_tag

[PATCH] scsi: pm8001: Fix potential TOCTOU race in pm8001_find_tag

Posted by Chengfeng Ye 3 weeks ago

A potential time-of-check-time-of-use (TOCTOU) race condition in
pm8001_find_tag() where task->lldd_task is checked for non-NULL
and then dereferenced without synchronization to ensure atomicity.

Since the check of NULL and dereference in pm8001_find_tag() is not
executed atomically, a race could occur if the callback is executed in
response to an error or timeout on a SAS task issued from the SCSI
midlayer, while the SAS command is completed and calls
pm8001_ccb_task_free(), which sets task->lldd_task to NULL, resulting
in a null pointer being dereferenced in pm8001_find_tag().

Possible race scenario:
 CPU0 (Error Handler)           CPU1 (Interrupt Handler)
  --------------------           ------------------------
  [SCSI command timeout/error]
  sas_scsi_recover_host()
    sas_scsi_find_task()
      lldd_query_task()
        pm8001_query_task()
          pm8001_find_tag()
            if (task->lldd_task)
                                   [Hardware interrupt]
                                     mpi_ssp_completion()
                                       pm8001_ccb_task_free()
                                         task->lldd_task = NULL
            ccb = task->lldd_task
            *tag = ccb->ccb_tag
            <- NULL dereference

Fix this by using READ_ONCE() to read task->lldd_task exactly once,
eliminating the TOCTOU window. Also use WRITE_ONCE() in
pm8001_ccb_task_free() for proper memory ordering.

Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
---
 drivers/scsi/pm8001/pm8001_sas.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
index 6a8d35aea93a..2d73e65db4c0 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -49,9 +49,10 @@
  */
 static int pm8001_find_tag(struct sas_task *task, u32 *tag)
 {
-	if (task->lldd_task) {
-		struct pm8001_ccb_info *ccb;
-		ccb = task->lldd_task;
+	struct pm8001_ccb_info *ccb;
+
+	ccb = READ_ONCE(task->lldd_task);
+	if (ccb) {
 		*tag = ccb->ccb_tag;
 		return 1;
 	}
@@ -617,7 +618,7 @@ void pm8001_ccb_task_free(struct pm8001_hba_info *pm8001_ha,
 			pm8001_dev ? atomic_read(&pm8001_dev->running_req) : -1);
 	}
 
-	task->lldd_task = NULL;
+	WRITE_ONCE(task->lldd_task, NULL);
 	pm8001_ccb_free(pm8001_ha, ccb);
 }
 
-- 
2.25.1

Re: [PATCH] scsi: pm8001: Fix potential TOCTOU race in pm8001_find_tag

Posted by James Bottomley 2 weeks, 4 days ago

On Sat, 2026-01-17 at 10:19 +0000, Chengfeng Ye wrote:
> A potential time-of-check-time-of-use (TOCTOU) race condition in
> pm8001_find_tag() where task->lldd_task is checked for non-NULL
> and then dereferenced without synchronization to ensure atomicity.
> 
> Since the check of NULL and dereference in pm8001_find_tag() is not
> executed atomically, a race could occur if the callback is executed
> in
> response to an error or timeout on a SAS task issued from the SCSI
> midlayer, while the SAS command is completed and calls
> pm8001_ccb_task_free(), which sets task->lldd_task to NULL, resulting
> in a null pointer being dereferenced in pm8001_find_tag().
> 
> Possible race scenario:
>  CPU0 (Error Handler)           CPU1 (Interrupt Handler)
>   --------------------           ------------------------
>   [SCSI command timeout/error]
>   sas_scsi_recover_host()
>     sas_scsi_find_task()
>       lldd_query_task()
>         pm8001_query_task()
>           pm8001_find_tag()
>             if (task->lldd_task)
>                                    [Hardware interrupt]
>                                      mpi_ssp_completion()
>                                        pm8001_ccb_task_free()
>                                          task->lldd_task = NULL
>             ccb = task->lldd_task
>             *tag = ccb->ccb_tag
>             <- NULL dereference
> 

I don't get how a race is possible here.  Before the query function
begins, the sas logic calls abort task on the tag, which means the
controller should ensure there are no further completion functions for
it regardless of whether the abort succeeds or not.

Regards,

James

Re: [PATCH] scsi: pm8001: Fix potential TOCTOU race in pm8001_find_tag

Posted by Chengfeng Ye 2 weeks, 4 days ago

> I don't get how a race is possible here.  Before the query function
> begins, the sas logic calls abort task on the tag, which means the
> controller should ensure there are no further completion functions for
> it regardless of whether the abort succeeds or not.

Thanks a lot for looking into it!

Sorry that I might miss something as I am not very familiar with the
code. But I also notice the find_tag() function is also invoked inside
the abort function (and invoked before the completion).  For the
find_tag() invoked inside the abort path will it be a race?
https://github.com/torvalds/linux/blob/master/drivers/scsi/pm8001/pm8001_sas.c#L1085

Best regards,
Chengfeng

Re: [PATCH] scsi: pm8001: Fix potential TOCTOU race in pm8001_find_tag

Posted by James Bottomley 2 weeks, 4 days ago

On Tue, 2026-01-20 at 00:50 +0800, Chengfeng Ye wrote:
> > I don't get how a race is possible here.  Before the query function
> > begins, the sas logic calls abort task on the tag, which means the
> > controller should ensure there are no further completion functions
> > for
> > it regardless of whether the abort succeeds or not.
> 
> Thanks a lot for looking into it!
> 
> Sorry that I might miss something as I am not very familiar with the
> code. But I also notice the find_tag() function is also invoked
> inside the abort function (and invoked before the completion).

This is part of the problem, though: you're apparently using some tool
looking for data races in an ancient driver but most of what you find
isn't significant and costs us review cycles to check.

> For the find_tag() invoked inside the abort path will it be a race?
> https://github.com/torvalds/linux/blob/master/drivers/scsi/pm8001/pm8001_sas.c#L1085

So the theory now is that in the couple of instruction cycles between
checking lldd_task and dereferencing it to find the tag, it goes null?
That's so astronomically unlikely precisely because abort is only
called on a task that timed out anyway and the completion function sets
the state done flags, which sas error handling checks, long before it
begins to free the lldd_task.

Regards,

James

Re: [PATCH] scsi: pm8001: Fix potential TOCTOU race in pm8001_find_tag

Posted by Chengfeng Ye 2 weeks, 4 days ago

> > Sorry that I might miss something as I am not very familiar with the
> > code. But I also notice the find_tag() function is also invoked
> > inside the abort function (and invoked before the completion).
>
> This is part of the problem, though: you're apparently using some tool
> looking for data races in an ancient driver but most of what you find
> isn't significant and costs us review cycles to check.

Sorry indeed for the extra efforts caused. I am implementing an
experimental tool to check for concurrency issues. I didn't mean to
bother you on purpose (but apologize if it did happen), as I just like
to report some potential issues and improve the security of the
codebase by fixing them.

> So the theory now is that in the couple of instruction cycles between
> checking lldd_task and dereferencing it to find the tag, it goes null?
> That's so astronomically unlikely precisely because abort is only
> called on a task that timed out anyway and the completion function sets
> the state done flags, which sas error handling checks, long before it
> begins to free the lldd_task.

That is also the point that bothers me: some races like this one can
only happen under rare scheduling due to the small race windows
(despite not being totally impossible and might be reported by lockdep
one day), and could introduce a security impact (like a crash in this
case) when they happen.
Also like this one:
https://lore.kernel.org/linux-scsi/CAAo+4rVWOzkz+HMc99c2D8tf2ZuwYHq39+jejaXWxD-PvUAuOA@mail.gmail.com/T/#t,
it is a UAF race between device removal and ioctl path (if I did not
miss anything), if measured by CVSS, it could be scored as a
low-severity vulnerability.

Will be more than appreciated if can learn more about the community's
attitude toward this kind of data race issues: do you prefer to fix
them as a preventative measure (especially if the fix is harmless), or
only like to fix them until they are encountered during execution, or
like to fix them if it is in important code like subsystem core
instead of an ancient driver?

Thanks again for your reply.

Best regards,
Chengfeng

Re: [PATCH] scsi: pm8001: Fix potential TOCTOU race in pm8001_find_tag

Posted by James Bottomley 2 weeks, 4 days ago

On Tue, 2026-01-20 at 12:39 +0800, Chengfeng Ye wrote:
> > > Sorry that I might miss something as I am not very familiar with
> > > the code. But I also notice the find_tag() function is also
> > > invoked inside the abort function (and invoked before the
> > > completion).
> > 
> > This is part of the problem, though: you're apparently using some
> > tool looking for data races in an ancient driver but most of what
> > you find isn't significant and costs us review cycles to check.
> 
> Sorry indeed for the extra efforts caused. I am implementing an
> experimental tool to check for concurrency issues. I didn't mean to
> bother you on purpose (but apologize if it did happen), as I just
> like to report some potential issues and improve the security of the
> codebase by fixing them.

But this too is a problem: fixes aren't free.  In fact a portion of the
patches sold as a bug fix eventually turn out to introduce a bug ...
and that new bug is one we didn't have before.  This is just a sad
consequence of the fact that all code produced by humans contains bugs.
The longer code is used, the more chance the bugs are found and the
less buggy it becomes (even with the bug fixes introducing bugs).  So
for really old drivers we assume most of the significant bugs have been
found and we try not to perturb the code base to avoid introducing new
bugs that, given the small and decreasing user base, will take ages to
find and eliminate.

On the scale of serious problems in older drivers, theoretical data
races that cause a crash don't rank highly simply because if the race
window were significant we'd already have seen it (the detection signal
is obvious and users aren't shy about reporting driver crashes). That
makes the probability of encountering the issue in the field way lower
than the probability that any fix will introduce a new bug.  So the
balance of risks argues against applying any fix.

Regards,

James

Re: [PATCH] scsi: pm8001: Fix potential TOCTOU race in pm8001_find_tag

Posted by Finn Thain 2 weeks, 3 days ago

On Tue, 20 Jan 2026, James Bottomley wrote:

> 
> But this too is a problem: fixes aren't free.  In fact a portion of the 
> patches sold as a bug fix eventually turn out to introduce a bug ... and 
> that new bug is one we didn't have before. This is just a sad 
> consequence of the fact that all code produced by humans contains bugs.

Yes. And if the shiny new tool has agency inasmuchas it tests patches in 
emulators, then the code produced by the new tool will also contain bugs.

Perhaps there are cycle-accurate, hardware-bug-compatible emulators for a 
few hardware designs. But asserting the correctness (hardware-equivalence) 
of such emulators runs into the very same problem.

So, unless tested on suitable hardware, one might expect driver patch 
submissions to be rejected with little or no review.

Re: [PATCH] scsi: pm8001: Fix potential TOCTOU race in pm8001_find_tag

Posted by Chengfeng Ye 2 weeks, 3 days ago

> But this too is a problem: fixes aren't free.  In fact a portion of the
> patches sold as a bug fix eventually turn out to introduce a bug ...
> and that new bug is one we didn't have before.  This is just a sad
> consequence of the fact that all code produced by humans contains bugs.
> The longer code is used, the more chance the bugs are found and the
> less buggy it becomes (even with the bug fixes introducing bugs).  So
> for really old drivers we assume most of the significant bugs have been
> found and we try not to perturb the code base to avoid introducing new
> bugs that, given the small and decreasing user base, will take ages to
> find and eliminate.
>
> On the scale of serious problems in older drivers, theoretical data
> races that cause a crash don't rank highly simply because if the race
> window were significant we'd already have seen it (the detection signal
> is obvious and users aren't shy about reporting driver crashes). That
> makes the probability of encountering the issue in the field way lower
> than the probability that any fix will introduce a new bug.  So the
> balance of risks argues against applying any fix.

Thank you indeed for the detailed response. It helps a lot, appreciate
much of the insightful reply that clarifies my long-standing
confusion...

Best regards,
Chengfeng

Re: [PATCH] scsi: pm8001: Fix potential TOCTOU race in pm8001_find_tag

Posted by Chengfeng Ye 2 weeks, 5 days ago

>  drivers/scsi/pm8001/pm8001_sas.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
> index 6a8d35aea93a..2d73e65db4c0 100644
> --- a/drivers/scsi/pm8001/pm8001_sas.c
> +++ b/drivers/scsi/pm8001/pm8001_sas.c
> @@ -49,9 +49,10 @@
>   */
>  static int pm8001_find_tag(struct sas_task *task, u32 *tag)
>  {
> -       if (task->lldd_task) {
> -               struct pm8001_ccb_info *ccb;
> -               ccb = task->lldd_task;
> +       struct pm8001_ccb_info *ccb;
> +
> +       ccb = READ_ONCE(task->lldd_task);
> +       if (ccb) {
>                 *tag = ccb->ccb_tag;
>                 return 1;
>         }
> @@ -617,7 +618,7 @@ void pm8001_ccb_task_free(struct pm8001_hba_info *pm8001_ha,
>                         pm8001_dev ? atomic_read(&pm8001_dev->running_req) : -1);
>         }
>
> -       task->lldd_task = NULL;
> +       WRITE_ONCE(task->lldd_task, NULL);
>         pm8001_ccb_free(pm8001_ha, ccb);
>  }
>
> --
> 2.25.1
>

> Fix this by using READ_ONCE() to read task->lldd_task exactly once,
> eliminating the TOCTOU window. Also use WRITE_ONCE() in
> pm8001_ccb_task_free() for proper memory ordering.

It just hit me that the patch can only eliminate the race on
task->lldd_task but cannot guarantee cbb has not released by
pm8001_ccb_free(pm8001_ha, ccb) when accssed. Maybe we still require a
lock synchronization?

Chengfeng

Re: [PATCH] scsi: pm8001: Fix potential TOCTOU race in pm8001_find_tag

Posted by Markus Elfring 2 weeks, 6 days ago

…
> Fix this by using READ_ONCE() to read task->lldd_task exactly once,
> eliminating the TOCTOU window. Also use WRITE_ONCE() in
> pm8001_ccb_task_free() for proper memory ordering.

How do you think about to add any tags (like “Fixes” and “Cc”) accordingly?

Regards,
Markus