[v2] nvme: correctly fix admin request_queue lifetime

[PATCH 6.1.y v2 0/6] nvme: correctly fix admin request_queue lifetime

Posted by Heyne, Maximilian 20 hours ago

The initial attempt to backport upstream commit 03b3bcd319b3 ("nvme: fix
admin request_queue lifetime") was incorrect: it led to refcount
underflows and did not even fix the problem.

I've run the reproduction steps from [1] (adding a delay to
nvme_submit_user_cmd and 'echo 1 | sudo tee
/sys/class/nvme/nvme0/delete_controller') on the nvme-tcp driver, where
they triggered the KASAN use-after-free (UAF) report.

Fixing the issue in the 6.1 series requires a few dependent patches.
The main one is upstream commit 2b3f056f72e5 ("blk-mq: move the call
to blk_put_queue out of blk_mq_destroy_queue"), which allows moving the
blk_put_queue call to a different location.
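
The effect of that dependency can be sketched with a userspace toy
model. This is not kernel code: struct toy_queue and all toy_* helpers
are hypothetical stand-ins, and the sketch assumes that "destroy" only
shuts the queue down while the final put is left to whoever holds the
last reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a request_queue: a refcount and a "dying" flag. */
struct toy_queue {
	int refs;
	bool dying;
};

/* Number of toy queues freed so far. */
static int toy_queues_freed;

static struct toy_queue *toy_alloc_queue(void)
{
	struct toy_queue *q = calloc(1, sizeof(*q));

	if (!q)
		abort();
	q->refs = 1;	/* the allocation reference, owned by the driver */
	return q;
}

/* Drop one reference; free the queue when the count reaches zero. */
static void toy_put_queue(struct toy_queue *q)
{
	assert(q->refs > 0);	/* a failure here would be an underflow */
	if (--q->refs == 0) {
		free(q);
		toy_queues_freed++;
	}
}

/* Shut the queue down without dropping a reference. */
static void toy_destroy_queue(struct toy_queue *q)
{
	q->dying = true;
}

/*
 * Delete the "controller" while a delayed user still looks at the
 * queue. Returns true if the queue was still allocated at that point.
 */
static bool toy_delete_with_late_user(void)
{
	struct toy_queue *admin_q = toy_alloc_queue();
	int freed_before = toy_queues_freed;
	bool alive_for_late_user;

	toy_destroy_queue(admin_q);

	/* The late user runs here: the memory must still be valid. */
	alive_for_late_user = (toy_queues_freed == freed_before) &&
			      admin_q->dying;

	/* Only the final release drops the last reference. */
	toy_put_queue(admin_q);
	return alive_for_late_user;
}
```

In this model toy_delete_with_late_user() returns true because the
queue is freed only by the explicit toy_put_queue() at the end. If
toy_destroy_queue() dropped the reference itself, the late access would
touch freed memory.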

The backport of commit 03b3bcd319b3 ("nvme: fix admin
request_queue lifetime") needed a tweak to the nvme pci driver.

Furthermore, this series includes a follow-up fix, upstream commit
b84bb7bd913d ("nvme: fix admin queue leak on controller reset"), again
with an adaptation to the nvme pci driver. The leak is easy to
reproduce by resetting the controller (no need to run the full
blktests):

  echo 1 > /sys/class/nvme/nvme0/reset_controller
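
One way such a leak can arise is sketched below as a userspace toy
model. This is an assumption made for illustration, not taken from the
upstream commit: it supposes that a reset re-creates the admin queue
and that the only other put happens at final controller release. All
toy_* names are hypothetical, and a small static pool replaces real
allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_MAX_QUEUES 16

/* Toy stand-in for a request_queue; refs == 0 means the slot is free. */
struct toy_queue {
	int refs;
};

struct toy_ctrl {
	struct toy_queue *admin_q;
};

static struct toy_queue toy_pool[TOY_MAX_QUEUES];

/* Count the queues that still hold at least one reference. */
static int toy_live_queues(void)
{
	int live = 0;

	for (int i = 0; i < TOY_MAX_QUEUES; i++)
		live += toy_pool[i].refs > 0;
	return live;
}

static struct toy_queue *toy_alloc_queue(void)
{
	for (int i = 0; i < TOY_MAX_QUEUES; i++) {
		if (toy_pool[i].refs == 0) {
			toy_pool[i].refs = 1;
			return &toy_pool[i];
		}
	}
	abort();	/* pool exhausted */
}

static void toy_put_queue(struct toy_queue *q)
{
	assert(q->refs > 0);	/* a failure here would be an underflow */
	q->refs--;
}

/* (Re)create the admin queue; put_old selects the non-leaking variant. */
static void toy_setup_admin_queue(struct toy_ctrl *ctrl, bool put_old)
{
	if (put_old && ctrl->admin_q)
		toy_put_queue(ctrl->admin_q);
	ctrl->admin_q = toy_alloc_queue();
}

/* Final controller release: drop the reference to the current queue. */
static void toy_free_ctrl(struct toy_ctrl *ctrl)
{
	if (ctrl->admin_q)
		toy_put_queue(ctrl->admin_q);
	ctrl->admin_q = NULL;
}

/* Queues left allocated after initial setup, `resets` resets, release. */
static int toy_leaked_after_resets(int resets, bool put_old)
{
	struct toy_ctrl ctrl = { .admin_q = NULL };
	int before = toy_live_queues();

	toy_setup_admin_queue(&ctrl, put_old);
	for (int i = 0; i < resets; i++)
		toy_setup_admin_queue(&ctrl, put_old);
	toy_free_ctrl(&ctrl);
	return toy_live_queues() - before;
}
```

In this model every reset without the extra put strands one queue, so
toy_leaked_after_resets(3, false) reports three leaked queues, while
the variant that puts the old queue before replacing it reports none.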

[1] https://lore.kernel.org/all/20251029210853.20768-1-cachen@purestorage.com/

---
Changes in v2:
    - dropped 2 patches from the series that are unnecessary (scsi and
      apple). The apple-nvme patch was even wrong (Thanks Fedor for
      pointing that out)

Christoph Hellwig (3):
  blk-mq: move the call to blk_put_queue out of blk_mq_destroy_queue
  nvme-pci: remove an extra queue reference
  nvme-pci: put the admin queue in nvme_dev_remove_admin

Keith Busch (1):
  nvme: fix admin request_queue lifetime

Maximilian Heyne (1):
  Revert "nvme: fix admin request_queue lifetime"

Ming Lei (1):
  nvme: fix admin queue leak on controller reset

 block/blk-mq.c            |  4 +---
 block/bsg-lib.c           |  2 ++
 drivers/nvme/host/apple.c |  1 +
 drivers/nvme/host/core.c  | 16 ++++++++++++++--
 drivers/nvme/host/pci.c   | 14 +++++++-------
 drivers/scsi/scsi_sysfs.c |  1 +
 drivers/ufs/core/ufshcd.c |  2 ++
 7 files changed, 28 insertions(+), 12 deletions(-)

-- 
2.50.1





Re: [PATCH 6.1.y v2 0/6] nvme: correctly fix admin request_queue lifetime

Posted by Fedor Pchelkin 31 minutes ago

"Heyne, Maximilian" <mheyne@amazon.de> wrote:
> The initial attempt to backport upstream commit 03b3bcd319b3 ("nvme: fix
> admin request_queue lifetime") was incorrect: it led to refcount
> underflows and did not even fix the problem.
> 
> I've run the reproduction steps from [1] (adding a delay to
> nvme_submit_user_cmd and 'echo 1 | sudo tee
> /sys/class/nvme/nvme0/delete_controller') on the nvme-tcp driver, where
> they triggered the KASAN use-after-free (UAF) report.
> 
> Fixing the issue in the 6.1 series requires a few dependent patches.
> The main one is upstream commit 2b3f056f72e5 ("blk-mq: move the call
> to blk_put_queue out of blk_mq_destroy_queue"), which allows moving the
> blk_put_queue call to a different location.
> 
> The backport of commit 03b3bcd319b3 ("nvme: fix admin
> request_queue lifetime") needed a tweak to the nvme pci driver.
> 
> Furthermore, this series includes a follow-up fix, upstream commit
> b84bb7bd913d ("nvme: fix admin queue leak on controller reset"), again
> with an adaptation to the nvme pci driver. The leak is easy to
> reproduce by resetting the controller (no need to run the full
> blktests):
> 
>   echo 1 > /sys/class/nvme/nvme0/reset_controller

For the series

Tested-by: Fedor Pchelkin <pchelkin@ispras.ru>

Thanks for the prompt fix.