drivers/nvme/host/core.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
nvme_update_ns_info_block() trusts id->lbaf[lbaf].ds from the
controller and assigns it directly to ns->head->lba_shift without
bounds checking. nvme_lba_to_sect() then does:
return lba << (head->lba_shift - SECTOR_SHIFT);
When called with lba = le64_to_cpu(id->nsze) to compute the device
capacity, an attacker-controlled controller can choose ds < 9 or a
combination of (ds, nsze) that makes the left shift overflow
sector_t. The former is a C undefined behaviour that UBSAN reports
as a BUG; the latter silently yields a bogus capacity that the
block layer then trusts for bounds checking.
Validate ds against SECTOR_SHIFT and use check_shl_overflow() to
compute capacity so that any (ds, nsze) combination that would
overflow sector_t is rejected. The namespace is skipped with -EIO
instead of crashing the kernel. This is reachable by a malicious
NVMe device, a buggy firmware, or an attacker-controlled NVMe-oF
target.
Stack trace (UBSAN, ds < 9 variant):
RIP: nvme_lba_to_sect drivers/nvme/host/nvme.h:699 [inline]
RIP: nvme_update_ns_info_block.cold+0x5/0x7
Call Trace:
nvme_update_ns_info+0x175/0xd90 drivers/nvme/host/core.c:2467
nvme_validate_ns drivers/nvme/host/core.c:4299 [inline]
nvme_scan_ns drivers/nvme/host/core.c:4350
nvme_scan_ns_async+0xa5/0xe0 drivers/nvme/host/core.c:4383
async_run_entry_fn
process_one_work
worker_thread
kthread
Found by Syzkaller.
Fixes: 9419e71b8d67 ("nvme: move ns id info to struct nvme_ns_head")
Acked-by: Sungwoo Kim <iam@sung-woo.kim>
Acked-by: Dave Tian <daveti@purdue.edu>
Acked-by: Weidong Zhu <weizhu@fiu.edu>
Signed-off-by: Chao Shi <coshi036@gmail.com>
---
drivers/nvme/host/core.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 1e33af94c24..9b3bf3e4075 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2407,9 +2407,19 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
lim = queue_limits_start_update(ns->disk->queue);
memflags = blk_mq_freeze_queue(ns->disk->queue);
+ if (id->lbaf[lbaf].ds < SECTOR_SHIFT ||
+ check_shl_overflow(le64_to_cpu(id->nsze),
+ id->lbaf[lbaf].ds - SECTOR_SHIFT,
+ &capacity)) {
+ dev_warn_once(ns->ctrl->device,
+ "invalid LBA data size %u, skipping namespace\n",
+ id->lbaf[lbaf].ds);
+ ret = -EIO;
+ blk_mq_unfreeze_queue(ns->disk->queue, memflags);
+ goto out;
+ }
ns->head->lba_shift = id->lbaf[lbaf].ds;
ns->head->nuse = le64_to_cpu(id->nuse);
- capacity = nvme_lba_to_sect(ns->head, le64_to_cpu(id->nsze));
nvme_set_ctrl_limits(ns->ctrl, &lim, false);
nvme_configure_metadata(ns->ctrl, ns->head, id, nvm, info);
nvme_set_chunk_sectors(ns, id, &lim);
--
2.43.0
Thanks for sharing comments! I did such changes in v2: - Remove Fixes tag(Keith Busch: ds has never been validated) - Change -EIO to -ENODEV (Keith Busch) - Add missing "queue_limits_cancel_update()" in error path (Keith Busch) - Add warning for ds == 0 (LBA format not available) (Maurizio Lombardi) --- On Mon, Apr 20, 2026 at 02:51:00PM -0700, Keith Busch wrote: > I think ENODEV is more appropriate errno. Fixed in v2. > you're missing the corresponding queue_limits_cancel_update() for this > error case. Added in both error paths in v2. > This fixes tag is wrong. Removed. On Mon, Apr 20, 2026 at 02:22:04PM +0200, Daniel Wagner wrote: > This fixes tag is wrong. Probably a5b1cd61820e is the better choice here. Keith pointed out that ds has never been validated so no commit introduced a regression — removed the Fixes tag entirely. On Mon, Apr 20, 2026 at 02:22:04PM +0200, Maurizio Lombardi wrote: > ds == 0 has a special meaning: 'LBA format is not currently available.' Thanks,added dev_warn_once() for ds==0 in v2. Chao Shi (1): nvme: core: reject invalid LBA data size from Identify Namespace drivers/nvme/host/core.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) -- 2.43.0
On Mon, Apr 20, 2026 at 07:11:13PM -0400, Chao Shi wrote: > On Mon, Apr 20, 2026 at 02:51:00PM -0700, Keith Busch wrote: > > Fixed in v2. > > > you're missing the corresponding queue_limits_cancel_update() for this > > error case. > > Added in both error paths in v2. Not quite what I suggested. Instead, if you move up the validation checks, you don't have any cleanup in the error case to do. And don't bother splitting the error cases. The message about the "0" LBADS is in reference to the the broadcast identification. It's never valid for a controller to return an unusable flbas index value for an attached namespace.
Hi Keith, Thanks for the follow-up — I misread your v1 comment. > Not quite what I suggested. Instead, if you move up the validation > checks, you don't have any cleanup in the error case to do. Understood now. In v3 the validation is hoisted above queue_limits_start_update() and blk_mq_freeze_queue(), so the error path is a plain `goto out` with no cancel_update / unfreeze needed. (Please let me know if I things on the right track) > And don't bother splitting the error cases. The message about the > "0" LBADS is in reference to the broadcast identification. It's > never valid for a controller to return an unusable flbas index > value for an attached namespace. Collapsed into a single check in v3. Sending v3 shortly. Thanks, Chao On Mon, Apr 20, 2026 at 8:25 PM Keith Busch <kbusch@kernel.org> wrote: > > On Mon, Apr 20, 2026 at 07:11:13PM -0400, Chao Shi wrote: > > On Mon, Apr 20, 2026 at 02:51:00PM -0700, Keith Busch wrote: > > > > Fixed in v2. > > > > > you're missing the corresponding queue_limits_cancel_update() for this > > > error case. > > > > Added in both error paths in v2. > > Not quite what I suggested. Instead, if you move up the validation > checks, you don't have any cleanup in the error case to do. > > And don't bother splitting the error cases. The message about the "0" > LBADS is in reference to the the broadcast identification. It's never > valid for a controller to return an unusable flbas index value for an > attached namespace.
nvme_update_ns_info_block() trusts id->lbaf[lbaf].ds from the
controller and assigns it directly to ns->head->lba_shift without
bounds checking. nvme_lba_to_sect() then does:
return lba << (head->lba_shift - SECTOR_SHIFT);
When called with lba = le64_to_cpu(id->nsze) to compute the device
capacity, an attacker-controlled controller can choose ds < 9 or a
combination of (ds, nsze) that makes the left shift overflow
sector_t. The former is a C undefined behaviour that UBSAN reports
as a BUG; the latter silently yields a bogus capacity that the
block layer then trusts for bounds checking.
Validate ds against SECTOR_SHIFT and use check_shl_overflow() to
compute capacity so that any (ds, nsze) combination that would
overflow sector_t is rejected. The namespace is skipped with -ENODEV
instead of crashing the kernel. This is reachable by a malicious
NVMe device, a buggy firmware, or an attacker-controlled NVMe-oF
target.
Stack trace (UBSAN, ds < 9 variant):
RIP: nvme_lba_to_sect drivers/nvme/host/nvme.h:699 [inline]
RIP: nvme_update_ns_info_block.cold+0x5/0x7
Call Trace:
nvme_update_ns_info+0x175/0xd90 drivers/nvme/host/core.c:2467
nvme_validate_ns drivers/nvme/host/core.c:4299 [inline]
nvme_scan_ns drivers/nvme/host/core.c:4350
nvme_scan_ns_async+0xa5/0xe0 drivers/nvme/host/core.c:4383
async_run_entry_fn
process_one_work
worker_thread
kthread
Found by Syzkaller.
Acked-by: Sungwoo Kim <iam@sung-woo.kim>
Acked-by: Dave Tian <daveti@purdue.edu>
Acked-by: Weidong Zhu <weizhu@fiu.edu>
Signed-off-by: Chao Shi <coshi036@gmail.com>
---
drivers/nvme/host/core.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 1e33af94c24..d1711ef59fb 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2407,9 +2407,28 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
lim = queue_limits_start_update(ns->disk->queue);
memflags = blk_mq_freeze_queue(ns->disk->queue);
+ if (id->lbaf[lbaf].ds == 0) {
+ dev_warn_once(ns->ctrl->device,
+ "LBA format not available, skipping namespace\n");
+ ret = -ENODEV;
+ queue_limits_cancel_update(ns->disk->queue);
+ blk_mq_unfreeze_queue(ns->disk->queue, memflags);
+ goto out;
+ }
+ if (id->lbaf[lbaf].ds < SECTOR_SHIFT ||
+ check_shl_overflow(le64_to_cpu(id->nsze),
+ id->lbaf[lbaf].ds - SECTOR_SHIFT,
+ &capacity)) {
+ dev_warn_once(ns->ctrl->device,
+ "invalid LBA data size %u, skipping namespace\n",
+ id->lbaf[lbaf].ds);
+ ret = -ENODEV;
+ queue_limits_cancel_update(ns->disk->queue);
+ blk_mq_unfreeze_queue(ns->disk->queue, memflags);
+ goto out;
+ }
ns->head->lba_shift = id->lbaf[lbaf].ds;
ns->head->nuse = le64_to_cpu(id->nuse);
- capacity = nvme_lba_to_sect(ns->head, le64_to_cpu(id->nsze));
nvme_set_ctrl_limits(ns->ctrl, &lim, false);
nvme_configure_metadata(ns->ctrl, ns->head, id, nvm, info);
nvme_set_chunk_sectors(ns, id, &lim);
--
2.43.0
On Sat, Apr 18, 2026 at 12:28:34AM -0400, Chao Shi wrote:
> memflags = blk_mq_freeze_queue(ns->disk->queue);
> + if (id->lbaf[lbaf].ds < SECTOR_SHIFT ||
> + check_shl_overflow(le64_to_cpu(id->nsze),
> + id->lbaf[lbaf].ds - SECTOR_SHIFT,
> + &capacity)) {
> + dev_warn_once(ns->ctrl->device,
> + "invalid LBA data size %u, skipping namespace\n",
> + id->lbaf[lbaf].ds);
> + ret = -EIO;
I think ENODEV is more appropriate errno.
> + blk_mq_unfreeze_queue(ns->disk->queue, memflags);
> + goto out;
I don't see any particular reason why we shouldn't validate this value
before starting the queue updates and freezing the queue, like we for
the ncap field up higher. Doing that would make the error case much
simpler. Case in point, you're missing the corresponding
queue_limits_cancel_update() for this error case.
> + }
> ns->head->lba_shift = id->lbaf[lbaf].ds;
> ns->head->nuse = le64_to_cpu(id->nuse);
> - capacity = nvme_lba_to_sect(ns->head, le64_to_cpu(id->nsze));
> nvme_set_ctrl_limits(ns->ctrl, &lim, false);
> nvme_configure_metadata(ns->ctrl, ns->head, id, nvm, info);
> nvme_set_chunk_sectors(ns, id, &lim);
> --
> 2.43.0
>
On Sat Apr 18, 2026 at 6:28 AM CEST, Chao Shi wrote:
> nvme_update_ns_info_block() trusts id->lbaf[lbaf].ds from the
> controller and assigns it directly to ns->head->lba_shift without
> bounds checking. nvme_lba_to_sect() then does:
>
> return lba << (head->lba_shift - SECTOR_SHIFT);
>
> When called with lba = le64_to_cpu(id->nsze) to compute the device
> capacity, an attacker-controlled controller can choose ds < 9 or a
> combination of (ds, nsze) that makes the left shift overflow
> sector_t. The former is a C undefined behaviour that UBSAN reports
> as a BUG; the latter silently yields a bogus capacity that the
> block layer then trusts for bounds checking.
>
> Validate ds against SECTOR_SHIFT and use check_shl_overflow() to
> compute capacity so that any (ds, nsze) combination that would
> overflow sector_t is rejected. The namespace is skipped with -EIO
> instead of crashing the kernel. This is reachable by a malicious
> NVMe device, a buggy firmware, or an attacker-controlled NVMe-oF
> target.
>
> Stack trace (UBSAN, ds < 9 variant):
>
> RIP: nvme_lba_to_sect drivers/nvme/host/nvme.h:699 [inline]
> RIP: nvme_update_ns_info_block.cold+0x5/0x7
> Call Trace:
> nvme_update_ns_info+0x175/0xd90 drivers/nvme/host/core.c:2467
> nvme_validate_ns drivers/nvme/host/core.c:4299 [inline]
> nvme_scan_ns drivers/nvme/host/core.c:4350
> nvme_scan_ns_async+0xa5/0xe0 drivers/nvme/host/core.c:4383
> async_run_entry_fn
> process_one_work
> worker_thread
> kthread
>
> Found by Syzkaller.
>
> Fixes: 9419e71b8d67 ("nvme: move ns id info to struct nvme_ns_head")
> Acked-by: Sungwoo Kim <iam@sung-woo.kim>
> Acked-by: Dave Tian <daveti@purdue.edu>
> Acked-by: Weidong Zhu <weizhu@fiu.edu>
> Signed-off-by: Chao Shi <coshi036@gmail.com>
> ---
> drivers/nvme/host/core.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 1e33af94c24..9b3bf3e4075 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2407,9 +2407,19 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
> lim = queue_limits_start_update(ns->disk->queue);
>
> memflags = blk_mq_freeze_queue(ns->disk->queue);
> + if (id->lbaf[lbaf].ds < SECTOR_SHIFT ||
> + check_shl_overflow(le64_to_cpu(id->nsze),
> + id->lbaf[lbaf].ds - SECTOR_SHIFT,
> + &capacity)) {
> + dev_warn_once(ns->ctrl->device,
> + "invalid LBA data size %u, skipping namespace\n",
> + id->lbaf[lbaf].ds);
Just a nit:
If I'm reading the NVMe spec correctly, ds == 0 has a special meaning:
'LBA format is not currently available.'
maybe we should use a different dev_warn() for ds == 0 ?
Maurizio
Hi Maurizio,
Thanks for the nit on v1.
> If I'm reading the NVMe spec correctly, ds == 0 has a special
> meaning: 'LBA format is not currently available.'
> maybe we should use a different dev_warn() for ds == 0 ?
I added the separate warning in v2, but Keith pointed out on the
v2 thread that the "LBA format not available" wording in the spec
refers to broadcast identification. For an attached namespace,
any unusable flbas index value (including 0) is invalid, so v3
merges both cases into a single check.
Apologies for the back-and-forth.
Also: v2 bounced to your old redhat.com address — using
mlombard@arkamax.eu from here on.
Thanks,
Chao
On Mon, Apr 20, 2026 at 8:22 AM Maurizio Lombardi <mlombard@arkamax.eu> wrote:
>
> On Sat Apr 18, 2026 at 6:28 AM CEST, Chao Shi wrote:
> > nvme_update_ns_info_block() trusts id->lbaf[lbaf].ds from the
> > controller and assigns it directly to ns->head->lba_shift without
> > bounds checking. nvme_lba_to_sect() then does:
> >
> > return lba << (head->lba_shift - SECTOR_SHIFT);
> >
> > When called with lba = le64_to_cpu(id->nsze) to compute the device
> > capacity, an attacker-controlled controller can choose ds < 9 or a
> > combination of (ds, nsze) that makes the left shift overflow
> > sector_t. The former is a C undefined behaviour that UBSAN reports
> > as a BUG; the latter silently yields a bogus capacity that the
> > block layer then trusts for bounds checking.
> >
> > Validate ds against SECTOR_SHIFT and use check_shl_overflow() to
> > compute capacity so that any (ds, nsze) combination that would
> > overflow sector_t is rejected. The namespace is skipped with -EIO
> > instead of crashing the kernel. This is reachable by a malicious
> > NVMe device, a buggy firmware, or an attacker-controlled NVMe-oF
> > target.
> >
> > Stack trace (UBSAN, ds < 9 variant):
> >
> > RIP: nvme_lba_to_sect drivers/nvme/host/nvme.h:699 [inline]
> > RIP: nvme_update_ns_info_block.cold+0x5/0x7
> > Call Trace:
> > nvme_update_ns_info+0x175/0xd90 drivers/nvme/host/core.c:2467
> > nvme_validate_ns drivers/nvme/host/core.c:4299 [inline]
> > nvme_scan_ns drivers/nvme/host/core.c:4350
> > nvme_scan_ns_async+0xa5/0xe0 drivers/nvme/host/core.c:4383
> > async_run_entry_fn
> > process_one_work
> > worker_thread
> > kthread
> >
> > Found by Syzkaller.
> >
> > Fixes: 9419e71b8d67 ("nvme: move ns id info to struct nvme_ns_head")
> > Acked-by: Sungwoo Kim <iam@sung-woo.kim>
> > Acked-by: Dave Tian <daveti@purdue.edu>
> > Acked-by: Weidong Zhu <weizhu@fiu.edu>
> > Signed-off-by: Chao Shi <coshi036@gmail.com>
> > ---
> > drivers/nvme/host/core.c | 12 +++++++++++-
> > 1 file changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 1e33af94c24..9b3bf3e4075 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -2407,9 +2407,19 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
> > lim = queue_limits_start_update(ns->disk->queue);
> >
> > memflags = blk_mq_freeze_queue(ns->disk->queue);
> > + if (id->lbaf[lbaf].ds < SECTOR_SHIFT ||
> > + check_shl_overflow(le64_to_cpu(id->nsze),
> > + id->lbaf[lbaf].ds - SECTOR_SHIFT,
> > + &capacity)) {
> > + dev_warn_once(ns->ctrl->device,
> > + "invalid LBA data size %u, skipping namespace\n",
> > + id->lbaf[lbaf].ds);
>
> Just a nit:
>
> If I'm reading the NVMe spec correctly, ds == 0 has a special meaning:
> 'LBA format is not currently available.'
> maybe we should use a different dev_warn() for ds == 0 ?
>
> Maurizio
>
On Sat, Apr 18, 2026 at 12:28:34AM -0400, Chao Shi wrote:
> Fixes: 9419e71b8d67 ("nvme: move ns id info to struct nvme_ns_head")
This fixes tag is wrong. The above commit doesn't touch the code line
in question.
Probably a5b1cd61820e ("nvme: move a few things out of
nvme_update_disk_info") is the better choice here.
Hi Daniel,
Thanks for the catch on the Fixes tag.
> This fixes tag is wrong. The above commit doesn't touch the code
> line in question.
> Probably a5b1cd61820e ("nvme: move a few things out of
> nvme_update_disk_info") is the better choice here.
Keith subsequently pointed out that ds has never been validated
upstream, so no single commit introduced a regression to fix. I
dropped the Fixes tag entirely in v2 (and v3 keeps it dropped).
Thanks,
Chao
On Mon, Apr 20, 2026 at 8:22 AM Daniel Wagner <dwagner@suse.de> wrote:
>
> On Sat, Apr 18, 2026 at 12:28:34AM -0400, Chao Shi wrote:
> > Fixes: 9419e71b8d67 ("nvme: move ns id info to struct nvme_ns_head")
>
> This fixes tag is wrong. The above commit doesn't touch the code line
> in question.
>
> Probably a5b1cd61820e ("nvme: move a few things out of
> nvme_update_disk_info") is the better choice here.
On Mon, Apr 20, 2026 at 02:22:04PM +0200, Daniel Wagner wrote:
> On Sat, Apr 19, 2026 at 12:28:34AM -0400, Chao Shi wrote:
> > Fixes: 9419e71b8d67 ("nvme: move ns id info to struct nvme_ns_head")
>
> This fixes tag is wrong. The above commit doesn't touch the code line
> in question.
>
> Probably a5b1cd61820e ("nvme: move a few things out of
> nvme_update_disk_info") is the better choice here.
We've never validated the 'ds' field though, so not sure anything
introduced a regression here that warrants a fixing tag.
© 2016 - 2026 Red Hat, Inc.