drivers/nvdimm/label.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
If a CXL memory device returns a broken zero LSA size in its memory
device information (Identify Memory Device (Opcode 4000h), CXL
spec. 3.1, 8.2.9.9.1.1), a divide error occurs in the libnvdimm
driver:
Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:nd_label_data_init+0x10e/0x800 [libnvdimm]
Code and flow:
1) CXL Command 4000h returns LSA size = 0,
2) config_size is assigned to zero LSA size (CXL pmem driver):
drivers/cxl/pmem.c: .config_size = mds->lsa_size,
3) max_xfer is set to zero (nvdimm driver):
drivers/nvdimm/label.c: max_xfer = min_t(size_t, ndd->nsarea.max_xfer, config_size);
drivers/nvdimm/label.c: if (read_size < max_xfer) {
drivers/nvdimm/label.c- /* trim waste */
4) DIV_ROUND_UP() causes division by zero:
drivers/nvdimm/label.c: max_xfer -= ((max_xfer - 1) - (config_size - 1) % max_xfer) /
drivers/nvdimm/label.c: DIV_ROUND_UP(config_size, max_xfer);
drivers/nvdimm/label.c- /* make certain we read indexes in exactly 1 read */
drivers/nvdimm/label.c: if (max_xfer < read_size)
drivers/nvdimm/label.c: max_xfer = read_size;
drivers/nvdimm/label.c- }
Fix this by checking the config size parameter by extending an
existing check.
Signed-off-by: Robert Richter <rrichter@amd.com>
---
drivers/nvdimm/label.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/nvdimm/label.c b/drivers/nvdimm/label.c
index 082253a3a956..04f4a049599a 100644
--- a/drivers/nvdimm/label.c
+++ b/drivers/nvdimm/label.c
@@ -442,7 +442,8 @@ int nd_label_data_init(struct nvdimm_drvdata *ndd)
if (ndd->data)
return 0;
- if (ndd->nsarea.status || ndd->nsarea.max_xfer == 0) {
+ if (ndd->nsarea.status || ndd->nsarea.max_xfer == 0 ||
+ ndd->nsarea.config_size == 0) {
dev_dbg(ndd->dev, "failed to init config data area: (%u:%u)\n",
ndd->nsarea.max_xfer, ndd->nsarea.config_size);
return -ENXIO;
--
2.39.5
Robert Richter wrote:
> If a CXL memory device returns a broken zero LSA size in its memory
> device information (Identify Memory Device (Opcode 4000h), CXL
> spec. 3.1, 8.2.9.9.1.1), a divide error occurs in the libnvdimm
> driver:
>
> Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI
> RIP: 0010:nd_label_data_init+0x10e/0x800 [libnvdimm]
>
> Code and flow:
>
> 1) CXL Command 4000h returns LSA size = 0,
> 2) config_size is assigned to zero LSA size (CXL pmem driver):
>
> drivers/cxl/pmem.c: .config_size = mds->lsa_size,
>
> 3) max_xfer is set to zero (nvdimm driver):
>
> drivers/nvdimm/label.c: max_xfer = min_t(size_t, ndd->nsarea.max_xfer, config_size);
> drivers/nvdimm/label.c: if (read_size < max_xfer) {
> drivers/nvdimm/label.c- /* trim waste */
>
> 4) DIV_ROUND_UP() causes division by zero:
>
> drivers/nvdimm/label.c: max_xfer -= ((max_xfer - 1) - (config_size - 1) % max_xfer) /
> drivers/nvdimm/label.c: DIV_ROUND_UP(config_size, max_xfer);
I think this is the wrong DIV_ROUND_UP which is failing because read_size is
never less than max_xfer is it?
I believe the failing DIV_ROUND_UP is after if statement here:
489 /* Make our initial read size a multiple of max_xfer size */
490 read_size = min(DIV_ROUND_UP(read_size, max_xfer) * max_xfer,
491 config_size);
Apparently nvdimm_get_config_data() was intended to check for this implicitly
but it is too late.
Anyway all this side tracked me a bit.
I assume this is a broken device which is in the real world? The fix looks
fine. But could you re-spin with a clean up of the commit message and I'll
queue it up.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
[snip]
On 19.03.25 14:33:54, Ira Weiny wrote:
> Robert Richter wrote:
> > If a CXL memory device returns a broken zero LSA size in its memory
> > device information (Identify Memory Device (Opcode 4000h), CXL
> > spec. 3.1, 8.2.9.9.1.1), a divide error occurs in the libnvdimm
> > driver:
> >
> > Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI
> > RIP: 0010:nd_label_data_init+0x10e/0x800 [libnvdimm]
> >
> > Code and flow:
> >
> > 1) CXL Command 4000h returns LSA size = 0,
> > 2) config_size is assigned to zero LSA size (CXL pmem driver):
> >
> > drivers/cxl/pmem.c: .config_size = mds->lsa_size,
> >
> > 3) max_xfer is set to zero (nvdimm driver):
> >
> > drivers/nvdimm/label.c: max_xfer = min_t(size_t, ndd->nsarea.max_xfer, config_size);
> > drivers/nvdimm/label.c: if (read_size < max_xfer) {
> > drivers/nvdimm/label.c- /* trim waste */
> >
> > 4) DIV_ROUND_UP() causes division by zero:
> >
> > drivers/nvdimm/label.c: max_xfer -= ((max_xfer - 1) - (config_size - 1) % max_xfer) /
> > drivers/nvdimm/label.c: DIV_ROUND_UP(config_size, max_xfer);
>
> I think this is the wrong DIV_ROUND_UP which is failing because read_size is
> never less than max_xfer is it?
>
> I believe the failing DIV_ROUND_UP is after if statement here:
>
> 489 /* Make our initial read size a multiple of max_xfer size */
> 490 read_size = min(DIV_ROUND_UP(read_size, max_xfer) * max_xfer,
> 491 config_size);
Yes, it is this one.
>
> Apparently nvdimm_get_config_data() was intended to check for this implicitly
> but it is too late.
>
> Anyway all this side tracked me a bit.
>
> I assume this is a broken device which is in the real world? The fix looks
> fine. But could you re-spin with a clean up of the commit message and I'll
> queue it up.
Yes, it was caused by a faulty device.
Sure, will update description and resend.
>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Thanks for review,
-Robert
On 3/19/2025 12:32 PM, Robert Richter wrote:
> If a CXL memory device returns a broken zero LSA size in its memory
> device information (Identify Memory Device (Opcode 4000h), CXL
> spec. 3.1, 8.2.9.9.1.1), a divide error occurs in the libnvdimm
> driver:
>
> Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI
> RIP: 0010:nd_label_data_init+0x10e/0x800 [libnvdimm]
>
> Code and flow:
>
> 1) CXL Command 4000h returns LSA size = 0,
> 2) config_size is assigned to zero LSA size (CXL pmem driver):
>
> drivers/cxl/pmem.c: .config_size = mds->lsa_size,
>
> 3) max_xfer is set to zero (nvdimm driver):
>
> drivers/nvdimm/label.c: max_xfer = min_t(size_t, ndd->nsarea.max_xfer, config_size);
> drivers/nvdimm/label.c: if (read_size < max_xfer) {
> drivers/nvdimm/label.c- /* trim waste */
>
> 4) DIV_ROUND_UP() causes division by zero:
>
> drivers/nvdimm/label.c: max_xfer -= ((max_xfer - 1) - (config_size - 1) % max_xfer) /
> drivers/nvdimm/label.c: DIV_ROUND_UP(config_size, max_xfer);
> drivers/nvdimm/label.c- /* make certain we read indexes in exactly 1 read */
> drivers/nvdimm/label.c: if (max_xfer < read_size)
> drivers/nvdimm/label.c: max_xfer = read_size;
> drivers/nvdimm/label.c- }
>
> Fix this by checking the config size parameter by extending an
> existing check.
>
> Signed-off-by: Robert Richter <rrichter@amd.com>
LGTM
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
> ---
> drivers/nvdimm/label.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvdimm/label.c b/drivers/nvdimm/label.c
> index 082253a3a956..04f4a049599a 100644
> --- a/drivers/nvdimm/label.c
> +++ b/drivers/nvdimm/label.c
> @@ -442,7 +442,8 @@ int nd_label_data_init(struct nvdimm_drvdata *ndd)
> if (ndd->data)
> return 0;
>
> - if (ndd->nsarea.status || ndd->nsarea.max_xfer == 0) {
> + if (ndd->nsarea.status || ndd->nsarea.max_xfer == 0 ||
> + ndd->nsarea.config_size == 0) {
> dev_dbg(ndd->dev, "failed to init config data area: (%u:%u)\n",
> ndd->nsarea.max_xfer, ndd->nsarea.config_size);
> return -ENXIO;
© 2016 - 2025 Red Hat, Inc.