kernel/resource.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
walk_system_ram_res_rev() erroneously discards resource flags when
passing the information to the callback.
This causes systems with IORESOURCE_SYSRAM_DRIVER_MANAGED memory to
have these resources selected during kexec to store kexec buffers
if that memory happens to be at placed above normal system ram.
This leads to undefined behavior after reboot. If the kexec buffer
is never touched, nothing happens. If the kexec buffer is touched,
it could lead to a crash (like below) or undefined behavior.
Tested on a system with CXL memory expanders with driver managed
memory, TPM enabled, and CONFIG_IMA_KEXEC=y. Adding printk's
showed the flags were being discarded and as a result the check
for IORESOURCE_SYSRAM_DRIVER_MANAGED passes.
find_next_iomem_res: name(System RAM (kmem))
start(10000000000)
end(1034fffffff)
flags(83000200)
locate_mem_hole_top_down: start(10000000000) end(1034fffffff) flags(0)
[.] BUG: unable to handle page fault for address: ffff89834ffff000
[.] #PF: supervisor read access in kernel mode
[.] #PF: error_code(0x0000) - not-present page
[.] PGD c04c8bf067 P4D c04c8bf067 PUD c04c8be067 PMD 0
[.] Oops: 0000 [#1] SMP
[.] RIP: 0010:ima_restore_measurement_list+0x95/0x4b0
[.] RSP: 0018:ffffc900000d3a80 EFLAGS: 00010286
[.] RAX: 0000000000001000 RBX: 0000000000000000 RCX: ffff89834ffff000
[.] RDX: 0000000000000018 RSI: ffff89834ffff000 RDI: ffff89834ffff018
[.] RBP: ffffc900000d3ba0 R08: 0000000000000020 R09: ffff888132b8a900
[.] R10: 4000000000000000 R11: 000000003a616d69 R12: 0000000000000000
[.] R13: ffffffff8404ac28 R14: 0000000000000000 R15: ffff89834ffff000
[.] FS: 0000000000000000(0000) GS:ffff893d44640000(0000) knlGS:0000000000000000
[.] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[.] ata5: SATA link down (SStatus 0 SControl 300)
[.] CR2: ffff89834ffff000 CR3: 000001034d00f001 CR4: 0000000000770ef0
[.] PKRU: 55555554
[.] Call Trace:
[.] <TASK>
[.] ? __die+0x78/0xc0
[.] ? page_fault_oops+0x2a8/0x3a0
[.] ? exc_page_fault+0x84/0x130
[.] ? asm_exc_page_fault+0x22/0x30
[.] ? ima_restore_measurement_list+0x95/0x4b0
[.] ? template_desc_init_fields+0x317/0x410
[.] ? crypto_alloc_tfm_node+0x9c/0xc0
[.] ? init_ima_lsm+0x30/0x30
[.] ima_load_kexec_buffer+0x72/0xa0
[.] ima_init+0x44/0xa0
[.] __initstub__kmod_ima__373_1201_init_ima7+0x1e/0xb0
[.] ? init_ima_lsm+0x30/0x30
[.] do_one_initcall+0xad/0x200
[.] ? idr_alloc_cyclic+0xaa/0x110
[.] ? new_slab+0x12c/0x420
[.] ? new_slab+0x12c/0x420
[.] ? number+0x12a/0x430
[.] ? sysvec_apic_timer_interrupt+0xa/0x80
[.] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[.] ? parse_args+0xd4/0x380
[.] ? parse_args+0x14b/0x380
[.] kernel_init_freeable+0x1c1/0x2b0
[.] ? rest_init+0xb0/0xb0
[.] kernel_init+0x16/0x1a0
[.] ret_from_fork+0x2f/0x40
[.] ? rest_init+0xb0/0xb0
[.] ret_from_fork_asm+0x11/0x20
[.] </TASK>
Link: https://lore.kernel.org/all/20231114091658.228030-1-bhe@redhat.com/
Fixes: 7acf164b259d ("resource: add walk_system_ram_res_rev()")
Cc: stable@vger.kernel.org
Signed-off-by: Gregory Price <gourry@gourry.net>
---
kernel/resource.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/kernel/resource.c b/kernel/resource.c
index b730bd28b422..4101016e8b20 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -459,9 +459,7 @@ int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
rams_size += 16;
}
- rams[i].start = res.start;
- rams[i++].end = res.end;
-
+ rams[i++] = res;
start = res.end + 1;
}
--
2.43.0
On Thu, Oct 17, 2024 at 03:03:47PM -0400, Gregory Price wrote: > walk_system_ram_res_rev() erroneously discards resource flags when > passing the information to the callback. > > This causes systems with IORESOURCE_SYSRAM_DRIVER_MANAGED memory to > have these resources selected during kexec to store kexec buffers > if that memory happens to be at placed above normal system ram. > > This leads to undefined behavior after reboot. If the kexec buffer > is never touched, nothing happens. If the kexec buffer is touched, > it could lead to a crash (like below) or undefined behavior. > > Tested on a system with CXL memory expanders with driver managed > memory, TPM enabled, and CONFIG_IMA_KEXEC=y. Adding printk's > showed the flags were being discarded and as a result the check > for IORESOURCE_SYSRAM_DRIVER_MANAGED passes. > > find_next_iomem_res: name(System RAM (kmem)) > start(10000000000) > end(1034fffffff) > flags(83000200) > > locate_mem_hole_top_down: start(10000000000) end(1034fffffff) flags(0) > > [.] BUG: unable to handle page fault for address: ffff89834ffff000 Please, cut this down to only important ~3-5 lines as suggested in the Submitting Patches documentation. Yeah, I see that Andrew applied it to hist testing branch, if it's not going to be updated there, consider above as a hint for the future contributions with backtraces. -- With Best Regards, Andy Shevchenko
On Fri, Oct 18, 2024 at 03:57:30PM +0300, Andy Shevchenko wrote: > On Thu, Oct 17, 2024 at 03:03:47PM -0400, Gregory Price wrote: > > walk_system_ram_res_rev() erroneously discards resource flags when > > passing the information to the callback. > > > > This causes systems with IORESOURCE_SYSRAM_DRIVER_MANAGED memory to > > have these resources selected during kexec to store kexec buffers > > if that memory happens to be at placed above normal system ram. > > > > This leads to undefined behavior after reboot. If the kexec buffer > > is never touched, nothing happens. If the kexec buffer is touched, > > it could lead to a crash (like below) or undefined behavior. > > > > Tested on a system with CXL memory expanders with driver managed > > memory, TPM enabled, and CONFIG_IMA_KEXEC=y. Adding printk's > > showed the flags were being discarded and as a result the check > > for IORESOURCE_SYSRAM_DRIVER_MANAGED passes. > > > > find_next_iomem_res: name(System RAM (kmem)) > > start(10000000000) > > end(1034fffffff) > > flags(83000200) > > > > locate_mem_hole_top_down: start(10000000000) end(1034fffffff) flags(0) > > > > [.] BUG: unable to handle page fault for address: ffff89834ffff000 > > Please, cut this down to only important ~3-5 lines as suggested in > the Submitting Patches documentation. > > Yeah, I see that Andrew applied it to hist testing branch, if it's not going to > be updated there, consider above as a hint for the future contributions with > backtraces. > noted, thank you! > -- > With Best Regards, > Andy Shevchenko > >
HI Gregory, On 10/17/24 at 03:03pm, Gregory Price wrote: > walk_system_ram_res_rev() erroneously discards resource flags when > passing the information to the callback. > > This causes systems with IORESOURCE_SYSRAM_DRIVER_MANAGED memory to > have these resources selected during kexec to store kexec buffers > if that memory happens to be at placed above normal system ram. Sorry about that. I haven't checked IORESOURCE_SYSRAM_DRIVER_MANAGED memory carefully, wondering if res could be set as 'IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY' plus IORESOURCE_SYSRAM_DRIVER_MANAGED in iomem_resource tree. Anyway, the change in this patch is certainly better. Thanks. Acked-by: Baoquan He <bhe@redhat.com> > > This leads to undefined behavior after reboot. If the kexec buffer > is never touched, nothing happens. If the kexec buffer is touched, > it could lead to a crash (like below) or undefined behavior. > > Tested on a system with CXL memory expanders with driver managed > memory, TPM enabled, and CONFIG_IMA_KEXEC=y. Adding printk's > showed the flags were being discarded and as a result the check > for IORESOURCE_SYSRAM_DRIVER_MANAGED passes. > > find_next_iomem_res: name(System RAM (kmem)) > start(10000000000) > end(1034fffffff) > flags(83000200) > > locate_mem_hole_top_down: start(10000000000) end(1034fffffff) flags(0) > > [.] BUG: unable to handle page fault for address: ffff89834ffff000 > [.] #PF: supervisor read access in kernel mode > [.] #PF: error_code(0x0000) - not-present page > [.] PGD c04c8bf067 P4D c04c8bf067 PUD c04c8be067 PMD 0 > [.] Oops: 0000 [#1] SMP > [.] RIP: 0010:ima_restore_measurement_list+0x95/0x4b0 > [.] RSP: 0018:ffffc900000d3a80 EFLAGS: 00010286 > [.] RAX: 0000000000001000 RBX: 0000000000000000 RCX: ffff89834ffff000 > [.] RDX: 0000000000000018 RSI: ffff89834ffff000 RDI: ffff89834ffff018 > [.] RBP: ffffc900000d3ba0 R08: 0000000000000020 R09: ffff888132b8a900 > [.] R10: 4000000000000000 R11: 000000003a616d69 R12: 0000000000000000 > [.] R13: ffffffff8404ac28 R14: 0000000000000000 R15: ffff89834ffff000 > [.] FS: 0000000000000000(0000) GS:ffff893d44640000(0000) knlGS:0000000000000000 > [.] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [.] ata5: SATA link down (SStatus 0 SControl 300) > [.] CR2: ffff89834ffff000 CR3: 000001034d00f001 CR4: 0000000000770ef0 > [.] PKRU: 55555554 > [.] Call Trace: > [.] <TASK> > [.] ? __die+0x78/0xc0 > [.] ? page_fault_oops+0x2a8/0x3a0 > [.] ? exc_page_fault+0x84/0x130 > [.] ? asm_exc_page_fault+0x22/0x30 > [.] ? ima_restore_measurement_list+0x95/0x4b0 > [.] ? template_desc_init_fields+0x317/0x410 > [.] ? crypto_alloc_tfm_node+0x9c/0xc0 > [.] ? init_ima_lsm+0x30/0x30 > [.] ima_load_kexec_buffer+0x72/0xa0 > [.] ima_init+0x44/0xa0 > [.] __initstub__kmod_ima__373_1201_init_ima7+0x1e/0xb0 > [.] ? init_ima_lsm+0x30/0x30 > [.] do_one_initcall+0xad/0x200 > [.] ? idr_alloc_cyclic+0xaa/0x110 > [.] ? new_slab+0x12c/0x420 > [.] ? new_slab+0x12c/0x420 > [.] ? number+0x12a/0x430 > [.] ? sysvec_apic_timer_interrupt+0xa/0x80 > [.] ? asm_sysvec_apic_timer_interrupt+0x16/0x20 > [.] ? parse_args+0xd4/0x380 > [.] ? parse_args+0x14b/0x380 > [.] kernel_init_freeable+0x1c1/0x2b0 > [.] ? rest_init+0xb0/0xb0 > [.] kernel_init+0x16/0x1a0 > [.] ret_from_fork+0x2f/0x40 > [.] ? rest_init+0xb0/0xb0 > [.] ret_from_fork_asm+0x11/0x20 > [.] </TASK> > > Link: https://lore.kernel.org/all/20231114091658.228030-1-bhe@redhat.com/ > Fixes: 7acf164b259d ("resource: add walk_system_ram_res_rev()") > Cc: stable@vger.kernel.org > Signed-off-by: Gregory Price <gourry@gourry.net> > --- > kernel/resource.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/kernel/resource.c b/kernel/resource.c > index b730bd28b422..4101016e8b20 100644 > --- a/kernel/resource.c > +++ b/kernel/resource.c > @@ -459,9 +459,7 @@ int walk_system_ram_res_rev(u64 start, u64 end, void *arg, > rams_size += 16; > } > > - rams[i].start = res.start; > - rams[i++].end = res.end; > - > + rams[i++] = res; > start = res.end + 1; > } > > -- > 2.43.0 >
On Fri, Oct 18, 2024 at 10:18:42AM +0800, Baoquan He wrote: > HI Gregory, > > On 10/17/24 at 03:03pm, Gregory Price wrote: > > walk_system_ram_res_rev() erroneously discards resource flags when > > passing the information to the callback. > > > > This causes systems with IORESOURCE_SYSRAM_DRIVER_MANAGED memory to > > have these resources selected during kexec to store kexec buffers > > if that memory happens to be at placed above normal system ram. > > Sorry about that. I haven't checked IORESOURCE_SYSRAM_DRIVER_MANAGED > memory carefully, wondering if res could be set as > 'IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY' plus > IORESOURCE_SYSRAM_DRIVER_MANAGED in iomem_resource tree. > > Anyway, the change in this patch is certainly better. Thanks. Can we get more test cases in the respective module, please? -- With Best Regards, Andy Shevchenko
On 10/18/24 at 03:22pm, Andy Shevchenko wrote: > On Fri, Oct 18, 2024 at 10:18:42AM +0800, Baoquan He wrote: > > HI Gregory, > > > > On 10/17/24 at 03:03pm, Gregory Price wrote: > > > walk_system_ram_res_rev() erroneously discards resource flags when > > > passing the information to the callback. > > > > > > This causes systems with IORESOURCE_SYSRAM_DRIVER_MANAGED memory to > > > have these resources selected during kexec to store kexec buffers > > > if that memory happens to be at placed above normal system ram. > > > > Sorry about that. I haven't checked IORESOURCE_SYSRAM_DRIVER_MANAGED > > memory carefully, wondering if res could be set as > > 'IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY' plus > > IORESOURCE_SYSRAM_DRIVER_MANAGED in iomem_resource tree. > > > > Anyway, the change in this patch is certainly better. Thanks. > > Can we get more test cases in the respective module, please? Do you mean testing CXL memory in kexec/kdump? No, we can't. Kexec/kdump test cases basically is system testing, not unit test or module test. It needs run system and then jump to 2nd kernel, vm can be used but it can't cover many cases existing only on baremetal. Currenly, Redhat's CKI is heavily relied on to test them, however I am not sure if system with CXL support is available in our LAB. Not sure if I got you right.
On Fri, Oct 18, 2024 at 09:52:47PM +0800, Baoquan He wrote: > On 10/18/24 at 03:22pm, Andy Shevchenko wrote: > > On Fri, Oct 18, 2024 at 10:18:42AM +0800, Baoquan He wrote: > > > On 10/17/24 at 03:03pm, Gregory Price wrote: > > > > walk_system_ram_res_rev() erroneously discards resource flags when > > > > passing the information to the callback. > > > > > > > > This causes systems with IORESOURCE_SYSRAM_DRIVER_MANAGED memory to > > > > have these resources selected during kexec to store kexec buffers > > > > if that memory happens to be at placed above normal system ram. > > > > > > Sorry about that. I haven't checked IORESOURCE_SYSRAM_DRIVER_MANAGED > > > memory carefully, wondering if res could be set as > > > 'IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY' plus > > > IORESOURCE_SYSRAM_DRIVER_MANAGED in iomem_resource tree. > > > > > > Anyway, the change in this patch is certainly better. Thanks. > > > > Can we get more test cases in the respective module, please? > > Do you mean testing CXL memory in kexec/kdump? No, we can't. Kexec/kdump > test cases basically is system testing, not unit test or module test. It > needs run system and then jump to 2nd kernel, vm can be used but it > can't cover many cases existing only on baremetal. Currenly, Redhat's > CKI is heavily relied on to test them, however I am not sure if system > with CXL support is available in our LAB. > > Not sure if I got you right. I meant since we touch resource.c, we should really touch resource_kunit.c *in addition to*. -- With Best Regards, Andy Shevchenko
On Fri, Oct 18, 2024 at 05:51:09PM +0300, Andy Shevchenko wrote: > On Fri, Oct 18, 2024 at 09:52:47PM +0800, Baoquan He wrote: > > On 10/18/24 at 03:22pm, Andy Shevchenko wrote: > > > On Fri, Oct 18, 2024 at 10:18:42AM +0800, Baoquan He wrote: > > > > On 10/17/24 at 03:03pm, Gregory Price wrote: > > > > > walk_system_ram_res_rev() erroneously discards resource flags when > > > > > passing the information to the callback. > > > > > > > > > > This causes systems with IORESOURCE_SYSRAM_DRIVER_MANAGED memory to > > > > > have these resources selected during kexec to store kexec buffers > > > > > if that memory happens to be at placed above normal system ram. > > > > > > > > Sorry about that. I haven't checked IORESOURCE_SYSRAM_DRIVER_MANAGED > > > > memory carefully, wondering if res could be set as > > > > 'IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY' plus > > > > IORESOURCE_SYSRAM_DRIVER_MANAGED in iomem_resource tree. > > > > > > > > Anyway, the change in this patch is certainly better. Thanks. > > > > > > Can we get more test cases in the respective module, please? > > > > Do you mean testing CXL memory in kexec/kdump? No, we can't. Kexec/kdump > > test cases basically is system testing, not unit test or module test. It > > needs run system and then jump to 2nd kernel, vm can be used but it > > can't cover many cases existing only on baremetal. Currenly, Redhat's > > CKI is heavily relied on to test them, however I am not sure if system > > with CXL support is available in our LAB. > > > > Not sure if I got you right. > > I meant since we touch resource.c, we should really touch resource_kunit.c > *in addition to*. And to be more clear, there is no best time to add test cases than as early as possible. So, can we add the test cases to the (new) APIs, so we want have an issue like the one this patch fixes? -- With Best Regards, Andy Shevchenko
On 10/18/24 at 05:52pm, Andy Shevchenko wrote: > On Fri, Oct 18, 2024 at 05:51:09PM +0300, Andy Shevchenko wrote: > > On Fri, Oct 18, 2024 at 09:52:47PM +0800, Baoquan He wrote: > > > On 10/18/24 at 03:22pm, Andy Shevchenko wrote: > > > > On Fri, Oct 18, 2024 at 10:18:42AM +0800, Baoquan He wrote: > > > > > On 10/17/24 at 03:03pm, Gregory Price wrote: > > > > > > walk_system_ram_res_rev() erroneously discards resource flags when > > > > > > passing the information to the callback. > > > > > > > > > > > > This causes systems with IORESOURCE_SYSRAM_DRIVER_MANAGED memory to > > > > > > have these resources selected during kexec to store kexec buffers > > > > > > if that memory happens to be at placed above normal system ram. > > > > > > > > > > Sorry about that. I haven't checked IORESOURCE_SYSRAM_DRIVER_MANAGED > > > > > memory carefully, wondering if res could be set as > > > > > 'IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY' plus > > > > > IORESOURCE_SYSRAM_DRIVER_MANAGED in iomem_resource tree. > > > > > > > > > > Anyway, the change in this patch is certainly better. Thanks. > > > > > > > > Can we get more test cases in the respective module, please? > > > > > > Do you mean testing CXL memory in kexec/kdump? No, we can't. Kexec/kdump > > > test cases basically is system testing, not unit test or module test. It > > > needs run system and then jump to 2nd kernel, vm can be used but it > > > can't cover many cases existing only on baremetal. Currenly, Redhat's > > > CKI is heavily relied on to test them, however I am not sure if system > > > with CXL support is available in our LAB. > > > > > > Not sure if I got you right. > > > > I meant since we touch resource.c, we should really touch resource_kunit.c > > *in addition to*. > > And to be more clear, there is no best time to add test cases than > as early as possible. So, can we add the test cases to the (new) APIs, > so we want have an issue like the one this patch fixes? I will have a look at kernel/resource_kunit.c to see if I can add something for walk_system_ram_res_rev(). Thanks.
On Fri, Oct 18, 2024 at 11:39:00PM +0800, Baoquan He wrote: > On 10/18/24 at 05:52pm, Andy Shevchenko wrote: > > On Fri, Oct 18, 2024 at 05:51:09PM +0300, Andy Shevchenko wrote: > > > On Fri, Oct 18, 2024 at 09:52:47PM +0800, Baoquan He wrote: > > > > On 10/18/24 at 03:22pm, Andy Shevchenko wrote: > > > > > On Fri, Oct 18, 2024 at 10:18:42AM +0800, Baoquan He wrote: ... > > > > > Can we get more test cases in the respective module, please? > > > > > > > > Do you mean testing CXL memory in kexec/kdump? No, we can't. Kexec/kdump > > > > test cases basically is system testing, not unit test or module test. It > > > > needs run system and then jump to 2nd kernel, vm can be used but it > > > > can't cover many cases existing only on baremetal. Currenly, Redhat's > > > > CKI is heavily relied on to test them, however I am not sure if system > > > > with CXL support is available in our LAB. > > > > > > > > Not sure if I got you right. > > > > > > I meant since we touch resource.c, we should really touch resource_kunit.c > > > *in addition to*. > > > > And to be more clear, there is no best time to add test cases than > > as early as possible. So, can we add the test cases to the (new) APIs, > > so we want have an issue like the one this patch fixes? > > I will have a look at kernel/resource_kunit.c to see if I can add > something for walk_system_ram_res_rev(). Thanks. Thank you! I will appreciate that. -- With Best Regards, Andy Shevchenko
Gregory Price wrote: > walk_system_ram_res_rev() erroneously discards resource flags when > passing the information to the callback. > > This causes systems with IORESOURCE_SYSRAM_DRIVER_MANAGED memory to > have these resources selected during kexec to store kexec buffers > if that memory happens to be at placed above normal system ram. > > This leads to undefined behavior after reboot. If the kexec buffer > is never touched, nothing happens. If the kexec buffer is touched, > it could lead to a crash (like below) or undefined behavior. > > Tested on a system with CXL memory expanders with driver managed > memory, TPM enabled, and CONFIG_IMA_KEXEC=y. Adding printk's > showed the flags were being discarded and as a result the check > for IORESOURCE_SYSRAM_DRIVER_MANAGED passes. > > find_next_iomem_res: name(System RAM (kmem)) > start(10000000000) > end(1034fffffff) > flags(83000200) > > locate_mem_hole_top_down: start(10000000000) end(1034fffffff) flags(0) > [..] > Link: https://lore.kernel.org/all/20231114091658.228030-1-bhe@redhat.com/ > Fixes: 7acf164b259d ("resource: add walk_system_ram_res_rev()") > Cc: stable@vger.kernel.org > Signed-off-by: Gregory Price <gourry@gourry.net> > --- > kernel/resource.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/kernel/resource.c b/kernel/resource.c > index b730bd28b422..4101016e8b20 100644 > --- a/kernel/resource.c > +++ b/kernel/resource.c > @@ -459,9 +459,7 @@ int walk_system_ram_res_rev(u64 start, u64 end, void *arg, > rams_size += 16; > } > > - rams[i].start = res.start; > - rams[i++].end = res.end; > - > + rams[i++] = res; > start = res.end + 1; Looks good to me, makes it obvious that everything that find_next_iomem_res() would return for walk_system_ram_range() users is the same as what walk_system_ram_res_rev() returns. Reviewed-by: Dan Williams <dan.j.williams@intel.com>
© 2016 - 2024 Red Hat, Inc.