The CXL RAS Header Log is 512 bits (64 bytes), but CXL_HEADERLOG_SIZE is
SZ_512, which is 512 bytes, not 512 bits, so the kernel treats the log as
8 times its real size.

header_log_copy() reads 448 bytes of MMIO past the end of the register,
and the cxl_*_aer_uncorrectable_error() tracepoints memcpy() 512 bytes
from the 64-byte header log. On the CPER path the source is a heap
object, so the copy runs 448 bytes past its end and leaks kernel memory
into a trace record that userspace can read:

"""
[ 297.704020] BUG: KASAN: slab-out-of-bounds in trace_event_raw_event_cxl_port_aer_uncorrectable_error+0x318/0x4b0
[ 297.704032] Read of size 512 at addr ffff0000dd6ee118 by task bash/3078
[ 297.704038] CPU: 116 UID: 0 PID: 3078 Comm: bash Not tainted 7.1.0-rc6+ #1 PREEMPT(full)
[ 297.704041] Hardware name: , BIOS buildbrain-gcid-sbios-45660680 Wed May 27 08:27:58 AM UTC 2026
[ 297.704042] Call trace:
[ 297.704043] show_stack+0x24/0x50 (C)
[ 297.704049] dump_stack_lvl+0x80/0x140
[ 297.704053] print_report+0x100/0x630
[ 297.704057] kasan_report+0xb8/0x130
[ 297.704059] kasan_check_range+0x15c/0x240
[ 297.704061] __asan_memcpy+0x40/0xc8
[ 297.704064] trace_event_raw_event_cxl_port_aer_uncorrectable_error+0x318/0x4b0
[ 297.704066] __traceiter_cxl_port_aer_uncorrectable_error+0x90/0x108
[ 297.704068] cxl_ras_inject_set+0x278/0x3d0
[ 297.704070] simple_attr_write_xsigned.isra.0+0x198/0x298
[ 297.704074] simple_attr_write+0x44/0x88
[ 297.704076] debugfs_attr_write+0x78/0xd0
[ 297.704080] vfs_write+0x1f4/0x960
[ 297.704083] ksys_write+0x100/0x220
[ 297.704085] __arm64_sys_write+0x78/0xc8
[ 297.704087] invoke_syscall.constprop.0+0x150/0x200
[ 297.704090] do_el0_svc+0xd0/0x210
[ 297.704091] el0_svc+0x44/0x138
[ 297.704095] el0t_64_sync_handler+0xc0/0x108
[ 297.704097] el0t_64_sync+0x1b8/0x1c0
[ 297.704100] Allocated by task 3078:
[ 297.704102] kasan_save_stack+0x40/0x80
[ 297.704104] kasan_save_track+0x24/0x58
[ 297.704105] kasan_save_alloc_info+0x44/0x88
[ 297.704107] __kasan_kmalloc+0x108/0x110
[ 297.704108] __kmalloc_cache_noprof+0x1bc/0x588
[ 297.704111] cxl_ras_inject_set+0xcc/0x3d0
[ 297.704112] simple_attr_write_xsigned.isra.0+0x198/0x298
[ 297.704114] simple_attr_write+0x44/0x88
[ 297.704116] debugfs_attr_write+0x78/0xd0
[ 297.704117] vfs_write+0x1f4/0x960
[ 297.704119] ksys_write+0x100/0x220
[ 297.704120] __arm64_sys_write+0x78/0xc8
[ 297.704122] invoke_syscall.constprop.0+0x150/0x200
[ 297.704123] do_el0_svc+0xd0/0x210
[ 297.704124] el0_svc+0x44/0x138
[ 297.704125] el0t_64_sync_handler+0xc0/0x108
[ 297.704127] el0t_64_sync+0x1b8/0x1c0
[ 297.704129] The buggy address belongs to the object at ffff0000dd6ee100
which belongs to the cache kmalloc-rnd-09-96 of size 96
[ 297.704132] The buggy address is located 24 bytes inside of
allocated 88-byte region [ffff0000dd6ee100, ffff0000dd6ee158)
[ 297.704135] The buggy address belongs to the physical page:
[ 297.704138] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x15d6e
[ 297.704140] flags: 0x17fffc000000000(node=0|zone=2|lastcpupid=0x1ffff)
[ 297.704143] page_type: f5(slab)
[ 297.704147] raw: 017fffc000000000 ffff00008001c1c0 dead000000000100 dead000000000122
[ 297.704148] raw: 0000000000000000 0000000802000200 00000000f5000000 0000000000000000
[ 297.704149] page dumped because: kasan: bad access detected
[ 297.704150] Memory state around the buggy address:
[ 297.704151] ffff0000dd6ee000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 297.704152] ffff0000dd6ee080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 297.704153] >ffff0000dd6ee100: 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc
[ 297.704154] ^
[ 297.704155] ffff0000dd6ee180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 297.704155] ffff0000dd6ee200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 297.704156] =================================================================
"""

Define CXL_HEADERLOG_SIZE as SZ_64. The trace record's header_log field
shrinks from 128 to 16 dwords, but only those 16 ever held real data;
the rest was always bytes read past the register or the source object.
Also parenthesize CXL_HEADERLOG_SIZE_U32 so it expands safely inside
larger expressions.

Fixes: 2f6e9c305127 ("cxl/pci: add tracepoint events for CXL RAS")
Signed-off-by: Richard Cheng <icheng@nvidia.com>
---
 drivers/cxl/cxl.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 1297594beaec..f322d7c79ed2 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -158,8 +158,8 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
 #define CXL_RAS_CAP_CONTROL_FE_MASK GENMASK(5, 0)
 #define CXL_RAS_HEADER_LOG_OFFSET 0x18
 #define CXL_RAS_CAPABILITY_LENGTH 0x58
-#define CXL_HEADERLOG_SIZE SZ_512
-#define CXL_HEADERLOG_SIZE_U32 SZ_512 / sizeof(u32)
+#define CXL_HEADERLOG_SIZE SZ_64
+#define CXL_HEADERLOG_SIZE_U32 (CXL_HEADERLOG_SIZE / sizeof(u32))
 
 /* CXL 2.0 8.2.8.1 Device Capabilities Array Register */
 #define CXLDEV_CAP_ARRAY_OFFSET 0x0

base-commit: 6f3ed7fec72fc8979b2a8c7219c0a9fcfc8d07b5
--
2.43.0
On 6/4/26 9:16 PM, Richard Cheng wrote:
> The CXL RAS Header Log is 64 bytes, but CXL_HEADERLOG_SIZE is SZ_512,
> which is 512 bytes, not 512 bits, so the kernel treats it as 8x times
> bigger.
>
> header_log_copy() reads 448 bytes of MMIO past the register, and
> cxl_*_aer_uncorrectable_error() tracepoints memcpy 512 bytes from the
> 64-byte header log. On the CPER path the source is a heap object, so the
> copy runs 448 bytes past it and leaks kernel memory into a trace record
> that userspace can read:
I think Terry raised the same issue, and [1] is the fix Dan suggested, since there is rasdaemon (userspace) impact.
[1]: https://lore.kernel.org/linux-cxl/6a0e33507e961_1717cc100f6@djbw-dev.notmuch/
DJ
On Fri, Jun 05, 2026 at 08:30:55AM +0800, Dave Jiang wrote:
>
>
> On 6/4/26 9:16 PM, Richard Cheng wrote:
> > The CXL RAS Header Log is 64 bytes, but CXL_HEADERLOG_SIZE is SZ_512,
> > which is 512 bytes, not 512 bits, so the kernel treats it as 8x times
> > bigger.
> >
> > header_log_copy() reads 448 bytes of MMIO past the register, and
> > cxl_*_aer_uncorrectable_error() tracepoints memcpy 512 bytes from the
> > 64-byte header log. On the CPER path the source is a heap object, so the
> > copy runs 448 bytes past it and leaks kernel memory into a trace record
> > that userspace can read:
>
> I think Terry raised the same issue and this [1] is what Dan suggested for the fix as there's rasdaemon (user) impact.
>
> [1]: https://lore.kernel.org/linux-cxl/6a0e33507e961_1717cc100f6@djbw-dev.notmuch/
>
> DJ
Hi Dave,
I see. Is he going to do the fix, or should I send a v2 with the suggested approach?
Best regards,
Richard Cheng.
On 6/5/2026 11:59 AM, Richard Cheng wrote:
> On Fri, Jun 05, 2026 at 08:30:55AM +0800, Dave Jiang wrote:
>>
>>
>> On 6/4/26 9:16 PM, Richard Cheng wrote:
>>> The CXL RAS Header Log is 64 bytes, but CXL_HEADERLOG_SIZE is SZ_512,
>>> which is 512 bytes, not 512 bits, so the kernel treats it as 8x times
>>> bigger.
>>>
>>> header_log_copy() reads 448 bytes of MMIO past the register, and
>>> cxl_*_aer_uncorrectable_error() tracepoints memcpy 512 bytes from the
>>> 64-byte header log. On the CPER path the source is a heap object, so the
>>> copy runs 448 bytes past it and leaks kernel memory into a trace record
>>> that userspace can read:
>>
>> I think Terry raised the same issue and this [1] is what Dan suggested for the fix as there's rasdaemon (user) impact.
>>
>> [1]: https://lore.kernel.org/linux-cxl/6a0e33507e961_1717cc100f6@djbw-dev.notmuch/
>>
>> DJ
>
> Hi Dave,
>
> I see, so is he gonna do the fix or I can send a v2 with the suggested approach ?
>
> Best regards,
> Richard Cheng.
>
Hi Richard,
I will send that fix today.
Regards,
Terry
On Fri, Jun 05, 2026 at 12:41:15PM +0800, Bowman, Terry wrote:
> On 6/5/2026 11:59 AM, Richard Cheng wrote:
> > On Fri, Jun 05, 2026 at 08:30:55AM +0800, Dave Jiang wrote:
> >>
> >>
> >> On 6/4/26 9:16 PM, Richard Cheng wrote:
> >>> The CXL RAS Header Log is 64 bytes, but CXL_HEADERLOG_SIZE is SZ_512,
> >>> which is 512 bytes, not 512 bits, so the kernel treats it as 8x times
> >>> bigger.
> >>>
> >>> header_log_copy() reads 448 bytes of MMIO past the register, and
> >>> cxl_*_aer_uncorrectable_error() tracepoints memcpy 512 bytes from the
> >>> 64-byte header log. On the CPER path the source is a heap object, so the
> >>> copy runs 448 bytes past it and leaks kernel memory into a trace record
> >>> that userspace can read:
> >>
> >> I think Terry raised the same issue and this [1] is what Dan suggested for the fix as there's rasdaemon (user) impact.
> >>
> >> [1]: https://lore.kernel.org/linux-cxl/6a0e33507e961_1717cc100f6@djbw-dev.notmuch/
> >>
> >> DJ
> >
> > Hi Dave,
> >
> > I see, so is he gonna do the fix or I can send a v2 with the suggested approach ?
> >
> > Best regards,
> > Richard Cheng.
> >
>
> Hi Richard,
>
> I will send that fix today.
>
> Regards,
> Terry
>
Hi Terry,
That would be awesome!
If you don't mind, could you add me to the Cc list?
I would love to learn from your approach.
Thanks,
Richard Cheng
On 6/5/2026 12:59 PM, Richard Cheng wrote:
> On Fri, Jun 05, 2026 at 12:41:15PM +0800, Bowman, Terry wrote:
>> On 6/5/2026 11:59 AM, Richard Cheng wrote:
>>> On Fri, Jun 05, 2026 at 08:30:55AM +0800, Dave Jiang wrote:
>>>>
>>>>
>>>> On 6/4/26 9:16 PM, Richard Cheng wrote:
>>>>> The CXL RAS Header Log is 64 bytes, but CXL_HEADERLOG_SIZE is SZ_512,
>>>>> which is 512 bytes, not 512 bits, so the kernel treats it as 8x times
>>>>> bigger.
>>>>>
>>>>> header_log_copy() reads 448 bytes of MMIO past the register, and
>>>>> cxl_*_aer_uncorrectable_error() tracepoints memcpy 512 bytes from the
>>>>> 64-byte header log. On the CPER path the source is a heap object, so the
>>>>> copy runs 448 bytes past it and leaks kernel memory into a trace record
>>>>> that userspace can read:
>>>>
>>>> I think Terry raised the same issue and this [1] is what Dan suggested for the fix as there's rasdaemon (user) impact.
>>>>
>>>> [1]: https://lore.kernel.org/linux-cxl/6a0e33507e961_1717cc100f6@djbw-dev.notmuch/
>>>>
>>>> DJ
>>>
>>> Hi Dave,
>>>
>>> I see, so is he gonna do the fix or I can send a v2 with the suggested approach ?
>>>
>>> Best regards,
>>> Richard Cheng.
>>>
>>
>> Hi Richard,
>>
>> I will send that fix today.
>>
>> Regards,
>> Terry
>>
>
> Hi Terry,
>
> That would be awesome !
> If you don't mind, can you add me in the cc list ?
> I would love to learn from your approach.
>
> Thanks,
> Richard Cheng
>
Hi Richard,
I just sent it but forgot to add you to the Cc list. My apologies. The patch is here:
https://lore.kernel.org/linux-cxl/20260605180610.2249458-1-terry.bowman@amd.com/T/#u
Regards,
Terry