[PATCH v3] ceph: fix OOB read in ceph_osdc_list_watchers via uncapped outdata_len

Posted by Pavitra Jha 5 days, 22 hours ago

The OSD reply header field op->payload_len is wire-controlled and is
copied directly into m->outdata_len[i] without any bounds check:

  m->outdata_len[i] = le32_to_cpu(op->payload_len);

This value propagates unchecked to req->r_ops[0].outdata_len and is
then used to set the decode boundary in ceph_osdc_list_watchers():

  void *const end = p + req->r_ops[0].outdata_len;

The actual data allocation is always exactly one page:
  ceph_alloc_page_vector(1, GFP_NOIO)
  ceph_osd_data_pages_init(..., PAGE_SIZE, ...)

The messenger caps the copy at PAGE_SIZE bytes, but the end of the
decode window is derived from the uncapped wire value. A malicious OSD
can send outdata_len=0x10000, so the ceph_decode_*_safe() bounds checks
pass while the actual reads run past the end of the one-page buffer.
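
As a rough userspace illustration of the mismatch described above (not
the kernel code: BUF_LEN, has_room() and the two window_end_*() helpers
are names invented for this sketch, and 4096 stands in for PAGE_SIZE), a
bounds check in the style of the _safe decoders only protects the buffer
when the window end itself is clamped to the bytes that back it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for PAGE_SIZE, the size of the response buffer. */
#define BUF_LEN 4096u

/*
 * Shape of a "_safe" decoder check: is there room for `need` bytes
 * between the current offset `pos` and the window end `end`?
 */
static int has_room(size_t pos, size_t end, size_t need)
{
	return pos <= end && need <= end - pos;
}

/* Window end taken straight from the wire-controlled length. */
static size_t window_end_uncapped(uint32_t wire_len)
{
	return wire_len;
}

/* Window end clamped to the number of bytes that back the buffer. */
static size_t window_end_capped(uint32_t wire_len)
{
	return wire_len < BUF_LEN ? wire_len : BUF_LEN;
}
```

With a wire length of 0x10000, has_room(4094, window_end_uncapped(0x10000), 4)
accepts a 4-byte read that starts 2 bytes before the end of the buffer,
while the same call with window_end_capped() rejects it.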

KASAN report from the out-of-tree PoC module ceph_oob2_poc
(kernel 7.0.0-rc7, QEMU/x86_64, KASLR disabled):

  ==================================================================
  BUG: KASAN: slab-out-of-bounds in ceph_oob2_init+0x23d/0xff0 [ceph_oob2_poc]
  Read of size 4 at addr ffff88800a229f9e by task insmod/57

  CPU: 0 UID: 0 PID: 57 Comm: insmod Tainted: G           O        7.0.0-rc7-g9c2abf69da83-dirty #15 PREEMPT(lazy)
  Tainted: [O]=OOT_MODULE
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4d/0x70
   print_report+0x170/0x4f3
   ? __pfx__raw_spin_lock_irqsave+0x10/0x10
   kasan_report+0xda/0x110
   ? ceph_oob2_init+0x23d/0xff0 [ceph_oob2_poc]
   ? ceph_oob2_init+0x23d/0xff0 [ceph_oob2_poc]
   ? __pfx_ceph_oob2_init+0x10/0x10 [ceph_oob2_poc]
   ceph_oob2_init+0x23d/0xff0 [ceph_oob2_poc]
   do_one_initcall+0x9a/0x3a0
   ? __pfx_do_one_initcall+0x10/0x10
   ? kasan_unpoison+0x44/0x70
   do_init_module+0x27c/0x790
   ? __pfx_do_init_module+0x10/0x10
   ? __kasan_slab_free+0x47/0x70
   ? kfree+0x15f/0x3b0
   load_module+0x4a9a/0x6350
   ? __pfx_load_module+0x10/0x10
   ? security_file_permission+0x24/0x50
   ? kernel_read_file+0x2ed/0x770
   ? init_module_from_file+0x15c/0x180
   init_module_from_file+0x15c/0x180
   ? __pfx_init_module_from_file+0x10/0x10
   ? tick_nohz_handler+0x2a3/0x640
   ? _raw_spin_lock+0x7e/0xd0
   idempotent_init_module+0x21f/0x750
   ? __pfx_idempotent_init_module+0x10/0x10
   ? fdget+0x4e/0x4a0
   ? fdget+0x4e/0x4a0
   __x64_sys_finit_module+0xba/0x120
   do_syscall_64+0xe2/0x570
   ? exc_page_fault+0x66/0xb0
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

  Allocated by task 57:
   kasan_save_stack+0x30/0x50
   kasan_save_track+0x14/0x30
   __kasan_kmalloc+0x7f/0x90
   ceph_oob2_init+0x44/0xff0 [ceph_oob2_poc]
   do_one_initcall+0x9a/0x3a0
   do_init_module+0x27c/0x790
   load_module+0x4a9a/0x6350
   init_module_from_file+0x15c/0x180
   idempotent_init_module+0x21f/0x750
   __x64_sys_finit_module+0xba/0x120
   do_syscall_64+0xe2/0x570
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

  The buggy address belongs to the object at ffff88800a229000
   which belongs to the cache kmalloc-4k of size 4096
  The buggy address is located 3998 bytes inside of
   allocated 4000-byte region [ffff88800a229000, ffff88800a229fa0)

  Memory state around the buggy address:
   ffff88800a229e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   ffff88800a229f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  >ffff88800a229f80: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
                                 ^
   ffff88800a22a000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
   ffff88800a22a080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ==================================================================

  val=0xccccaaaa (OOB garbage from KASAN redzone)
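
The addresses in the report can be checked by hand. The sketch below
does the arithmetic; offset_in_object() and bytes_past_end() are helpers
made up for this note, and the constants in the usage note are copied
from the report above. The access starts 3998 bytes into a 4000-byte
region and is 4 bytes wide, so its last 2 bytes fall outside the region:

```c
#include <assert.h>
#include <stdint.h>

/* Offset of an address from the start of the object that contains it. */
static uint64_t offset_in_object(uint64_t addr, uint64_t object_base)
{
	return addr - object_base;
}

/*
 * Number of bytes of the access [addr, addr + size) that lie at or
 * beyond region_end (the first address past the allocated region).
 */
static uint64_t bytes_past_end(uint64_t addr, uint64_t size,
			       uint64_t region_end)
{
	uint64_t access_end = addr + size;

	if (access_end <= region_end)
		return 0;		/* fully in bounds */
	if (addr >= region_end)
		return size;		/* fully out of bounds */
	return access_end - region_end;	/* straddles the end */
}
```

For the reported access, offset_in_object(0xffff88800a229f9e,
0xffff88800a229000) is 3998 and bytes_past_end(0xffff88800a229f9e, 4,
0xffff88800a229fa0) is 2, which matches the caret in the shadow dump
pointing at the first fc byte.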

Fix by introducing buf_len to hold the allocation size, using it in
both ceph_osd_data_pages_init() and the min_t() decode boundary cap,
so the two are guaranteed to stay in sync if the buffer size changes.

Attacker model: a malicious or compromised OSD in a multi-tenant
Ceph deployment can trigger this against any client issuing
CEPH_OSD_OP_LIST_WATCHERS without further privileges beyond OSD
session establishment.

Fixes: a4ed38d7a180 ("libceph: support for CEPH_OSD_OP_LIST_WATCHERS")
Cc: stable@vger.kernel.org
Signed-off-by: Pavitra Jha <jhapavitra98@gmail.com>
---
v3: Split overlong min_t() line to fit 80-column limit,
    per Viacheslav Dubeyko's review of v2.
v2: Introduce buf_len variable instead of hardcoding PAGE_SIZE
    independently in ceph_osd_data_pages_init() and the min_t() cap,
    per Viacheslav Dubeyko's review.
---
 net/ceph/osd_client.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index a67093cf4..0a55bc1f9 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -5063,6 +5063,7 @@ int ceph_osdc_list_watchers(struct ceph_osd_client *osdc,
 	struct ceph_osd_request *req;
 	struct page **pages;
 	int ret;
+	const size_t buf_len = PAGE_SIZE;
 
 	req = ceph_osdc_alloc_request(osdc, NULL, 1, false, GFP_NOIO);
 	if (!req)
@@ -5081,7 +5082,7 @@ int ceph_osdc_list_watchers(struct ceph_osd_client *osdc,
 	osd_req_op_init(req, 0, CEPH_OSD_OP_LIST_WATCHERS, 0);
 	ceph_osd_data_pages_init(osd_req_op_data(req, 0, list_watchers,
 						 response_data),
-				 pages, PAGE_SIZE, 0, false, true);
+				 pages, buf_len, 0, false, true);
 
 	ret = ceph_osdc_alloc_messages(req, GFP_NOIO);
 	if (ret)
@@ -5091,7 +5092,8 @@ int ceph_osdc_list_watchers(struct ceph_osd_client *osdc,
 	ret = ceph_osdc_wait_request(osdc, req);
 	if (ret >= 0) {
 		void *p = page_address(pages[0]);
-		void *const end = p + min_t(u32, req->r_ops[0].outdata_len, PAGE_SIZE);
+		void *const end = p +
+			min_t(u32, req->r_ops[0].outdata_len, buf_len);
 
 		ret = decode_watchers(&p, end, watchers, num_watchers);
 	}
-- 
2.53.0

Re: [PATCH v3] ceph: fix OOB read in ceph_osdc_list_watchers via uncapped outdata_len

Posted by Viacheslav Dubeyko 5 days, 10 hours ago

On Tue, 2026-06-02 at 00:54 -0400, Pavitra Jha wrote:
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index a67093cf4..0a55bc1f9 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -5063,6 +5063,7 @@ int ceph_osdc_list_watchers(struct ceph_osd_client *osdc,
>  	struct ceph_osd_request *req;
>  	struct page **pages;
>  	int ret;
> +	const size_t buf_len = PAGE_SIZE;
>  
>  	req = ceph_osdc_alloc_request(osdc, NULL, 1, false, GFP_NOIO);
>  	if (!req)
> @@ -5081,7 +5082,7 @@ int ceph_osdc_list_watchers(struct ceph_osd_client *osdc,
>  	osd_req_op_init(req, 0, CEPH_OSD_OP_LIST_WATCHERS, 0);
>  	ceph_osd_data_pages_init(osd_req_op_data(req, 0, list_watchers,
>  						 response_data),
> -				 pages, PAGE_SIZE, 0, false, true);
> +				 pages, buf_len, 0, false, true);
>  
>  	ret = ceph_osdc_alloc_messages(req, GFP_NOIO);
>  	if (ret)
> @@ -5091,7 +5092,8 @@ int ceph_osdc_list_watchers(struct ceph_osd_client *osdc,
>  	ret = ceph_osdc_wait_request(osdc, req);
>  	if (ret >= 0) {
>  		void *p = page_address(pages[0]);
> -		void *const end = p + min_t(u32, req->r_ops[0].outdata_len, PAGE_SIZE);
> +		void *const end = p +
> +			min_t(u32, req->r_ops[0].outdata_len, buf_len);

Now min_t() worries me slightly, because req->r_ops[0].outdata_len is a
u32 but buf_len is a size_t. Could we use the same data type for both
variables?

Thanks,
Slava.

>  
>  		ret = decode_watchers(&p, end, watchers, num_watchers);
>  	}
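
On the min_t() type question raised in the review: min_t(type, x, y)
casts both operands to type before comparing, so with type u32 a size_t
buf_len is narrowed to 32 bits first. That is lossless for PAGE_SIZE,
but the userspace sketch below shows what the cast would do to a larger
cap. MIN_T() is a simplified stand-in for the kernel macro (unlike the
real one it evaluates its arguments more than once), cap_u32() and
cap_wide() are hypothetical helpers, and uint64_t is used in place of
size_t so the behaviour does not depend on the host.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for min_t(): cast both sides, then compare. */
#define MIN_T(type, x, y) \
	((type)(x) < (type)(y) ? (type)(x) : (type)(y))

/* Compare in 32 bits, as min_t(u32, outdata_len, buf_len) would. */
static uint32_t cap_u32(uint32_t wire_len, uint64_t buf_len)
{
	return MIN_T(uint32_t, wire_len, buf_len);
}

/* Compare in the wider type, so buf_len is never narrowed. */
static uint64_t cap_wide(uint32_t wire_len, uint64_t buf_len)
{
	return MIN_T(uint64_t, wire_len, buf_len);
}
```

For buf_len = 4096 both helpers return 4096 when wire_len is 0x10000.
For a hypothetical buf_len of 0x100000010 the u32 form narrows the cap
to 16 and returns 16, while the wide form returns 0x10000. Because
narrowing can only make the cap smaller, the u32 form does not let the
window extend past the buffer; the risk is a silently too-small limit
rather than an overrun. Declaring buf_len as u32, or comparing in the
wider type, would avoid relying on that.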