ceph: fix two unsafe bare decodes in decode_lockers()

[PATCH v3] ceph: fix two unsafe bare decodes in decode_lockers()

Posted by Pavitra Jha 5 days, 22 hours ago

decode_lockers() in cls_lock_client.c contains two bare decode operations
that allow a malicious or compromised OSD to trigger slab-out-of-bounds
reads:

1. ceph_decode_32(p) at the num_lockers field has no preceding bounds
   check. ceph_start_decoding() accepts struct_len=0 as valid -- the
   internal ceph_decode_need(p, end, 0, bad) always passes -- so when an
   OSD sends struct_len=0, ceph_start_decoding() returns success with
   p == end. The immediately following bare ceph_decode_32(p) then reads
   4 bytes past the validated buffer boundary. The garbage value is
   passed directly to kzalloc_objs() as the locker count.

   The sibling function decode_watchers() in osd_client.c already uses
   ceph_decode_32_safe() after its own ceph_start_decoding() call.
   decode_lockers() was the only site using the bare variant.

2. ceph_decode_8(p) after the decode_locker() loop has no preceding
   bounds check. If an OSD crafts num_lockers such that the loop
   advances p exactly to end, the subsequent bare ceph_decode_8(p) reads
   one byte past the validated buffer boundary. The result is passed
   directly into *type, which is used as a lock type discriminator by
   callers, giving an OSD-controlled one-byte OOB read with direct
   influence over the lock type field.

Fix both by replacing bare operations with their safe variants:
  ceph_decode_32(p) -> ceph_decode_32_safe(p, end, *num_lockers,
                                           err_inval)
  ceph_decode_8(p)  -> ceph_decode_8_safe(p, end, *type,
                                          err_free_lockers)

The goto targets differ intentionally:
  err_inval: is a new label returning -EINVAL directly. It is used for
  the pre-allocation failure path where *lockers is not yet allocated
  and must not be passed to ceph_free_lockers().

  err_free_lockers: is the existing label. It is used for the
  post-allocation failure path where *lockers is allocated and must
  be freed.

ret is set to -EINVAL before ceph_decode_8_safe() so that
err_free_lockers returns the correct error code on bounds violation.
Without this, err_free_lockers would return a stale ret value (0 from
the successful decode_locker() loop), silently swallowing the error.

-EINVAL is correct for both failure paths. The data received from the
OSD is structurally malformed. -ENOMEM would misrepresent the failure
class to callers and to stable@ backporters triaging error paths.

KASAN report for bug 1 (kernel 7.0.0-rc7, QEMU/x86_64, KASLR disabled):
  ==================================================================
  BUG: KASAN: slab-out-of-bounds in ceph_oob3_init+0x251/0xff0 [ceph_oob3_poc]
  Read of size 4 at addr ffff88800a29b76e by task insmod/58

  CPU: 0 UID: 0 PID: 58 Comm: insmod Tainted: G           O        7.0.0-rc7-g9c2abf69da83-dirty #15 PREEMPT(lazy)
  Tainted: [O]=OOT_MODULE
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4d/0x70
   print_report+0x170/0x4f3
   kasan_report+0xda/0x110
   ceph_oob3_init+0x251/0xff0 [ceph_oob3_poc]
   do_one_initcall+0x9a/0x3a0
   do_init_module+0x27c/0x790
   load_module+0x4a9a/0x6350
   init_module_from_file+0x15c/0x180
   idempotent_init_module+0x21f/0x750
   __x64_sys_finit_module+0xba/0x120
   do_syscall_64+0xe2/0x570
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

  Allocated by task 58:
   kasan_save_stack+0x30/0x50
   kasan_save_track+0x14/0x30
   __kasan_kmalloc+0x7f/0x90
   ceph_oob3_init+0x4d/0xff0 [ceph_oob3_poc]
   do_one_initcall+0x9a/0x3a0
   do_init_module+0x27c/0x790
   load_module+0x4a9a/0x6350
   init_module_from_file+0x15c/0x180
   idempotent_init_module+0x21f/0x750
   __x64_sys_finit_module+0xba/0x120
   do_syscall_64+0xe2/0x570
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

  The buggy address belongs to the object at ffff88800a29a000
   which belongs to the cache kmalloc-8k of size 8192
  The buggy address is located 5998 bytes inside of
   allocated 6000-byte region [ffff88800a29a000, ffff88800a29b770)

  Memory state around the buggy address:
   ffff88800a29b600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   ffff88800a29b680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  >ffff88800a29b700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc
                                                               ^
   ffff88800a29b780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ==================================================================

  num_lockers=0xccccaaaa (OOB garbage from KASAN redzone)

Bug 2 (ceph_decode_8) follows from the identical precondition. A
dedicated PoC is available on request.

Attacker model: a malicious or compromised OSD in a multi-tenant Ceph
deployment can trigger this against any kernel client that issues the
lock.get_info class method (e.g. during RBD exclusive lock acquisition)
without any further privileges beyond OSD session establishment.

Fixes: d4ed4a530562 ("libceph: support for lock.lock_info")
Cc: stable@vger.kernel.org
Signed-off-by: Pavitra Jha <jhapavitra98@gmail.com>
---
v3: Combine both fixes (ceph_decode_32 and ceph_decode_8) into a single
    patch per Viacheslav Dubeyko's review. Set ret = -EINVAL before
    ceph_decode_8_safe() so err_free_lockers returns the correct error
    code, not stale ret (caught by Dan Carpenter / smatch). Clarify
    err_inval vs err_free_lockers goto selection rationale and
    -EINVAL justification.
---
 net/ceph/cls_lock_client.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/ceph/cls_lock_client.c b/net/ceph/cls_lock_client.c
index c6956f1df..4e6a6d3e4 100644
--- a/net/ceph/cls_lock_client.c
+++ b/net/ceph/cls_lock_client.c
@@ -299,7 +299,7 @@ static int decode_lockers(void **p, void *end, u8 *type, char **tag,
 	if (ret)
 		return ret;
 
-	*num_lockers = ceph_decode_32(p);
+	ceph_decode_32_safe(p, end, *num_lockers, err_inval);
 	*lockers = kzalloc_objs(**lockers, *num_lockers, GFP_NOIO);
 	if (!*lockers)
 		return -ENOMEM;
@@ -310,7 +310,8 @@ static int decode_lockers(void **p, void *end, u8 *type, char **tag,
 			goto err_free_lockers;
 	}
 
-	*type = ceph_decode_8(p);
+	ret = -EINVAL;
+	ceph_decode_8_safe(p, end, *type, err_free_lockers);
 	s = ceph_extract_encoded_string(p, end, NULL, GFP_NOIO);
 	if (IS_ERR(s)) {
 		ret = PTR_ERR(s);
@@ -320,6 +321,8 @@ static int decode_lockers(void **p, void *end, u8 *type, char **tag,
 	*tag = s;
 	return 0;
 
+err_inval:
+	return -EINVAL;
 err_free_lockers:
 	ceph_free_lockers(*lockers, *num_lockers);
 	return ret;
-- 
2.53.0

Re: [PATCH v3] ceph: fix two unsafe bare decodes in decode_lockers()

Posted by Viacheslav Dubeyko 5 days, 10 hours ago

On Tue, 2026-06-02 at 00:17 -0400, Pavitra Jha wrote:
> decode_lockers() in cls_lock_client.c contains two bare decode
> operations
> that allow a malicious or compromised OSD to trigger slab-out-of-
> bounds
> reads:
> 
> 1. ceph_decode_32(p) at the num_lockers field has no preceding bounds
>    check. ceph_start_decoding() accepts struct_len=0 as valid -- the
>    internal ceph_decode_need(p, end, 0, bad) always passes -- so when
> an
>    OSD sends struct_len=0, ceph_start_decoding() returns success with
>    p == end. The immediately following bare ceph_decode_32(p) then
> reads
>    4 bytes past the validated buffer boundary. The garbage value is
>    passed directly to kzalloc_objs() as the locker count.
> 
>    The sibling function decode_watchers() in osd_client.c already
> uses
>    ceph_decode_32_safe() after its own ceph_start_decoding() call.
>    decode_lockers() was the only site using the bare variant.
> 
> 2. ceph_decode_8(p) after the decode_locker() loop has no preceding
>    bounds check. If an OSD crafts num_lockers such that the loop
>    advances p exactly to end, the subsequent bare ceph_decode_8(p)
> reads
>    one byte past the validated buffer boundary. The result is passed
>    directly into *type, which is used as a lock type discriminator by
>    callers, giving an OSD-controlled one-byte OOB read with direct
>    influence over the lock type field.
> 
> Fix both by replacing bare operations with their safe variants:
>   ceph_decode_32(p) -> ceph_decode_32_safe(p, end, *num_lockers,
>                                            err_inval)
>   ceph_decode_8(p)  -> ceph_decode_8_safe(p, end, *type,
>                                           err_free_lockers)
> 
> The goto targets differ intentionally:
>   err_inval: is a new label returning -EINVAL directly. It is used
> for
>   the pre-allocation failure path where *lockers is not yet allocated
>   and must not be passed to ceph_free_lockers().
> 
>   err_free_lockers: is the existing label. It is used for the
>   post-allocation failure path where *lockers is allocated and must
>   be freed.
> 
> ret is set to -EINVAL before ceph_decode_8_safe() so that
> err_free_lockers returns the correct error code on bounds violation.
> Without this, err_free_lockers would return a stale ret value (0 from
> the successful decode_locker() loop), silently swallowing the error.
> 
> -EINVAL is correct for both failure paths. The data received from the
> OSD is structurally malformed. -ENOMEM would misrepresent the failure
> class to callers and to stable@ backporters triaging error paths.
> 
> KASAN report for bug 1 (kernel 7.0.0-rc7, QEMU/x86_64, KASLR
> disabled):
>   ==================================================================
>   BUG: KASAN: slab-out-of-bounds in ceph_oob3_init+0x251/0xff0
> [ceph_oob3_poc]
>   Read of size 4 at addr ffff88800a29b76e by task insmod/58
> 
>   CPU: 0 UID: 0 PID: 58 Comm: insmod Tainted: G           O       
> 7.0.0-rc7-g9c2abf69da83-dirty #15 PREEMPT(lazy)
>   Tainted: [O]=OOT_MODULE
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-
> debian-1.17.0-1 04/01/2014
>   Call Trace:
>    <TASK>
>    dump_stack_lvl+0x4d/0x70
>    print_report+0x170/0x4f3
>    kasan_report+0xda/0x110
>    ceph_oob3_init+0x251/0xff0 [ceph_oob3_poc]
>    do_one_initcall+0x9a/0x3a0
>    do_init_module+0x27c/0x790
>    load_module+0x4a9a/0x6350
>    init_module_from_file+0x15c/0x180
>    idempotent_init_module+0x21f/0x750
>    __x64_sys_finit_module+0xba/0x120
>    do_syscall_64+0xe2/0x570
>    entry_SYSCALL_64_after_hwframe+0x77/0x7f
> 
>   Allocated by task 58:
>    kasan_save_stack+0x30/0x50
>    kasan_save_track+0x14/0x30
>    __kasan_kmalloc+0x7f/0x90
>    ceph_oob3_init+0x4d/0xff0 [ceph_oob3_poc]
>    do_one_initcall+0x9a/0x3a0
>    do_init_module+0x27c/0x790
>    load_module+0x4a9a/0x6350
>    init_module_from_file+0x15c/0x180
>    idempotent_init_module+0x21f/0x750
>    __x64_sys_finit_module+0xba/0x120
>    do_syscall_64+0xe2/0x570
>    entry_SYSCALL_64_after_hwframe+0x77/0x7f
> 
>   The buggy address belongs to the object at ffff88800a29a000
>    which belongs to the cache kmalloc-8k of size 8192
>   The buggy address is located 5998 bytes inside of
>    allocated 6000-byte region [ffff88800a29a000, ffff88800a29b770)
> 
>   Memory state around the buggy address:
>    ffff88800a29b600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>    ffff88800a29b680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   >ffff88800a29b700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc
>                                                                ^
>    ffff88800a29b780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>   ==================================================================
> 
>   num_lockers=0xccccaaaa (OOB garbage from KASAN redzone)
> 
> Bug 2 (ceph_decode_8) follows from the identical precondition. A
> dedicated PoC is available on request.
> 
> Attacker model: a malicious or compromised OSD in a multi-tenant Ceph
> deployment can trigger this against any kernel client that issues the
> lock.get_info class method (e.g. during RBD exclusive lock
> acquisition)
> without any further privileges beyond OSD session establishment.
> 
> Fixes: d4ed4a530562 ("libceph: support for lock.lock_info")
> Cc: stable@vger.kernel.org
> Signed-off-by: Pavitra Jha <jhapavitra98@gmail.com>
> ---
> v3: Combine both fixes (ceph_decode_32 and ceph_decode_8) into a
> single
>     patch per Viacheslav Dubeyko's review. Set ret = -EINVAL before
>     ceph_decode_8_safe() so err_free_lockers returns the correct
> error
>     code, not stale ret (caught by Dan Carpenter / smatch). Clarify
>     err_inval vs err_free_lockers goto selection rationale and
>     -EINVAL justification.
> ---
>  net/ceph/cls_lock_client.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ceph/cls_lock_client.c b/net/ceph/cls_lock_client.c
> index c6956f1df..4e6a6d3e4 100644
> --- a/net/ceph/cls_lock_client.c
> +++ b/net/ceph/cls_lock_client.c
> @@ -299,7 +299,7 @@ static int decode_lockers(void **p, void *end, u8
> *type, char **tag,
>  	if (ret)
>  		return ret;
>  
> -	*num_lockers = ceph_decode_32(p);
> +	ceph_decode_32_safe(p, end, *num_lockers, err_inval);
>  	*lockers = kzalloc_objs(**lockers, *num_lockers, GFP_NOIO);
>  	if (!*lockers)
>  		return -ENOMEM;
> @@ -310,7 +310,8 @@ static int decode_lockers(void **p, void *end, u8
> *type, char **tag,
>  			goto err_free_lockers;
>  	}
>  
> -	*type = ceph_decode_8(p);
> +	ret = -EINVAL;
> +	ceph_decode_8_safe(p, end, *type, err_free_lockers);
>  	s = ceph_extract_encoded_string(p, end, NULL, GFP_NOIO);
>  	if (IS_ERR(s)) {
>  		ret = PTR_ERR(s);
> @@ -320,6 +321,8 @@ static int decode_lockers(void **p, void *end, u8
> *type, char **tag,
>  	*tag = s;
>  	return 0;
>  
> +err_inval:
> +	return -EINVAL;
>  err_free_lockers:
>  	ceph_free_lockers(*lockers, *num_lockers);
>  	return ret;

Looks good.

Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>

Thanks,
Slava.