[PATCH] drm/amdkfd: fix integer overflow in get_queue_ids()

Muhammad Bilal posted 1 patch 1 day, 4 hours ago
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
Posted by Muhammad Bilal 1 day, 4 hours ago
get_queue_ids() computes the allocation size as:

    size_t array_size = num_queues * sizeof(uint32_t);

num_queues is a user-controlled u32 copied directly from the ioctl
argument (args.suspend_queues.num_queues or args.resume_queues.num_queues)
via kfd_ioctl_set_debug_trap() with no prior validation or clamping.

On 32-bit kernels, size_t is 32 bits wide.  A caller supplying
num_queues = 0x40000001 causes the multiplication to silently wrap:

    0x40000001 * 4 = 0x100000004  ->  truncated to 0x4
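
The wrap can be reproduced in userspace with a small sketch. It is only a
model: uint32_t stands in for a 32-bit size_t, and both helper names are
hypothetical, not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: uint32_t models size_t on a 32-bit kernel. */
static uint32_t array_size_32bit(uint32_t num_queues)
{
	/* Unsigned arithmetic wraps modulo 2^32, as in the unpatched code. */
	return num_queues * (uint32_t)sizeof(uint32_t);
}

/* The same product computed in 64 bits, for comparison. */
static uint64_t array_size_exact(uint32_t num_queues)
{
	return (uint64_t)num_queues * sizeof(uint32_t);
}
```

For num_queues = 0x40000001 the first helper yields 0x4 while the second
yields 0x100000004, matching the arithmetic above.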

memdup_user() then allocates only 4 bytes.  q_array_invalidate() is
called immediately after with the original num_queues value and
iterates 0x40000001 times, writing KFD_DBG_QUEUE_INVALID_MASK into the
4-byte buffer and producing a heap buffer overflow that runs far past
the end of the allocation.
q_array_get_index() in both callers walks the same buffer using the
same unchecked count.

Both call sites are affected:
- suspend_queues() calls get_queue_ids() unconditionally
- resume_queues() calls it only when usr_queue_id_array is non-NULL

Both callers already propagate IS_ERR() returns to userspace, so
returning ERR_PTR(-EINVAL) on overflow requires no new error handling.

The copy_to_user() calls at the tail of both functions also compute
num_queues * sizeof(uint32_t), but are only reachable after a
successful get_queue_ids() return, so they are safe once the
allocation is correctly bounded.

Fix by replacing the unchecked multiplication with check_mul_overflow().
Cast num_queues to size_t so that both operands have the same type as
the destination, which older versions of the macro enforce with a
typeof() comparison before calling __builtin_mul_overflow().
Add an explicit #include <linux/overflow.h> rather than relying on the
transitive pull through linux/slab.h.
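
As a rough userspace illustration of the checked multiplication, the
sketch below calls the GCC/Clang builtin __builtin_mul_overflow()
directly. It is not the kernel macro: uint32_t again models a 32-bit
size_t, and queue_array_size() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the patched size computation.
 * Returns true when num_queues * sizeof(uint32_t) does not fit in
 * 32 bits; otherwise stores the product in *array_size.
 */
static bool queue_array_size(uint32_t num_queues, uint32_t *array_size)
{
	return __builtin_mul_overflow(num_queues,
				      (uint32_t)sizeof(uint32_t),
				      array_size);
}
```

A caller would treat a true return the way the patch does, by failing
with -EINVAL before any allocation takes place.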

Fixes: a70a93fa568b ("drm/amdkfd: add debug suspend and resume process queues operation")
Cc: stable@vger.kernel.org
Signed-off-by: Muhammad Bilal <meatuni001@gmail.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index e0a31e11f0ff..c08ad718dbd7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -25,6 +25,7 @@
 #include <linux/ratelimit.h>
 #include <linux/printk.h>
 #include <linux/slab.h>
+#include <linux/overflow.h>
 #include <linux/list.h>
 #include <linux/types.h>
 #include <linux/bitops.h>
@@ -3308,11 +3309,14 @@ static void copy_context_work_handler(struct work_struct *work)
 
 static uint32_t *get_queue_ids(uint32_t num_queues, uint32_t *usr_queue_id_array)
 {
-	size_t array_size = num_queues * sizeof(uint32_t);
+	size_t array_size;
 
 	if (!usr_queue_id_array)
 		return NULL;
 
+	if (check_mul_overflow((size_t)num_queues, sizeof(uint32_t), &array_size))
+		return ERR_PTR(-EINVAL);
+
 	return memdup_user(usr_queue_id_array, array_size);
 }
 
-- 
2.53.0
[PATCH] drm/amdkfd: fix NULL dereference in get_queue_ids()
Posted by Muhammad Bilal 1 day, 1 hour ago
When usr_queue_id_array is NULL and num_queues is non-zero,
get_queue_ids() returns NULL. The callers check only IS_ERR() on the
return value; since IS_ERR(NULL) == false the check passes, and
suspend_queues() calls q_array_invalidate() which immediately
dereferences NULL while iterating num_queues times.

Userspace can trigger this via kfd_ioctl_set_debug_trap() by supplying
num_queues > 0 with a zero queue_array_ptr, causing a kernel oops (and
a panic when panic_on_oops is set).

A NULL usr_queue_id_array with num_queues == 0 is a legitimate no-op
(the loop in q_array_invalidate() never executes, and resume_queues()
already guards all queue_ids dereferences behind a NULL check). Return
ERR_PTR(-EINVAL) only when num_queues is non-zero and the pointer is
absent; both callers already propagate IS_ERR() returns correctly to
userspace.
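
The reason NULL slips past the callers can be shown with a userspace
model of the error-pointer convention. All model_* names and
null_array_result() are hypothetical; the sketch assumes the usual
scheme where the top 4095 address values encode negative errno codes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True only for pointers in the top MODEL_MAX_ERRNO address values. */
static bool model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-MODEL_MAX_ERRNO);
}

/* Decode the errno value back out of an error pointer. */
static long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Mirrors only the patched NULL-pointer branch of get_queue_ids(). */
static void *null_array_result(uint32_t num_queues)
{
	return num_queues ? model_err_ptr(-EINVAL) : NULL;
}
```

In this model NULL is not an error pointer, so an IS_ERR()-style check
alone cannot catch it, while a non-zero count now yields an encoded
-EINVAL that the same check does catch.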

Fixes: a70a93fa568b ("drm/amdkfd: add debug suspend and resume process queues operation")
Cc: stable@vger.kernel.org
Signed-off-by: Muhammad Bilal <meatuni001@gmail.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index c08ad718dbd7..8488b3a6c2ba 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -3312,7 +3312,7 @@ static uint32_t *get_queue_ids(uint32_t num_queues, uint32_t *usr_queue_id_array
 	size_t array_size;
 
 	if (!usr_queue_id_array)
-		return NULL;
+		return num_queues ? ERR_PTR(-EINVAL) : NULL;
 
 	if (check_mul_overflow((size_t)num_queues, sizeof(uint32_t), &array_size))
 		return ERR_PTR(-EINVAL);
-- 
2.53.0