[v2] drm/amdgpu: fix recursive ww_mutex acquire in amdgpu_devcoredump_format

[PATCH v2] drm/amdgpu: fix recursive ww_mutex acquire in amdgpu_devcoredump_format

Posted by Mikhail Gavrilov 5 days, 9 hours ago

When dumping IB contents from a hung job, amdgpu_devcoredump_format()
acquires the VM root PD's reservation lock via amdgpu_vm_lock_by_pasid()
and then, for each IB referenced by the job, calls amdgpu_bo_reserve()
on the BO that backs the IB.  Both reservations are taken on
reservation_ww_class_mutex objects but neither uses a ww_acquire_ctx,
which trips lockdep:

  WARNING: possible recursive locking detected
  --------------------------------------------
  kworker/u128:0 is trying to acquire lock:
  ffff88838b16e1f0 (reservation_ww_class_mutex){+.+.}-{4:4},
    at: amdgpu_devcoredump_format+0x1594/0x23f0 [amdgpu]

  but task is already holding lock:
  ffff8882f82681f0 (reservation_ww_class_mutex){+.+.}-{4:4},
    at: amdgpu_devcoredump_format+0x1594/0x23f0 [amdgpu]

   Possible unsafe locking scenario:
         CPU0
         ----
    lock(reservation_ww_class_mutex);
    lock(reservation_ww_class_mutex);

   *** DEADLOCK ***
   May be due to missing lock nesting notation

  Workqueue: events_unbound amdgpu_devcoredump_deferred_work [amdgpu]
  Call Trace:
   __ww_mutex_lock.constprop.0
   ww_mutex_lock
   amdgpu_bo_reserve
   amdgpu_devcoredump_format+0x1594 [amdgpu]
   amdgpu_devcoredump_deferred_work+0xea [amdgpu]

The two reservations are on different BOs in the captured trace, so the
splat is a lockdep-correctness warning, not an observed deadlock.  It
becomes a real self-deadlock whenever the IB BO shares its dma_resv
with the root PD (the always-valid case, see
amdgpu_vm_is_bo_always_valid()): amdgpu_bo_reserve(abo) re-acquires the
same ww_mutex without a ticket and blocks forever.

Fix it in two steps:

1. Collect per-IB BO references under the root PD's reservation, then
   release the root before locking the IB BOs.  The walk over the VM
   mapping tree must remain under the root lock (mappings can be torn
   down without it), but the actual content copies do not.

2. Lock all the IB BOs together using drm_exec(9) with a single
   ww_acquire_ctx.  DRM_EXEC_IGNORE_DUPLICATES handles the case where
   IB BOs share a dma_resv (e.g. always-valid BOs).  Each lock attempt
   is now a top-level acquire under one ticket, with retry-on-
   contention handled by drm_exec; the recursive ww_mutex condition
   is gone.

The collect/lock/release logic is factored out into three small helpers
(amdgpu_devcoredump_{collect,lock,release}_ib_refs) to keep the main
function readable and within the kernel coding style indentation
guideline.

This also fixes a BO refcount leak in the original code: when
amdgpu_bo_reserve() failed, control jumped to free_ib_content without
running amdgpu_bo_unref().  In the new structure the per-IB BO refs
are released unconditionally in the cleanup helper.

Reproducer (~150 LoC libdrm_amdgpu): submit a single GFX IB containing
PACKET3_INDIRECT_BUFFER chained at GPU VA 0 and wait for the fence.
The TDR fires within ~10 s and the deferred coredump worker produces
the splat above on every invocation.

v2: switch from per-IB amdgpu_bo_reserve() to drm_exec for the IB BO
    locking as suggested by Christian König; the snapshot approach
    for collecting BO references under the root PD's reservation is
    retained.

Fixes: 7b15fc2d1f1a ("drm/amdgpu: dump job ibs in the devcoredump")
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
---
 .../gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c  | 187 ++++++++++++++----
 1 file changed, 148 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
index d386bc775d03..9ac958cf09fd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
@@ -24,6 +24,7 @@
 
 #include <generated/utsrelease.h>
 #include <linux/devcoredump.h>
+#include <drm/drm_exec.h>
 #include "amdgpu_dev_coredump.h"
 #include "atom.h"
 
@@ -207,6 +208,108 @@ static void amdgpu_devcoredump_fw_info(struct amdgpu_device *adev,
 	}
 }
 
+struct amdgpu_devcoredump_ib_ref {
+	struct amdgpu_bo	*bo;
+	u64			offset;
+};
+
+/*
+ * Walk the VM's mapping tree under the root PD's reservation to obtain the BO
+ * that backs each IB and pin it with a refcount. The root PD reservation is
+ * dropped before this function returns; the caller can then lock each IB BO
+ * via drm_exec without nesting reservations on reservation_ww_class_mutex.
+ *
+ * Returns an array of num_ibs entries (each ib_refs[i].bo may be NULL if its
+ * mapping was not found), or NULL on allocation failure / VM lookup failure.
+ * The caller must release the BO refs and free the array via
+ * amdgpu_devcoredump_release_ib_refs().
+ */
+static struct amdgpu_devcoredump_ib_ref *
+amdgpu_devcoredump_collect_ib_refs(struct amdgpu_device *adev,
+				   struct amdgpu_coredump_info *coredump)
+{
+	struct amdgpu_devcoredump_ib_ref *ib_refs;
+	struct amdgpu_bo_va_mapping *mapping;
+	struct amdgpu_bo *root;
+	struct amdgpu_vm *vm;
+	u64 va_start;
+
+	ib_refs = kcalloc(coredump->num_ibs, sizeof(*ib_refs), GFP_KERNEL);
+	if (!ib_refs)
+		return NULL;
+
+	vm = amdgpu_vm_lock_by_pasid(adev, &root, coredump->pasid);
+	if (!vm) {
+		kfree(ib_refs);
+		return NULL;
+	}
+
+	for (int i = 0; i < coredump->num_ibs; i++) {
+		va_start = coredump->ibs[i].gpu_addr & AMDGPU_GMC_HOLE_MASK;
+		mapping = amdgpu_vm_bo_lookup_mapping(vm, va_start / AMDGPU_GPU_PAGE_SIZE);
+		if (!mapping)
+			continue;
+
+		ib_refs[i].bo = amdgpu_bo_ref(mapping->bo_va->base.bo);
+		ib_refs[i].offset = va_start -
+				    mapping->start * AMDGPU_GPU_PAGE_SIZE;
+	}
+
+	amdgpu_bo_unreserve(root);
+	amdgpu_bo_unref(&root);
+
+	return ib_refs;
+}
+
+static void
+amdgpu_devcoredump_release_ib_refs(struct amdgpu_devcoredump_ib_ref *ib_refs,
+				   int num_ibs)
+{
+	if (!ib_refs)
+		return;
+
+	for (int i = 0; i < num_ibs; i++)
+		if (ib_refs[i].bo)
+			amdgpu_bo_unref(&ib_refs[i].bo);
+	kfree(ib_refs);
+}
+
+/*
+ * Lock all collected IB BOs together using a single drm_exec ticket. This
+ * eliminates the nested ww_mutex acquire that lockdep flags as recursive
+ * locking (and that becomes a real self-deadlock for IB BOs sharing their
+ * dma_resv with the root PD).
+ *
+ * Returns 0 if drm_exec was initialised and the BOs are locked; the caller
+ * must call drm_exec_fini() on success. Returns non-zero on failure, in which
+ * case drm_exec is already torn down.
+ */
+static int
+amdgpu_devcoredump_lock_ib_refs(struct drm_exec *exec,
+				struct amdgpu_devcoredump_ib_ref *ib_refs,
+				int num_ibs)
+{
+	int r = 0;
+
+	drm_exec_init(exec, DRM_EXEC_IGNORE_DUPLICATES, num_ibs);
+	drm_exec_until_all_locked(exec) {
+		r = 0;
+		for (int i = 0; i < num_ibs; i++) {
+			if (!ib_refs[i].bo)
+				continue;
+			r = drm_exec_lock_obj(exec, &ib_refs[i].bo->tbo.base);
+			drm_exec_retry_on_contention(exec);
+			if (r)
+				break;
+		}
+		if (r)
+			break;
+	}
+	if (r)
+		drm_exec_fini(exec);
+	return r;
+}
+
 static ssize_t
 amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_info *coredump)
 {
@@ -214,13 +317,9 @@ amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_inf
 	struct drm_printer p;
 	struct drm_print_iterator iter;
 	struct amdgpu_vm_fault_info *fault_info;
-	struct amdgpu_bo_va_mapping *mapping;
 	struct amdgpu_ip_block *ip_block;
 	struct amdgpu_res_cursor cursor;
-	struct amdgpu_bo *abo, *root;
-	uint64_t va_start, offset;
 	struct amdgpu_ring *ring;
-	struct amdgpu_vm *vm;
 	u32 *ib_content;
 	uint8_t *kptr;
 	int ver, i, j, r;
@@ -343,43 +442,52 @@ amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_inf
 		drm_printf(&p, "VRAM is lost due to GPU reset!\n");
 
 	if (coredump->num_ibs) {
-		/* Don't try to lookup the VM or map the BOs when calculating the
-		 * size required to store the devcoredump.
+		struct amdgpu_devcoredump_ib_ref *ib_refs = NULL;
+		struct drm_exec exec;
+		bool ibs_locked = false;
+
+		/*
+		 * Collect the BO that backs each IB under the root PD's
+		 * reservation, drop the root reservation, then lock all the
+		 * IB BOs together in one drm_exec ticket. This avoids nesting
+		 * amdgpu_bo_reserve() inside the root PD's reservation, which
+		 * would be a recursive reservation_ww_class_mutex acquire
+		 * without a ww_acquire_ctx (lockdep splat, and a real
+		 * self-deadlock for always-valid BOs that share their dma_resv
+		 * with the root PD).
+		 *
+		 * Skip lookup/locking entirely on the sizing pass: it does not
+		 * write IB content, and the size estimate doesn't depend on
+		 * whether the BOs are reachable.
 		 */
-		if (sizing_pass)
-			vm = NULL;
-		else
-			vm = amdgpu_vm_lock_by_pasid(adev, &root, coredump->pasid);
+		if (!sizing_pass) {
+			ib_refs = amdgpu_devcoredump_collect_ib_refs(adev, coredump);
+			if (ib_refs) {
+				r = amdgpu_devcoredump_lock_ib_refs(&exec, ib_refs,
+								    coredump->num_ibs);
+				if (!r)
+					ibs_locked = true;
+			}
+		}
+
+		for (int i = 0; i < coredump->num_ibs; i++) {
+			struct amdgpu_bo *abo = ibs_locked ? ib_refs[i].bo : NULL;
+			u64 offset = ibs_locked ? ib_refs[i].offset : 0;
+			bool emit_content = sizing_pass;
 
-		for (int i = 0; i < coredump->num_ibs && (sizing_pass || vm); i++) {
 			ib_content = kvmalloc_array(coredump->ibs[i].ib_size_dw, 4,
 						    GFP_KERNEL);
 			if (!ib_content)
 				continue;
 
-			/* vm=NULL can only happen when 'sizing_pass' is true. Skip to the
-			 * drm_printf() calls (ib_content doesn't need to be initialized
-			 * as its content won't be written anywhere).
-			 */
-			if (!vm)
+			if (!abo)
 				goto output_ib_content;
 
-			va_start = coredump->ibs[i].gpu_addr & AMDGPU_GMC_HOLE_MASK;
-			mapping = amdgpu_vm_bo_lookup_mapping(vm, va_start / AMDGPU_GPU_PAGE_SIZE);
-			if (!mapping)
-				goto free_ib_content;
-
-			offset = va_start - (mapping->start * AMDGPU_GPU_PAGE_SIZE);
-			abo = amdgpu_bo_ref(mapping->bo_va->base.bo);
-			r = amdgpu_bo_reserve(abo, false);
-			if (r)
-				goto free_ib_content;
-
 			if (abo->flags & AMDGPU_GEM_CREATE_NO_CPU_ACCESS) {
 				off = 0;
 
 				if (abo->tbo.resource->mem_type != TTM_PL_VRAM)
-					goto unreserve_abo;
+					goto output_ib_content;
 
 				amdgpu_res_first(abo->tbo.resource, offset,
 						 coredump->ibs[i].ib_size_dw * 4,
@@ -391,12 +499,13 @@ amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_inf
 					off += cursor.size;
 					amdgpu_res_next(&cursor, cursor.size);
 				}
+				emit_content = true;
 			} else {
 				r = ttm_bo_kmap(&abo->tbo, 0,
 						PFN_UP(abo->tbo.base.size),
 						&abo->kmap);
 				if (r)
-					goto unreserve_abo;
+					goto output_ib_content;
 
 				kptr = amdgpu_bo_kptr(abo);
 				kptr += offset;
@@ -404,23 +513,23 @@ amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_inf
 				       coredump->ibs[i].ib_size_dw * 4);
 
 				amdgpu_bo_kunmap(abo);
+				emit_content = true;
 			}
 
 output_ib_content:
 			drm_printf(&p, "\nIB #%d 0x%llx %d dw\n",
 				   i, coredump->ibs[i].gpu_addr, coredump->ibs[i].ib_size_dw);
-			for (int j = 0; j < coredump->ibs[i].ib_size_dw; j++)
-				drm_printf(&p, "0x%08x\n", ib_content[j]);
-unreserve_abo:
-			if (vm)
-				amdgpu_bo_unreserve(abo);
-free_ib_content:
+			if (emit_content) {
+				for (int j = 0; j < coredump->ibs[i].ib_size_dw; j++)
+					drm_printf(&p, "0x%08x\n", ib_content[j]);
+			}
 			kvfree(ib_content);
 		}
-		if (vm) {
-			amdgpu_bo_unreserve(root);
-			amdgpu_bo_unref(&root);
-		}
+
+		if (ibs_locked)
+			drm_exec_fini(&exec);
+
+		amdgpu_devcoredump_release_ib_refs(ib_refs, coredump->num_ibs);
 	}
 
 	return count - iter.remain;
-- 
2.54.0

Re: [PATCH v2] drm/amdgpu: fix recursive ww_mutex acquire in amdgpu_devcoredump_format

Posted by Christian König 4 days, 18 hours ago

On 5/19/26 18:15, Mikhail Gavrilov wrote:
> When dumping IB contents from a hung job, amdgpu_devcoredump_format()
> acquires the VM root PD's reservation lock via amdgpu_vm_lock_by_pasid()
> and then, for each IB referenced by the job, calls amdgpu_bo_reserve()
> on the BO that backs the IB.  Both reservations are taken on
> reservation_ww_class_mutex objects but neither uses a ww_acquire_ctx,
> which trips lockdep:
> 
>   WARNING: possible recursive locking detected
>   --------------------------------------------
>   kworker/u128:0 is trying to acquire lock:
>   ffff88838b16e1f0 (reservation_ww_class_mutex){+.+.}-{4:4},
>     at: amdgpu_devcoredump_format+0x1594/0x23f0 [amdgpu]
> 
>   but task is already holding lock:
>   ffff8882f82681f0 (reservation_ww_class_mutex){+.+.}-{4:4},
>     at: amdgpu_devcoredump_format+0x1594/0x23f0 [amdgpu]
> 
>    Possible unsafe locking scenario:
>          CPU0
>          ----
>     lock(reservation_ww_class_mutex);
>     lock(reservation_ww_class_mutex);
> 
>    *** DEADLOCK ***
>    May be due to missing lock nesting notation
> 
>   Workqueue: events_unbound amdgpu_devcoredump_deferred_work [amdgpu]
>   Call Trace:
>    __ww_mutex_lock.constprop.0
>    ww_mutex_lock
>    amdgpu_bo_reserve
>    amdgpu_devcoredump_format+0x1594 [amdgpu]
>    amdgpu_devcoredump_deferred_work+0xea [amdgpu]
> 
> The two reservations are on different BOs in the captured trace, so the
> splat is a lockdep-correctness warning, not an observed deadlock.  It
> becomes a real self-deadlock whenever the IB BO shares its dma_resv
> with the root PD (the always-valid case, see
> amdgpu_vm_is_bo_always_valid()): amdgpu_bo_reserve(abo) re-acquires the
> same ww_mutex without a ticket and blocks forever.
> 
> Fix it in two steps:
> 
> 1. Collect per-IB BO references under the root PD's reservation, then
>    release the root before locking the IB BOs.  The walk over the VM
>    mapping tree must remain under the root lock (mappings can be torn
>    down without it), but the actual content copies do not.
> 
> 2. Lock all the IB BOs together using drm_exec(9) with a single
>    ww_acquire_ctx.  DRM_EXEC_IGNORE_DUPLICATES handles the case where
>    IB BOs share a dma_resv (e.g. always-valid BOs).  Each lock attempt
>    is now a top-level acquire under one ticket, with retry-on-
>    contention handled by drm_exec; the recursive ww_mutex condition
>    is gone.
> 
> The collect/lock/release logic is factored out into three small helpers
> (amdgpu_devcoredump_{collect,lock,release}_ib_refs) to keep the main
> function readable and within the kernel coding style indentation
> guideline.
> 
> This also fixes a BO refcount leak in the original code: when
> amdgpu_bo_reserve() failed, control jumped to free_ib_content without
> running amdgpu_bo_unref().  In the new structure the per-IB BO refs
> are released unconditionally in the cleanup helper.
> 
> Reproducer (~150 LoC libdrm_amdgpu): submit a single GFX IB containing
> PACKET3_INDIRECT_BUFFER chained at GPU VA 0 and wait for the fence.
> The TDR fires within ~10 s and the deferred coredump worker produces
> the splat above on every invocation.
> 
> v2: switch from per-IB amdgpu_bo_reserve() to drm_exec for the IB BO
>     locking as suggested by Christian König; the snapshot approach
>     for collecting BO references under the root PD's reservation is
>     retained.
> 
> Fixes: 7b15fc2d1f1a ("drm/amdgpu: dump job ibs in the devcoredump")
> Suggested-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
> ---
>  .../gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c  | 187 ++++++++++++++----
>  1 file changed, 148 insertions(+), 39 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
> index d386bc775d03..9ac958cf09fd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
> @@ -24,6 +24,7 @@
>  
>  #include <generated/utsrelease.h>
>  #include <linux/devcoredump.h>
> +#include <drm/drm_exec.h>
>  #include "amdgpu_dev_coredump.h"
>  #include "atom.h"
>  
> @@ -207,6 +208,108 @@ static void amdgpu_devcoredump_fw_info(struct amdgpu_device *adev,
>  	}
>  }
>  
> +struct amdgpu_devcoredump_ib_ref {
> +	struct amdgpu_bo	*bo;
> +	u64			offset;
> +};

> +
> +/*
> + * Walk the VM's mapping tree under the root PD's reservation to obtain the BO
> + * that backs each IB and pin it with a refcount. The root PD reservation is
> + * dropped before this function returns; the caller can then lock each IB BO
> + * via drm_exec without nesting reservations on reservation_ww_class_mutex.
> + *
> + * Returns an array of num_ibs entries (each ib_refs[i].bo may be NULL if its
> + * mapping was not found), or NULL on allocation failure / VM lookup failure.
> + * The caller must release the BO refs and free the array via
> + * amdgpu_devcoredump_release_ib_refs().
> + */
> +static struct amdgpu_devcoredump_ib_ref *
> +amdgpu_devcoredump_collect_ib_refs(struct amdgpu_device *adev,
> +				   struct amdgpu_coredump_info *coredump)
> +{
> +	struct amdgpu_devcoredump_ib_ref *ib_refs;
> +	struct amdgpu_bo_va_mapping *mapping;
> +	struct amdgpu_bo *root;
> +	struct amdgpu_vm *vm;
> +	u64 va_start;
> +
> +	ib_refs = kcalloc(coredump->num_ibs, sizeof(*ib_refs), GFP_KERNEL);
> +	if (!ib_refs)
> +		return NULL;
> +
> +	vm = amdgpu_vm_lock_by_pasid(adev, &root, coredump->pasid);
> +	if (!vm) {
> +		kfree(ib_refs);
> +		return NULL;
> +	}
> +
> +	for (int i = 0; i < coredump->num_ibs; i++) {
> +		va_start = coredump->ibs[i].gpu_addr & AMDGPU_GMC_HOLE_MASK;
> +		mapping = amdgpu_vm_bo_lookup_mapping(vm, va_start / AMDGPU_GPU_PAGE_SIZE);
> +		if (!mapping)
> +			continue;
> +
> +		ib_refs[i].bo = amdgpu_bo_ref(mapping->bo_va->base.bo);
> +		ib_refs[i].offset = va_start -
> +				    mapping->start * AMDGPU_GPU_PAGE_SIZE;
> +	}
> +
> +	amdgpu_bo_unreserve(root);
> +	amdgpu_bo_unref(&root);
> +
> +	return ib_refs;
> +}


That whole infrastructure is superflous. You just need to modify amdgpu_vm_lock_by_pasid() to take a drm_exec object to lock the root BO.

Regards,
Christian.


> +
> +static void
> +amdgpu_devcoredump_release_ib_refs(struct amdgpu_devcoredump_ib_ref *ib_refs,
> +				   int num_ibs)
> +{
> +	if (!ib_refs)
> +		return;
> +
> +	for (int i = 0; i < num_ibs; i++)
> +		if (ib_refs[i].bo)
> +			amdgpu_bo_unref(&ib_refs[i].bo);
> +	kfree(ib_refs);
> +}
> +
> +/*
> + * Lock all collected IB BOs together using a single drm_exec ticket. This
> + * eliminates the nested ww_mutex acquire that lockdep flags as recursive
> + * locking (and that becomes a real self-deadlock for IB BOs sharing their
> + * dma_resv with the root PD).
> + *
> + * Returns 0 if drm_exec was initialised and the BOs are locked; the caller
> + * must call drm_exec_fini() on success. Returns non-zero on failure, in which
> + * case drm_exec is already torn down.
> + */
> +static int
> +amdgpu_devcoredump_lock_ib_refs(struct drm_exec *exec,
> +				struct amdgpu_devcoredump_ib_ref *ib_refs,
> +				int num_ibs)
> +{
> +	int r = 0;
> +
> +	drm_exec_init(exec, DRM_EXEC_IGNORE_DUPLICATES, num_ibs);
> +	drm_exec_until_all_locked(exec) {
> +		r = 0;
> +		for (int i = 0; i < num_ibs; i++) {
> +			if (!ib_refs[i].bo)
> +				continue;
> +			r = drm_exec_lock_obj(exec, &ib_refs[i].bo->tbo.base);
> +			drm_exec_retry_on_contention(exec);
> +			if (r)
> +				break;
> +		}
> +		if (r)
> +			break;
> +	}
> +	if (r)
> +		drm_exec_fini(exec);
> +	return r;
> +}
> +
>  static ssize_t
>  amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_info *coredump)
>  {
> @@ -214,13 +317,9 @@ amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_inf
>  	struct drm_printer p;
>  	struct drm_print_iterator iter;
>  	struct amdgpu_vm_fault_info *fault_info;
> -	struct amdgpu_bo_va_mapping *mapping;
>  	struct amdgpu_ip_block *ip_block;
>  	struct amdgpu_res_cursor cursor;
> -	struct amdgpu_bo *abo, *root;
> -	uint64_t va_start, offset;
>  	struct amdgpu_ring *ring;
> -	struct amdgpu_vm *vm;
>  	u32 *ib_content;
>  	uint8_t *kptr;
>  	int ver, i, j, r;
> @@ -343,43 +442,52 @@ amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_inf
>  		drm_printf(&p, "VRAM is lost due to GPU reset!\n");
>  
>  	if (coredump->num_ibs) {
> -		/* Don't try to lookup the VM or map the BOs when calculating the
> -		 * size required to store the devcoredump.
> +		struct amdgpu_devcoredump_ib_ref *ib_refs = NULL;
> +		struct drm_exec exec;
> +		bool ibs_locked = false;
> +
> +		/*
> +		 * Collect the BO that backs each IB under the root PD's
> +		 * reservation, drop the root reservation, then lock all the
> +		 * IB BOs together in one drm_exec ticket. This avoids nesting
> +		 * amdgpu_bo_reserve() inside the root PD's reservation, which
> +		 * would be a recursive reservation_ww_class_mutex acquire
> +		 * without a ww_acquire_ctx (lockdep splat, and a real
> +		 * self-deadlock for always-valid BOs that share their dma_resv
> +		 * with the root PD).
> +		 *
> +		 * Skip lookup/locking entirely on the sizing pass: it does not
> +		 * write IB content, and the size estimate doesn't depend on
> +		 * whether the BOs are reachable.
>  		 */
> -		if (sizing_pass)
> -			vm = NULL;
> -		else
> -			vm = amdgpu_vm_lock_by_pasid(adev, &root, coredump->pasid);
> +		if (!sizing_pass) {
> +			ib_refs = amdgpu_devcoredump_collect_ib_refs(adev, coredump);
> +			if (ib_refs) {
> +				r = amdgpu_devcoredump_lock_ib_refs(&exec, ib_refs,
> +								    coredump->num_ibs);
> +				if (!r)
> +					ibs_locked = true;
> +			}
> +		}
> +
> +		for (int i = 0; i < coredump->num_ibs; i++) {
> +			struct amdgpu_bo *abo = ibs_locked ? ib_refs[i].bo : NULL;
> +			u64 offset = ibs_locked ? ib_refs[i].offset : 0;
> +			bool emit_content = sizing_pass;
>  
> -		for (int i = 0; i < coredump->num_ibs && (sizing_pass || vm); i++) {
>  			ib_content = kvmalloc_array(coredump->ibs[i].ib_size_dw, 4,
>  						    GFP_KERNEL);
>  			if (!ib_content)
>  				continue;
>  
> -			/* vm=NULL can only happen when 'sizing_pass' is true. Skip to the
> -			 * drm_printf() calls (ib_content doesn't need to be initialized
> -			 * as its content won't be written anywhere).
> -			 */
> -			if (!vm)
> +			if (!abo)
>  				goto output_ib_content;
>  
> -			va_start = coredump->ibs[i].gpu_addr & AMDGPU_GMC_HOLE_MASK;
> -			mapping = amdgpu_vm_bo_lookup_mapping(vm, va_start / AMDGPU_GPU_PAGE_SIZE);
> -			if (!mapping)
> -				goto free_ib_content;
> -
> -			offset = va_start - (mapping->start * AMDGPU_GPU_PAGE_SIZE);
> -			abo = amdgpu_bo_ref(mapping->bo_va->base.bo);
> -			r = amdgpu_bo_reserve(abo, false);
> -			if (r)
> -				goto free_ib_content;
> -
>  			if (abo->flags & AMDGPU_GEM_CREATE_NO_CPU_ACCESS) {
>  				off = 0;
>  
>  				if (abo->tbo.resource->mem_type != TTM_PL_VRAM)
> -					goto unreserve_abo;
> +					goto output_ib_content;
>  
>  				amdgpu_res_first(abo->tbo.resource, offset,
>  						 coredump->ibs[i].ib_size_dw * 4,
> @@ -391,12 +499,13 @@ amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_inf
>  					off += cursor.size;
>  					amdgpu_res_next(&cursor, cursor.size);
>  				}
> +				emit_content = true;
>  			} else {
>  				r = ttm_bo_kmap(&abo->tbo, 0,
>  						PFN_UP(abo->tbo.base.size),
>  						&abo->kmap);
>  				if (r)
> -					goto unreserve_abo;
> +					goto output_ib_content;
>  
>  				kptr = amdgpu_bo_kptr(abo);
>  				kptr += offset;
> @@ -404,23 +513,23 @@ amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_inf
>  				       coredump->ibs[i].ib_size_dw * 4);
>  
>  				amdgpu_bo_kunmap(abo);
> +				emit_content = true;
>  			}
>  
>  output_ib_content:
>  			drm_printf(&p, "\nIB #%d 0x%llx %d dw\n",
>  				   i, coredump->ibs[i].gpu_addr, coredump->ibs[i].ib_size_dw);
> -			for (int j = 0; j < coredump->ibs[i].ib_size_dw; j++)
> -				drm_printf(&p, "0x%08x\n", ib_content[j]);
> -unreserve_abo:
> -			if (vm)
> -				amdgpu_bo_unreserve(abo);
> -free_ib_content:
> +			if (emit_content) {
> +				for (int j = 0; j < coredump->ibs[i].ib_size_dw; j++)
> +					drm_printf(&p, "0x%08x\n", ib_content[j]);
> +			}
>  			kvfree(ib_content);
>  		}
> -		if (vm) {
> -			amdgpu_bo_unreserve(root);
> -			amdgpu_bo_unref(&root);
> -		}
> +
> +		if (ibs_locked)
> +			drm_exec_fini(&exec);
> +
> +		amdgpu_devcoredump_release_ib_refs(ib_refs, coredump->num_ibs);
>  	}
>  
>  	return count - iter.remain;

Re: [PATCH v2] drm/amdgpu: fix recursive ww_mutex acquire in amdgpu_devcoredump_format

Posted by Mikhail Gavrilov 4 days, 17 hours ago

On Wed, May 20, 2026 at 12:08 PM Christian König
<christian.koenig@amd.com> wrote:
>
> That whole infrastructure is superflous. You just need to modify amdgpu_vm_lock_by_pasid() to take a drm_exec object to lock the root BO.
>

Christian, modifying amdgpu_vm_lock_by_pasid() to take a drm_exec turns
out to also require converting its other caller, amdgpu_vm_handle_fault(),
to drm_exec — most of the diff is that conversion, not the helper itself.

I can:
 (a) convert both in a 2-patch series (handle_fault becomes
     drm_exec_init + drm_exec_until_all_locked + drm_exec_fini, ~30 lines),
     or
 (b) keep the loop inside amdgpu_vm_lock_by_pasid() so handle_fault stays
     a one-liner — but then the devcoredump caller can't add the IB BOs
     to the same ticket, which is the whole point.

(a) seems unavoidable if we want one helper. Is that what you had in mind,
or did you intend something lighter — e.g. a separate
amdgpu_vm_lock_by_pasid_exec() leaving handle_fault untouched?

-- 
Best Regards,
Mike Gavrilov.

Re: [PATCH v2] drm/amdgpu: fix recursive ww_mutex acquire in amdgpu_devcoredump_format

Posted by Christian König 4 days, 14 hours ago

On 5/20/26 10:07, Mikhail Gavrilov wrote:
> On Wed, May 20, 2026 at 12:08 PM Christian König
> <christian.koenig@amd.com> wrote:
>>
>> That whole infrastructure is superflous. You just need to modify amdgpu_vm_lock_by_pasid() to take a drm_exec object to lock the root BO.
>>
> 
> Christian, modifying amdgpu_vm_lock_by_pasid() to take a drm_exec turns
> out to also require converting its other caller, amdgpu_vm_handle_fault(),
> to drm_exec — most of the diff is that conversion, not the helper itself.
> 
> I can:
>  (a) convert both in a 2-patch series (handle_fault becomes
>      drm_exec_init + drm_exec_until_all_locked + drm_exec_fini, ~30 lines),
>      or
>  (b) keep the loop inside amdgpu_vm_lock_by_pasid() so handle_fault stays
>      a one-liner — but then the devcoredump caller can't add the IB BOs
>      to the same ticket, which is the whole point.
> 
> (a) seems unavoidable if we want one helper. Is that what you had in mind,
> or did you intend something lighter — e.g. a separate
> amdgpu_vm_lock_by_pasid_exec() leaving handle_fault untouched?
> 

Just make that two patches, first switching over amdgpu_vm_lock_by_pasid() to using drm_exec() on both use cases.

And then changing the one for device core dumping to lock all BOs at once.

Thanks,
Christian.