From: André Almeida
To: "Alex Deucher", Christian König, siqueira@igalia.com, airlied@gmail.com, simona@ffwll.ch, "Raag Jadav", rodrigo.vivi@intel.com, jani.nikula@linux.intel.com, Xaver Hugl, Krzysztof Karas
Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, kernel-dev@igalia.com, amd-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, André Almeida
Subject: [PATCH v7 1/5] drm: amdgpu: Create amdgpu_vm_print_task_info()
Date: Fri, 13 Jun 2025 15:43:44 -0300
Message-ID: <20250613184348.1761020-2-andrealmeid@igalia.com>
In-Reply-To: <20250613184348.1761020-1-andrealmeid@igalia.com>
References: <20250613184348.1761020-1-andrealmeid@igalia.com>

To avoid repetitive code in amdgpu, create a function that prints the
content of struct amdgpu_task_info.

Signed-off-by: André Almeida
---
v7: new patch
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 4 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 9 +++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  | 3 +++
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 5 +----
 drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 5 +----
 drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c  | 5 +----
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c   | 4 +---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   | 5 +----
 8 files changed, 18 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 75262ce8db27..3d887428ca2b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -124,9 +124,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 
 	ti = amdgpu_vm_get_task_info_pasid(ring->adev, job->pasid);
 	if (ti) {
-		dev_err(adev->dev,
-			"Process information: process %s pid %d thread %s pid %d\n",
-			ti->process_name, ti->tgid, ti->task_name, ti->pid);
+		amdgpu_vm_print_task_info(adev, ti);
 		amdgpu_vm_put_task_info(ti);
 	}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 3911c78f8282..f2a0132521c2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -3156,3 +3156,12 @@ bool amdgpu_vm_is_bo_always_valid(struct amdgpu_vm *vm, struct amdgpu_bo *bo)
 {
 	return bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv;
 }
+
+inline void amdgpu_vm_print_task_info(struct amdgpu_device *adev,
+				      struct amdgpu_task_info *task_info)
+{
+	dev_err(adev->dev,
+		" Process %s pid %d thread %s pid %d\n",
+		task_info->process_name, task_info->tgid,
+		task_info->task_name, task_info->pid);
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index f3ad687125ad..3862a256b9b8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -668,4 +668,7 @@ void amdgpu_vm_tlb_fence_create(struct amdgpu_device *adev,
 				struct amdgpu_vm *vm,
 				struct dma_fence **fence);
 
+inline void amdgpu_vm_print_task_info(struct amdgpu_device *adev,
+				      struct amdgpu_task_info *task_info);
+
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index a3e2787501f1..7923f491cf73 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -164,10 +164,7 @@ static int gmc_v10_0_process_interrupt(struct amdgpu_device *adev,
 			entry->src_id, entry->ring_id, entry->vmid, entry->pasid);
 		task_info = amdgpu_vm_get_task_info_pasid(adev, entry->pasid);
 		if (task_info) {
-			dev_err(adev->dev,
-				" in process %s pid %d thread %s pid %d\n",
-				task_info->process_name, task_info->tgid,
-				task_info->task_name, task_info->pid);
+			amdgpu_vm_print_task_info(adev, task_info);
 			amdgpu_vm_put_task_info(task_info);
 		}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
index 72211409227b..f15d691e9a20 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
@@ -134,10 +134,7 @@ static int gmc_v11_0_process_interrupt(struct amdgpu_device *adev,
 			entry->src_id, entry->ring_id, entry->vmid, entry->pasid);
 		task_info = amdgpu_vm_get_task_info_pasid(adev, entry->pasid);
 		if (task_info) {
-			dev_err(adev->dev,
-				" in process %s pid %d thread %s pid %d)\n",
-				task_info->process_name, task_info->tgid,
-				task_info->task_name, task_info->pid);
+			amdgpu_vm_print_task_info(adev, task_info);
 			amdgpu_vm_put_task_info(task_info);
 		}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c
index b645d3e6a6c8..de763105fdfd 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c
@@ -127,10 +127,7 @@ static int gmc_v12_0_process_interrupt(struct amdgpu_device *adev,
 			entry->src_id, entry->ring_id, entry->vmid, entry->pasid);
 		task_info = amdgpu_vm_get_task_info_pasid(adev, entry->pasid);
 		if (task_info) {
-			dev_err(adev->dev,
-				" in process %s pid %d thread %s pid %d)\n",
-				task_info->process_name, task_info->tgid,
-				task_info->task_name, task_info->pid);
+			amdgpu_vm_print_task_info(adev, task_info);
 			amdgpu_vm_put_task_info(task_info);
 		}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index 99ca08e9bdb5..b45fa0cea9d2 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -1458,9 +1458,7 @@ static int gmc_v8_0_process_interrupt(struct amdgpu_device *adev,
 
 		task_info = amdgpu_vm_get_task_info_pasid(adev, entry->pasid);
 		if (task_info) {
-			dev_err(adev->dev, " for process %s pid %d thread %s pid %d\n",
-				task_info->process_name, task_info->tgid,
-				task_info->task_name, task_info->pid);
+			amdgpu_vm_print_task_info(adev, task_info);
 			amdgpu_vm_put_task_info(task_info);
 		}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 282197f4ffb1..78f65aea03f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -641,10 +641,7 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
 
 		task_info = amdgpu_vm_get_task_info_pasid(adev, entry->pasid);
 		if (task_info) {
-			dev_err(adev->dev,
-				" for process %s pid %d thread %s pid %d)\n",
-				task_info->process_name, task_info->tgid,
-				task_info->task_name, task_info->pid);
+			amdgpu_vm_print_task_info(adev, task_info);
 			amdgpu_vm_put_task_info(task_info);
 		}
 
-- 
2.49.0

From: André Almeida
Subject: [PATCH v7 2/5] drm: Create a task info option for wedge events
Date: Fri, 13 Jun 2025 15:43:45 -0300
Message-ID: <20250613184348.1761020-3-andrealmeid@igalia.com>

When a device gets wedged, it might be caused by a guilty application.
For userspace, knowing which task was involved can be useful in some
situations, such as implementing a policy, logging, or giving the
compositor a chance to let the user know which task was involved in the
problem. This is an optional argument: when the task info is not
available, the PID and TASK strings won't appear in the event string.
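
On the receiving side, the uevent environment is a list of KEY=VALUE
strings, so a consumer only needs to look for PID= and TASK= next to
WEDGED= and treat both as optional. What follows is a hypothetical
userspace sketch, not part of this series or of any library:
struct wedge_event and parse_wedge_event() are made-up names, and it
assumes the environment has already been collected into a NULL-terminated
array (for example by a udev rule or a netlink listener, not shown).

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace view of one wedged uevent. The keys (WEDGED,
 * PID, TASK) come from the patch; this struct is only an illustration.
 */
struct wedge_event {
	char recovery[32];	/* value of WEDGED=, e.g. "none" */
	bool has_task;		/* true only if both PID= and TASK= were valid */
	long pid;
	char task[16];		/* TASK_COMM_LEN is 16 in current kernels */
};

/* Return the value part if @entry starts with @key (which ends in '='). */
static const char *match_key(const char *entry, const char *key)
{
	size_t len = strlen(key);

	return strncmp(entry, key, len) == 0 ? entry + len : NULL;
}

/*
 * Parse a NULL-terminated array of "KEY=VALUE" strings, shaped like the
 * envp[] that drm_dev_wedged_event() hands to kobject_uevent_env().
 * Unknown keys (ACTION=, DEVPATH=, ...) are ignored.
 * Returns 0 if a WEDGED= entry was found, -1 otherwise.
 */
static int parse_wedge_event(char *const envp[], struct wedge_event *ev)
{
	bool wedged = false, has_pid = false, has_comm = false;
	const char *val;

	memset(ev, 0, sizeof(*ev));
	for (size_t i = 0; envp[i] != NULL; i++) {
		if ((val = match_key(envp[i], "WEDGED="))) {
			snprintf(ev->recovery, sizeof(ev->recovery), "%s", val);
			wedged = true;
		} else if ((val = match_key(envp[i], "PID="))) {
			char *end;

			/* Accept only a fully numeric value. */
			ev->pid = strtol(val, &end, 10);
			has_pid = end != val && *end == '\0';
		} else if ((val = match_key(envp[i], "TASK="))) {
			/* Truncates silently if the name is unexpectedly long. */
			snprintf(ev->task, sizeof(ev->task), "%s", val);
			has_comm = val[0] != '\0';
		}
	}
	/* The kernel sends PID and TASK together or not at all. */
	ev->has_task = has_pid && has_comm;
	return wedged ? 0 : -1;
}
```

A caller would check has_task before using pid or task. Since the task may
already have exited when the event is handled, the name is best used as a
label for logs or notifications rather than as something to act on.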

Sometimes the PID alone isn't enough, given that the task might already be
dead by the time userspace tries to look up this PID's name. To make life
easier, also report the task's name in the uevent.

Acked-by: Rodrigo Vivi (for i915 and xe)
Reviewed-by: Krzysztof Karas
Reviewed-by: Raag Jadav
Signed-off-by: André Almeida
---
v7:
 - Change `char *comm` to `char comm[TASK_COMM_LEN]`
v6:
 - s/cause/involved
 - drop string initialization
v5:
 - s/app/task for struct and commit message as well
 - move defines to drm_drv.c
 - validates if comm is not NULL and it's not empty
v4: s/APP/TASK
v3: Make comm_string and pid_string empty when there's no app info
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  2 +-
 drivers/gpu/drm/drm_drv.c                  | 20 ++++++++++++++++----
 drivers/gpu/drm/i915/gt/intel_reset.c      |  3 ++-
 drivers/gpu/drm/xe/xe_device.c             |  3 ++-
 include/drm/drm_device.h                   |  8 ++++++++
 include/drm/drm_drv.h                      |  3 ++-
 7 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e1bab6a96cb6..8a0f36f33f13 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -6364,7 +6364,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 	atomic_set(&adev->reset_domain->reset_res, r);
 
 	if (!r)
-		drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE);
+		drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, NULL);
 
 	return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 3d887428ca2b..0c1381b527fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -164,7 +164,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 			if (amdgpu_ring_sched_ready(ring))
 				drm_sched_start(&ring->sched, 0);
 			dev_err(adev->dev, "Ring %s reset succeeded\n", ring->sched.name);
-			drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE);
+			drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, NULL);
 			goto exit;
 		}
 		dev_err(adev->dev, "Ring %s reset failure\n", ring->sched.name);
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 56dd61f8e05a..eba99a081ec1 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -538,10 +538,15 @@ static const char *drm_get_wedge_recovery(unsigned int opt)
 	}
 }
 
+#define WEDGE_STR_LEN 32
+#define PID_STR_LEN 15
+#define COMM_STR_LEN (TASK_COMM_LEN + 5)
+
 /**
  * drm_dev_wedged_event - generate a device wedged uevent
  * @dev: DRM device
  * @method: method(s) to be used for recovery
+ * @info: optional information about the guilty task
  *
  * This generates a device wedged uevent for the DRM device specified by @dev.
  * Recovery @method\(s) of choice will be sent in the uevent environment as
@@ -554,13 +559,13 @@ static const char *drm_get_wedge_recovery(unsigned int opt)
  *
  * Returns: 0 on success, negative error code otherwise.
  */
-int drm_dev_wedged_event(struct drm_device *dev, unsigned long method)
+int drm_dev_wedged_event(struct drm_device *dev, unsigned long method,
+			 struct drm_wedge_task_info *info)
 {
 	const char *recovery = NULL;
 	unsigned int len, opt;
-	/* Event string length up to 28+ characters with available methods */
-	char event_string[32];
-	char *envp[] = { event_string, NULL };
+	char event_string[WEDGE_STR_LEN], pid_string[PID_STR_LEN], comm_string[COMM_STR_LEN];
+	char *envp[] = { event_string, NULL, NULL, NULL };
 
 	len = scnprintf(event_string, sizeof(event_string), "%s", "WEDGED=");
 
@@ -582,6 +587,13 @@ int drm_dev_wedged_event(struct drm_device *dev, unsigned long method)
 	drm_info(dev, "device wedged, %s\n", method == DRM_WEDGE_RECOVERY_NONE ?
 		 "but recovered through reset" : "needs recovery");
 
+	if (info && (info->comm[0] != '\0') && (info->pid >= 0)) {
+		snprintf(pid_string, sizeof(pid_string), "PID=%u", info->pid);
+		snprintf(comm_string, sizeof(comm_string), "TASK=%s", info->comm);
+		envp[1] = pid_string;
+		envp[2] = comm_string;
+	}
+
 	return kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
 }
 EXPORT_SYMBOL(drm_dev_wedged_event);
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index dbdcfe130ad4..ba1d8fdc3c7b 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -1448,7 +1448,8 @@ static void intel_gt_reset_global(struct intel_gt *gt,
 		kobject_uevent_env(kobj, KOBJ_CHANGE, reset_done_event);
 	else
 		drm_dev_wedged_event(&gt->i915->drm,
-				     DRM_WEDGE_RECOVERY_REBIND | DRM_WEDGE_RECOVERY_BUS_RESET);
+				     DRM_WEDGE_RECOVERY_REBIND | DRM_WEDGE_RECOVERY_BUS_RESET,
+				     NULL);
 }
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index c02c4c4e9412..f329613e061f 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -1168,7 +1168,8 @@ void xe_device_declare_wedged(struct xe_device *xe)
 
 		/* Notify userspace of wedged device */
 		drm_dev_wedged_event(&xe->drm,
-				     DRM_WEDGE_RECOVERY_REBIND | DRM_WEDGE_RECOVERY_BUS_RESET);
+				     DRM_WEDGE_RECOVERY_REBIND | DRM_WEDGE_RECOVERY_BUS_RESET,
+				     NULL);
 	}
 
 	for_each_gt(gt, xe, id)
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index e2f894f1b90a..729e1c6da138 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -30,6 +30,14 @@ struct pci_controller;
 #define DRM_WEDGE_RECOVERY_REBIND	BIT(1)	/* unbind + bind driver */
 #define DRM_WEDGE_RECOVERY_BUS_RESET	BIT(2)	/* unbind + reset bus device + bind */
 
+/**
+ * struct drm_wedge_task_info - information about the guilty task of a wedge dev
+ */
+struct drm_wedge_task_info {
+	pid_t pid;
+	char comm[TASK_COMM_LEN];
+};
+
 /**
  * enum switch_power_state - power state of drm device
  */
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index 63b51942d606..3f76a32d6b84 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -487,7 +487,8 @@ void drm_put_dev(struct drm_device *dev);
 bool drm_dev_enter(struct drm_device *dev, int *idx);
 void drm_dev_exit(int idx);
 void drm_dev_unplug(struct drm_device *dev);
-int drm_dev_wedged_event(struct drm_device *dev, unsigned long method);
+int drm_dev_wedged_event(struct drm_device *dev, unsigned long method,
+			 struct drm_wedge_task_info *info);
 
 /**
  * drm_dev_is_unplugged - is a DRM device unplugged
-- 
2.49.0

From: André Almeida
Subject: [PATCH v7 3/5] drm/doc: Add a section about "Task information" for the wedge API
Date: Fri, 13 Jun 2025 15:43:46 -0300
Message-ID: <20250613184348.1761020-4-andrealmeid@igalia.com>

Add a section about "Task information" for the wedge API.

Reviewed-by: Krzysztof Karas
Reviewed-by: Raag Jadav
Signed-off-by: André Almeida
---
v5:
 - Change app to task in the text as well
v4:
 - Change APP to TASK
v3:
 - Change "app that caused ..." to "app involved ..."
 - Clarify that devcoredump have more information about what happened
 - Update that PID and APP will be empty if there's no app info
---
 Documentation/gpu/drm-uapi.rst | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index 4863a4deb0ee..263e5a97c080 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -446,6 +446,23 @@ telemetry information (devcoredump, syslog). This is useful because the first
 hang is usually the most critical one which can result in consequential hangs or
 complete wedging.
 
+Task information
+----------------
+
+The information about which application (if any) was involved in the device
+wedging is useful for userspace if they want to notify the user about what
+happened (e.g. the compositor displays a message to the user "The
+caused a graphical error and the system recovered") or to implement policies
+(e.g. the daemon may "ban" a task that keeps resetting the device). If the task
+information is available, the uevent will display as ``PID=`` and
+``TASK=``. Otherwise, ``PID`` and ``TASK`` will not appear in the
+event string.
+
+The reliability of this information is driver and hardware specific, and should
+be taken with caution regarding its precision. To get the big picture of what
+really happened, the devcoredump file should provide much more detailed
+information about the device state and about the event.
+
 Consumer prerequisites
 ----------------------
 
-- 
2.49.0

From: André Almeida
Subject: [PATCH v7 4/5] drm: amdgpu: Use struct drm_wedge_task_info inside of struct amdgpu_task_info
Date: Fri, 13 Jun 2025 15:43:47 -0300
Message-ID: <20250613184348.1761020-5-andrealmeid@igalia.com>

To avoid a cast when calling drm_dev_wedged_event(), replace pid and task
name inside of struct amdgpu_task_info with struct drm_wedge_task_info.

Signed-off-by: André Almeida
Reviewed-by: Christian König
---
v7: New patch
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c      |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c          |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c           | 12 ++++++------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h           |  3 +--
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c           |  2 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c         |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_events.c          |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c      |  8 ++++----
 9 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 8e626f50b362..dac4b926e7be 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1786,7 +1786,7 @@ static int amdgpu_debugfs_vm_info_show(struct seq_file *m, void *unused)
 
 		ti = amdgpu_vm_get_task_info_vm(vm);
 		if (ti) {
-			seq_printf(m, "pid:%d\tProcess:%s ----------\n", ti->pid, ti->process_name);
+			seq_printf(m, "pid:%d\tProcess:%s ----------\n", ti->task.pid, ti->process_name);
 			amdgpu_vm_put_task_info(ti);
 		}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
index 7b50741dc097..8a026bc9ea44 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
@@ -220,10 +220,10 @@ amdgpu_devcoredump_read(char *buffer, loff_t offset, size_t count,
 	drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec,
 		   coredump->reset_time.tv_nsec);
 
-	if (coredump->reset_task_info.pid)
+	if (coredump->reset_task_info.task.pid)
 		drm_printf(&p, "process_name: %s PID: %d\n",
 			   coredump->reset_task_info.process_name,
-			   coredump->reset_task_info.pid);
+			   coredump->reset_task_info.task.pid);
 
 	/* SOC Information */
 	drm_printf(&p, "\nSOC Information\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 0ecc88df7208..e5e33a68d935 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -329,7 +329,7 @@ static int amdgpu_gem_object_open(struct drm_gem_object *obj,
 
 		dev_warn(adev->dev, "validate_and_fence failed: %d\n", r);
 		if (ti) {
-			dev_warn(adev->dev, "pid %d\n", ti->pid);
+			dev_warn(adev->dev, "pid %d\n", ti->task.pid);
 			amdgpu_vm_put_task_info(ti);
 		}
 	}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index f2a0132521c2..0efd3fc7cf3e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -622,7 +622,7 @@ int amdgpu_vm_validate(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 
 			pr_warn_ratelimited("Evicted user BO is not reserved\n");
 			if (ti) {
-				pr_warn_ratelimited("pid %d\n", ti->pid);
+				pr_warn_ratelimited("pid %d\n", ti->task.pid);
 				amdgpu_vm_put_task_info(ti);
 			}
 
@@ -2507,11 +2507,11 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
 	if (!vm->task_info)
 		return;
 
-	if (vm->task_info->pid == current->pid)
+	if (vm->task_info->task.pid == current->pid)
 		return;
 
-	vm->task_info->pid = current->pid;
-	get_task_comm(vm->task_info->task_name, current);
+	vm->task_info->task.pid = current->pid;
+	get_task_comm(vm->task_info->task.comm, current);
 
 	if (current->group_leader->mm != current->mm)
 		return;
@@ -2774,7 +2774,7 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 
 		dev_warn(adev->dev,
 			 "VM memory stats for proc %s(%d) task %s(%d) is non-zero when fini\n",
-			 ti->process_name, ti->pid, ti->task_name, ti->tgid);
+			 ti->process_name, ti->task.pid, ti->task.comm, ti->tgid);
 	}
 
 	amdgpu_vm_put_task_info(vm->task_info);
@@ -3163,5 +3163,5 @@ inline void amdgpu_vm_print_task_info(struct amdgpu_device *adev,
 	dev_err(adev->dev,
 		" Process %s pid %d thread %s pid %d\n",
 		task_info->process_name, task_info->tgid,
-		task_info->task_name, task_info->pid);
+		task_info->task.comm, task_info->task.pid);
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 3862a256b9b8..b5c3af1c5e99 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -236,9 +236,8 @@ struct amdgpu_vm_pte_funcs {
 };
 
 struct amdgpu_task_info {
+	struct drm_wedge_task_info task;
 	char process_name[TASK_COMM_LEN];
-	char task_name[TASK_COMM_LEN];
-	pid_t pid;
 	pid_t tgid;
 	struct kref refcount;
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index 33ed2b158fcd..f38004e6064e 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -2187,7 +2187,7 @@ static int sdma_v4_0_print_iv_entry(struct amdgpu_device *adev,
 		dev_dbg_ratelimited(adev->dev,
 				    " for process %s pid %d thread %s pid %d\n",
 				    task_info->process_name, task_info->tgid,
-				    task_info->task_name, task_info->pid);
+				    task_info->task.comm, task_info->task.pid);
 		amdgpu_vm_put_task_info(task_info);
 	}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
index 9c169112a5e7..bcde34e4e0a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
@@ -1884,7 +1884,7 @@ static int sdma_v4_4_2_print_iv_entry(struct amdgpu_device *adev,
 	if (task_info) {
 		dev_dbg_ratelimited(adev->dev, " for process %s pid %d thread %s pid %d\n",
 				    task_info->process_name, task_info->tgid,
-				    task_info->task_name, task_info->pid);
+				    task_info->task.comm, task_info->task.pid);
 		amdgpu_vm_put_task_info(task_info);
 	}
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 2b294ada3ec0..82905f3e54dd 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -1302,7 +1302,7 @@ void kfd_signal_reset_event(struct kfd_node *dev)
 	if (ti) {
 		dev_err(dev->adev->dev,
 			"Queues reset on process %s tid %d thread %s pid %d\n",
-			ti->process_name, ti->tgid, ti->task_name, ti->pid);
+			ti->process_name, ti->tgid, ti->task.comm, ti->task.pid);
 		amdgpu_vm_put_task_info(ti);
 	}
 }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
index 83d9384ac815..a499449fcb06 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
@@ -253,9 +253,9 @@ void kfd_smi_event_update_vmfault(struct kfd_node *dev, uint16_t pasid)
 	task_info = amdgpu_vm_get_task_info_pasid(dev->adev, pasid);
 	if (task_info) {
 		/* Report VM faults from user applications, not retry from kernel */
-		if (task_info->pid)
+		if (task_info->task.pid)
 			kfd_smi_event_add(0, dev, KFD_SMI_EVENT_VMFAULT, KFD_EVENT_FMT_VMFAULT(
-					  task_info->pid, task_info->task_name));
+					  task_info->task.pid, task_info->task.comm));
 		amdgpu_vm_put_task_info(task_info);
 	}
 }
@@ -359,8 +359,8 @@ void kfd_smi_event_process(struct kfd_process_device *pdd, bool start)
 		kfd_smi_event_add(0, pdd->dev, start ?
KFD_SMI_EVENT_PROCESS_START : KFD_SMI_EVENT_PROCESS_END, - KFD_EVENT_FMT_PROCESS(task_info->pid, - task_info->task_name)); + KFD_EVENT_FMT_PROCESS(task_info->task.pid, + task_info->task.comm)); amdgpu_vm_put_task_info(task_info); } } --=20 2.49.0 From nobody Fri Oct 10 13:51:27 2025 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2248423C4F8 for ; Fri, 13 Jun 2025 18:44:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=213.97.179.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749840265; cv=none; b=ZPYkgD12xoNjlkWHR69AcMazqA3l6R6lfuKOqEqQHa9yWqzEd65+CzY2GCycf3gO4JkB42aOa75po/AloYPLcLFUeMmmWISFu8/8n8v8tIrcwFpxwSUpMx3QCat/d3JcRa12MaaHfcNiObr37Zj70gOVo8N4taf60TC8SgU2ZCQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749840265; c=relaxed/simple; bh=7Kn1u/gVa9sDdXQEiFPRNu3yl2AWY2F/Esch2C3NQcA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=oIjZeW796DQfD3dSMZeLbNk67aRA5hB0L92D58F5JHHrjdIgDasPWMNLc9YB/TNrkCW/E99TnqpQgIbRqkfNQm5NKbd61iNfV/5cH4t8/hBaV/b4r2zN3b7ZIRlYxuoGpXQBGsIRRF8QlzJ3yoQ+fxoVFVrB2DPKpWwTU9o60rM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=igalia.com; spf=pass smtp.mailfrom=igalia.com; dkim=pass (2048-bit key) header.d=igalia.com header.i=@igalia.com header.b=YwZELRoP; arc=none smtp.client-ip=213.97.179.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=igalia.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=igalia.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=igalia.com header.i=@igalia.com header.b="YwZELRoP" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; 
c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=oOcwqzD6tOcdnoanCuqsRDl10qjCYbFL/juK82IuqGw=; b=YwZELRoPPHj0BmrWPXnXXoJqXT LdWcGIjzyCGQAbnYS1p/TIAS574XtWilBE9G3QpNhQgAVw7+KNUbPH72QyHb+lZ/GHwsarzpoyEUu 3tzJvDfLOsOQ71RJhk9aEROkBdDryDZzjuQP1ofoAvLbIm1yTu1G4ke4ubq+T23Km+MJPsEuPjX10 RiRiW0L8aqIGPNNVDbUvDcqOSV9Sri9dFwcWpmvDFF+V3rNqUIZvzQGCfTIumGXgPXBQlIpJU54TD qtSxSaye5bVRxOBL4iwwO7p1fGb5jykvryPMZxTRE7WBUc68G/vs66SdAeBkbq1RKAyhRf2MaWSb0 TTzB/bDg==; Received: from [191.204.192.64] (helo=localhost.localdomain) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1uQ9Nv-003AD0-I2; Fri, 13 Jun 2025 20:44:15 +0200 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= To: "Alex Deucher" , =?UTF-8?q?Christian=20K=C3=B6nig?= , siqueira@igalia.com, airlied@gmail.com, simona@ffwll.ch, "Raag Jadav" , rodrigo.vivi@intel.com, jani.nikula@linux.intel.com, Xaver Hugl , Krzysztof Karas Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, kernel-dev@igalia.com, amd-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, =?UTF-8?q?Andr=C3=A9=20Almeida?= Subject: [PATCH v7 5/5] drm/amdgpu: Make use of drm_wedge_task_info Date: Fri, 13 Jun 2025 15:43:48 -0300 Message-ID: <20250613184348.1761020-6-andrealmeid@igalia.com> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250613184348.1761020-1-andrealmeid@igalia.com> References: <20250613184348.1761020-1-andrealmeid@igalia.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable 
To notify userspace about which task (if any) caused the device to become wedged, make use of the drm_wedge_task_info parameter, filling it with the task's PID and name.

Signed-off-by: André Almeida
---
v7:
 - Remove struct cast, now we can use `info = &ti->task`
 - Fix struct lifetime, move amdgpu_vm_put_task_info() after drm_dev_wedged_event() call
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 17 +++++++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  8 ++++++--
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 8a0f36f33f13..67cff53678e1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -6363,8 +6363,21 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 
 	atomic_set(&adev->reset_domain->reset_res, r);
 
-	if (!r)
-		drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, NULL);
+	if (!r) {
+		struct drm_wedge_task_info *info = NULL;
+		struct amdgpu_task_info *ti = NULL;
+
+		if (job) {
+			ti = amdgpu_vm_get_task_info_pasid(adev, job->pasid);
+			if (ti)
+				info = &ti->task;
+		}
+
+		drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, info);
+
+		if (ti)
+			amdgpu_vm_put_task_info(ti);
+	}
 
 	return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 0c1381b527fe..f061f691f556 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -89,6 +89,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 {
 	struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
 	struct amdgpu_job *job = to_amdgpu_job(s_job);
+	struct drm_wedge_task_info *info = NULL;
 	struct amdgpu_task_info *ti;
 	struct amdgpu_device *adev = ring->adev;
 	int idx;
@@ -125,7 +126,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 	ti = amdgpu_vm_get_task_info_pasid(ring->adev, job->pasid);
 	if (ti) {
 		amdgpu_vm_print_task_info(adev, ti);
-		amdgpu_vm_put_task_info(ti);
+		info = &ti->task;
 	}
 
 	/* attempt a per ring reset */
@@ -164,13 +165,16 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 			if (amdgpu_ring_sched_ready(ring))
 				drm_sched_start(&ring->sched, 0);
 			dev_err(adev->dev, "Ring %s reset succeeded\n", ring->sched.name);
-			drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, NULL);
+			drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, info);
 			goto exit;
 		}
 		dev_err(adev->dev, "Ring %s reset failure\n", ring->sched.name);
 	}
 	dma_fence_set_error(&s_job->s_fence->finished, -ETIME);
 
+	if (ti)
+		amdgpu_vm_put_task_info(ti);
+
 	if (amdgpu_device_should_recover_gpu(ring->adev)) {
 		struct amdgpu_reset_context reset_context;
 		memset(&reset_context, 0, sizeof(reset_context));
-- 
2.49.0