From nobody Thu Oct 9 20:25:18 2025 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7B23E1B4231 for ; Mon, 16 Jun 2025 18:15:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=213.97.179.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750097736; cv=none; b=n29KF9shSvQABYwlm9psa+DwjY4MBtCrK5RnwrVjE9ejL5brE/GlPkxuqHx9OwrgaWqAx6dKsNDs2eDOBYlTsljM6oxL2+BOaEeMtwP2R9qn+3ZTetQYLETAvGC0ptNy7kbl6qOqvnY5JGA37gXE8unDQSiVArwixcLdmLMUMhI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750097736; c=relaxed/simple; bh=E1ZIO9BFrLWcgct8qP07+TcvpjAKRQ1ZjdkzQYhno/k=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=NMStr65t4DOd5wmBmuUSMfOX50egx/S3r2F1JixgM3uTBO1FbuG+2pzD2FU2148cTgnKOY2C/gtKbIbAlut/dkEC+eowp+F35acFY3KGzLKBaQcf4YklSH//kfrdtn+TTsDjHql8hJBpOwXZFvNjmYHzp+TCjb6m7ccM5uTlsyQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=igalia.com; spf=pass smtp.mailfrom=igalia.com; dkim=pass (2048-bit key) header.d=igalia.com header.i=@igalia.com header.b=OpaF8pqh; arc=none smtp.client-ip=213.97.179.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=igalia.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=igalia.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=igalia.com header.i=@igalia.com header.b="OpaF8pqh" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=psYij8s9Ia2uYH+3eeL2z0De96IT8g97QGXw+98uT8g=; b=OpaF8pqhsEEyn/YUt8Ls0rCWPP Fex+nlV8DsU7HvGjjEPobjGnW2KFYWfiw0vUnYH9hxr+JYmdsL7YWI/70HX39eTl1+ZfUY2ZOLmYV T0+Hvx6FkVrmAZd114MFj78urHM5fGMqSF/2tB1g+/jWhLvwHezAfsrtUIU+UCv1mPGjbyTOsdi9x S+cpiHUKFnFxwKsVbpWqVrn6uRqVzK3kAOkNCKesMWc5tJMXEzLGEWRLyXsFOW9FN3bR/N9UuSphF IynfJs/xVwMTF8oy6zrenXhytgRCSolV/Rhxhw+LOVeG+75Hm6bUALPZUzOI4QK6fTmcFqb+13o9f vLq32qpg==; Received: from [191.204.192.64] (helo=localhost.localdomain) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1uREMa-004Eh6-4E; Mon, 16 Jun 2025 20:15:20 +0200 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= To: "Alex Deucher" , =?UTF-8?q?Christian=20K=C3=B6nig?= , siqueira@igalia.com, airlied@gmail.com, simona@ffwll.ch, "Raag Jadav" , rodrigo.vivi@intel.com, jani.nikula@linux.intel.com, Xaver Hugl , Krzysztof Karas Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, kernel-dev@igalia.com, amd-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, =?UTF-8?q?Andr=C3=A9=20Almeida?= Subject: [PATCH v8 1/6] drm: amdgpu: Allow NULL pointers at amdgpu_vm_put_task_info() Date: Mon, 16 Jun 2025 15:14:33 -0300 Message-ID: <20250616181438.2124656-2-andrealmeid@igalia.com> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250616181438.2124656-1-andrealmeid@igalia.com> References: <20250616181438.2124656-1-andrealmeid@igalia.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Allow NULL pointers at amdgpu_vm_put_task_info() as it common practice for "put" or "free" functions. This avoid an extra check for NULL for callers. Signed-off-by: Andr=C3=A9 Almeida --- v8: New patch --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/a= mdgpu/amdgpu_vm.c index 3911c78f8282..dadf5633476c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -2447,6 +2447,9 @@ amdgpu_vm_get_vm_from_pasid(struct amdgpu_device *ade= v, u32 pasid) */ void amdgpu_vm_put_task_info(struct amdgpu_task_info *task_info) { + if (unlikely(ZERO_OR_NULL_PTR(task_info))) + return; + kref_put(&task_info->refcount, amdgpu_vm_destroy_task_info); } =20 --=20 2.49.0 From nobody Thu Oct 9 20:25:18 2025 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 91B981C860C for ; Mon, 16 Jun 2025 18:15:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=213.97.179.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750097737; cv=none; b=TPJWweYUk9fQAVgI+cmfsieHxMXzQrItQNuwIJ6k89MJOrWXmND8S+Qp6UyRnp1kNWcx6BZtOkxr9ITERgbId246DQ04aUvj2X+ezpjhWxjl/7Qb1Tlk7ZUHFXw8lOLRe44Lxl0tDaNHw9EJmEgRQKKm0I7UYz672auXvDt/Abs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750097737; c=relaxed/simple; bh=hqO0qra/pRir1ZBUmPLYKHVA2rtUBCmA3Gd7v9SdkjY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=GE/7dYyoe3DIdJCT1JpZJdEBLR0U5kDpVI1SLKwSneBiu3CrCViiYg/pZoWJaEDLQxvCKnKEGAGf16y3wF4T6hAIdlloPbAY26FqS7YQxl5yM6QPiTa14ac4LzWGQ5izEQWDQVSk2Z5twBLR7w3uSZZcWWfe+BGzAeZy+ZYtwb8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=igalia.com; spf=pass smtp.mailfrom=igalia.com; dkim=pass (2048-bit key) header.d=igalia.com header.i=@igalia.com header.b=JcUqdXOJ; arc=none smtp.client-ip=213.97.179.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=igalia.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=igalia.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=igalia.com header.i=@igalia.com header.b="JcUqdXOJ" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=FoIwQaxAIO0K7p8TD2D0AWyvgUziVqQ2Sz3i8pi59yc=; b=JcUqdXOJiWa/rSkviN7ImOOlOe fjnkI4g9Bs2DGw/XwoXvm8uq2xdbYbK85lLmd64skK3KDz8986lDdYb0h8l3SM9jm5wuiEzAASKah RdFf6g124J8btAn4VZzV8avBYfLm79+cfN1pZOsdC0TQKJ24fo2IC8oifdJj2NAVqORHaIQQdERqM F8v0ZCzDjoI3tBzOJaNoFAszW3Ik1pKf6i/LCB4d9q/hZkzdRztccQaFmqYIDwTR3/i5ZhJMFLS9u 3uJDcu8RouNorI3SIEAjoMfd38pFT28yU15W0yaQIgW81Hb2W1uW8M8f/7fOemCgobc/nmDU4LPKq rJc/edMQ==; Received: from [191.204.192.64] (helo=localhost.localdomain) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1uREMe-004Eh6-2X; Mon, 16 Jun 2025 20:15:24 +0200 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= To: "Alex Deucher" , =?UTF-8?q?Christian=20K=C3=B6nig?= , siqueira@igalia.com, airlied@gmail.com, simona@ffwll.ch, "Raag Jadav" , rodrigo.vivi@intel.com, jani.nikula@linux.intel.com, Xaver Hugl , Krzysztof Karas Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, kernel-dev@igalia.com, amd-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, =?UTF-8?q?Andr=C3=A9=20Almeida?= Subject: [PATCH v8 2/6] drm: amdgpu: Create amdgpu_vm_print_task_info() Date: Mon, 16 Jun 2025 15:14:34 -0300 Message-ID: <20250616181438.2124656-3-andrealmeid@igalia.com> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250616181438.2124656-1-andrealmeid@igalia.com> References: <20250616181438.2124656-1-andrealmeid@igalia.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable To avoid repetitive code in amdgpu, create a function that prints the content of struct amdgpu_task_info. Signed-off-by: Andr=C3=A9 Almeida --- v8: drop the inline v7: new patch --- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 4 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 9 +++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 3 +++ drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 5 +---- drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 5 +---- drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c | 5 +---- drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 4 +--- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 5 +---- 8 files changed, 18 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/= amdgpu/amdgpu_job.c index 75262ce8db27..3d887428ca2b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -124,9 +124,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(stru= ct drm_sched_job *s_job) =20 ti =3D amdgpu_vm_get_task_info_pasid(ring->adev, job->pasid); if (ti) { - dev_err(adev->dev, - "Process information: process %s pid %d thread %s pid %d\n", - ti->process_name, ti->tgid, ti->task_name, ti->pid); + amdgpu_vm_print_task_info(adev, ti); amdgpu_vm_put_task_info(ti); } =20 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/a= mdgpu/amdgpu_vm.c index dadf5633476c..347a55376291 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -3159,3 +3159,12 @@ bool amdgpu_vm_is_bo_always_valid(struct amdgpu_vm *= vm, struct amdgpu_bo *bo) { return bo && bo->tbo.base.resv =3D=3D vm->root.bo->tbo.base.resv; } + +void amdgpu_vm_print_task_info(struct amdgpu_device *adev, + struct amdgpu_task_info *task_info) +{ + dev_err(adev->dev, + " Process %s pid %d thread %s pid %d\n", + task_info->process_name, task_info->tgid, + task_info->task_name, task_info->pid); +} diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/a= mdgpu/amdgpu_vm.h index f3ad687125ad..9ec5d94200aa 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h @@ -668,4 +668,7 @@ void amdgpu_vm_tlb_fence_create(struct amdgpu_device *a= dev, struct amdgpu_vm *vm, struct dma_fence **fence); =20 +void amdgpu_vm_print_task_info(struct amdgpu_device *adev, + struct amdgpu_task_info *task_info); + #endif diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/a= mdgpu/gmc_v10_0.c index a3e2787501f1..7923f491cf73 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c @@ -164,10 +164,7 @@ static int gmc_v10_0_process_interrupt(struct amdgpu_d= evice *adev, entry->src_id, entry->ring_id, entry->vmid, entry->pasid); task_info =3D amdgpu_vm_get_task_info_pasid(adev, entry->pasid); if (task_info) { - dev_err(adev->dev, - " in process %s pid %d thread %s pid %d\n", - task_info->process_name, task_info->tgid, - task_info->task_name, task_info->pid); + amdgpu_vm_print_task_info(adev, task_info); amdgpu_vm_put_task_info(task_info); } =20 diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/a= mdgpu/gmc_v11_0.c index 72211409227b..f15d691e9a20 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c @@ -134,10 +134,7 @@ static int gmc_v11_0_process_interrupt(struct amdgpu_d= evice *adev, entry->src_id, entry->ring_id, entry->vmid, entry->pasid); task_info =3D amdgpu_vm_get_task_info_pasid(adev, entry->pasid); if (task_info) { - dev_err(adev->dev, - " in process %s pid %d thread %s pid %d)\n", - task_info->process_name, task_info->tgid, - task_info->task_name, task_info->pid); + amdgpu_vm_print_task_info(adev, task_info); amdgpu_vm_put_task_info(task_info); } =20 diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c b/drivers/gpu/drm/amd/a= mdgpu/gmc_v12_0.c index b645d3e6a6c8..de763105fdfd 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c @@ -127,10 +127,7 @@ static int gmc_v12_0_process_interrupt(struct amdgpu_d= evice *adev, entry->src_id, entry->ring_id, entry->vmid, entry->pasid); task_info =3D amdgpu_vm_get_task_info_pasid(adev, entry->pasid); if (task_info) { - dev_err(adev->dev, - " in process %s pid %d thread %s pid %d)\n", - task_info->process_name, task_info->tgid, - task_info->task_name, task_info->pid); + amdgpu_vm_print_task_info(adev, task_info); amdgpu_vm_put_task_info(task_info); } =20 diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/am= dgpu/gmc_v8_0.c index 99ca08e9bdb5..b45fa0cea9d2 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c @@ -1458,9 +1458,7 @@ static int gmc_v8_0_process_interrupt(struct amdgpu_d= evice *adev, =20 task_info =3D amdgpu_vm_get_task_info_pasid(adev, entry->pasid); if (task_info) { - dev_err(adev->dev, " for process %s pid %d thread %s pid %d\n", - task_info->process_name, task_info->tgid, - task_info->task_name, task_info->pid); + amdgpu_vm_print_task_info(adev, task_info); amdgpu_vm_put_task_info(task_info); } =20 diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/am= dgpu/gmc_v9_0.c index 282197f4ffb1..78f65aea03f8 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -641,10 +641,7 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_de= vice *adev, =20 task_info =3D amdgpu_vm_get_task_info_pasid(adev, entry->pasid); if (task_info) { - dev_err(adev->dev, - " for process %s pid %d thread %s pid %d)\n", - task_info->process_name, task_info->tgid, - task_info->task_name, task_info->pid); + amdgpu_vm_print_task_info(adev, task_info); amdgpu_vm_put_task_info(task_info); } =20 --=20 2.49.0 From nobody Thu Oct 9 20:25:18 2025 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 08CF51339A4 for ; Mon, 16 Jun 2025 18:15:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=213.97.179.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750097739; cv=none; b=hg/CGzxtASuzE8v+ocs8NANj8ivjrCWIua0c0mSAloIZ0mH+pseKk3em8+8egZzSSeZqUacBU48WOEuvncYnonaHg3TGCNOWT/DpRF0QxMwnchgevojT/j0DY4VoLWzpqHU5hjHyBoN+kdh+ZA/R+guEzfqoeeHqJzESpSydc2w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750097739; c=relaxed/simple; bh=2v9bmDfoi+U9lc4MgFzPyKK5AMpr4I9Wh765oVNRQhs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=C1SXFtDYtTAUIDixR/c6h1DjTdPpZxvHpOwYWeJLEwQ+nJi/zwi6MB4loBLoKCWiBm0B9UTV91rPfaPnwmNkrg+8nUETtCodQXgt+OolrjgOkhPhGbvuZxeSqxgF5c9D/uT2na/THPNN69OPxTjnb3ROhkrALjcboINImBJNgOc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=igalia.com; spf=pass smtp.mailfrom=igalia.com; dkim=pass (2048-bit key) header.d=igalia.com header.i=@igalia.com header.b=LI2hI0cL; arc=none smtp.client-ip=213.97.179.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=igalia.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=igalia.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=igalia.com header.i=@igalia.com header.b="LI2hI0cL" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=egqVkJr89695k6cbqDYsiYNvXLoUZ0Gdt1PL0OlA0Kg=; b=LI2hI0cLn0J6tonsHXHaBZvKsM o/pglMdtLc4IWJ/6cH3H7SiRy0YPIMde5JjKVtaguItObhRtpmB0kD2KI0yavgiZuhErOcXCcVBCo pzO9xaFVIvQBIHwqsu8YiJcen4F61EHpAF5uSnZRWrFuNoj+26Fz8yzpbCnZD9dFUCB2QjBHSAadt sDIS7cvEqfr7BulOP0gbuNH6P6uHkgpGyLGZKtJZfvODoY0ovT2V4LUl9k/qIgNd0o9OSSlDHS2EP 3typ0VRjL0Oras0bO3mJZkHf0iGvzQiVE48WVFHPhIs5xnG1KdyqAezbD58k/DdQIS7m1mJcIzAtd ZdmsQIIg==; Received: from [191.204.192.64] (helo=localhost.localdomain) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1uREMi-004Eh6-0n; Mon, 16 Jun 2025 20:15:28 +0200 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= To: "Alex Deucher" , =?UTF-8?q?Christian=20K=C3=B6nig?= , siqueira@igalia.com, airlied@gmail.com, simona@ffwll.ch, "Raag Jadav" , rodrigo.vivi@intel.com, jani.nikula@linux.intel.com, Xaver Hugl , Krzysztof Karas Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, kernel-dev@igalia.com, amd-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, =?UTF-8?q?Andr=C3=A9=20Almeida?= Subject: [PATCH v8 3/6] drm: Create a task info option for wedge events Date: Mon, 16 Jun 2025 15:14:35 -0300 Message-ID: <20250616181438.2124656-4-andrealmeid@igalia.com> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250616181438.2124656-1-andrealmeid@igalia.com> References: <20250616181438.2124656-1-andrealmeid@igalia.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable When a device get wedged, it might be caused by a guilty application. For userspace, knowing which task was involved can be useful for some situations, like for implementing a policy, logs or for giving a chance for the compositor to let the user know what task was involved in the problem. This is an optional argument, when the task info is not available, the PID and TASK string won't appear in the event string. Sometimes just the PID isn't enough giving that the task might be already dead by the time userspace will try to check what was this PID's name, so to make the life easier also notify what's the task's name in the user event. Acked-by: Rodrigo Vivi (for i915 and xe) Reviewed-by: Krzysztof Karas Reviewed-by: Raag Jadav Signed-off-by: Andr=C3=A9 Almeida --- v8: Code style changes (Raag) v7: - Change `char *comm` to `char comm[TASK_COMM_LEN]` v6: - s/cause/involved - drop string initialization v5: - s/app/task for struct and commit message as well - move defines to drm_drv.c - validates if comm is not NULL and it's not empty v4: s/APP/TASK v3: Make comm_string and pid_string empty when there's no app info --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 2 +- drivers/gpu/drm/drm_drv.c | 21 +++++++++++++++++---- drivers/gpu/drm/i915/gt/intel_reset.c | 3 ++- drivers/gpu/drm/xe/xe_device.c | 3 ++- include/drm/drm_device.h | 9 +++++++++ include/drm/drm_drv.h | 3 ++- 7 files changed, 34 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/a= md/amdgpu/amdgpu_device.c index e1bab6a96cb6..8a0f36f33f13 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -6364,7 +6364,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *a= dev, atomic_set(&adev->reset_domain->reset_res, r); =20 if (!r) - drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE); + drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, NULL); =20 return r; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/= amdgpu/amdgpu_job.c index 3d887428ca2b..0c1381b527fe 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -164,7 +164,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(stru= ct drm_sched_job *s_job) if (amdgpu_ring_sched_ready(ring)) drm_sched_start(&ring->sched, 0); dev_err(adev->dev, "Ring %s reset succeeded\n", ring->sched.name); - drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE); + drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, NULL); goto exit; } dev_err(adev->dev, "Ring %s reset failure\n", ring->sched.name); diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index 56dd61f8e05a..a994da9d9233 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -34,6 +34,7 @@ #include #include #include +#include #include #include #include @@ -538,10 +539,15 @@ static const char *drm_get_wedge_recovery(unsigned in= t opt) } } =20 +#define WEDGE_STR_LEN 32 +#define PID_STR_LEN 15 +#define COMM_STR_LEN (TASK_COMM_LEN + 5) + /** * drm_dev_wedged_event - generate a device wedged uevent * @dev: DRM device * @method: method(s) to be used for recovery + * @info: optional information about the guilty task * * This generates a device wedged uevent for the DRM device specified by @= dev. * Recovery @method\(s) of choice will be sent in the uevent environment as @@ -554,13 +560,13 @@ static const char *drm_get_wedge_recovery(unsigned in= t opt) * * Returns: 0 on success, negative error code otherwise. */ -int drm_dev_wedged_event(struct drm_device *dev, unsigned long method) +int drm_dev_wedged_event(struct drm_device *dev, unsigned long method, + struct drm_wedge_task_info *info) { + char event_string[WEDGE_STR_LEN], pid_string[PID_STR_LEN], comm_string[CO= MM_STR_LEN]; + char *envp[] =3D { event_string, NULL, NULL, NULL }; const char *recovery =3D NULL; unsigned int len, opt; - /* Event string length up to 28+ characters with available methods */ - char event_string[32]; - char *envp[] =3D { event_string, NULL }; =20 len =3D scnprintf(event_string, sizeof(event_string), "%s", "WEDGED=3D"); =20 @@ -582,6 +588,13 @@ int drm_dev_wedged_event(struct drm_device *dev, unsig= ned long method) drm_info(dev, "device wedged, %s\n", method =3D=3D DRM_WEDGE_RECOVERY_NON= E ? "but recovered through reset" : "needs recovery"); =20 + if (info && (info->comm[0] !=3D '\0') && (info->pid >=3D 0)) { + snprintf(pid_string, sizeof(pid_string), "PID=3D%u", info->pid); + snprintf(comm_string, sizeof(comm_string), "TASK=3D%s", info->comm); + envp[1] =3D pid_string; + envp[2] =3D comm_string; + } + return kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp); } EXPORT_SYMBOL(drm_dev_wedged_event); diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/g= t/intel_reset.c index dbdcfe130ad4..ba1d8fdc3c7b 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -1448,7 +1448,8 @@ static void intel_gt_reset_global(struct intel_gt *gt, kobject_uevent_env(kobj, KOBJ_CHANGE, reset_done_event); else drm_dev_wedged_event(>->i915->drm, - DRM_WEDGE_RECOVERY_REBIND | DRM_WEDGE_RECOVERY_BUS_RESET); + DRM_WEDGE_RECOVERY_REBIND | DRM_WEDGE_RECOVERY_BUS_RESET, + NULL); } =20 /** diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c index c02c4c4e9412..f329613e061f 100644 --- a/drivers/gpu/drm/xe/xe_device.c +++ b/drivers/gpu/drm/xe/xe_device.c @@ -1168,7 +1168,8 @@ void xe_device_declare_wedged(struct xe_device *xe) =20 /* Notify userspace of wedged device */ drm_dev_wedged_event(&xe->drm, - DRM_WEDGE_RECOVERY_REBIND | DRM_WEDGE_RECOVERY_BUS_RESET); + DRM_WEDGE_RECOVERY_REBIND | DRM_WEDGE_RECOVERY_BUS_RESET, + NULL); } =20 for_each_gt(gt, xe, id) diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h index e2f894f1b90a..08b3b2467c4c 100644 --- a/include/drm/drm_device.h +++ b/include/drm/drm_device.h @@ -5,6 +5,7 @@ #include #include #include +#include =20 #include =20 @@ -30,6 +31,14 @@ struct pci_controller; #define DRM_WEDGE_RECOVERY_REBIND BIT(1) /* unbind + bind driver */ #define DRM_WEDGE_RECOVERY_BUS_RESET BIT(2) /* unbind + reset bus device += bind */ =20 +/** + * struct drm_wedge_task_info - information about the guilty task of a wed= ge dev + */ +struct drm_wedge_task_info { + pid_t pid; + char comm[TASK_COMM_LEN]; +}; + /** * enum switch_power_state - power state of drm device */ diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h index 63b51942d606..3f76a32d6b84 100644 --- a/include/drm/drm_drv.h +++ b/include/drm/drm_drv.h @@ -487,7 +487,8 @@ void drm_put_dev(struct drm_device *dev); bool drm_dev_enter(struct drm_device *dev, int *idx); void drm_dev_exit(int idx); void drm_dev_unplug(struct drm_device *dev); -int drm_dev_wedged_event(struct drm_device *dev, unsigned long method); +int drm_dev_wedged_event(struct drm_device *dev, unsigned long method, + struct drm_wedge_task_info *info); =20 /** * drm_dev_is_unplugged - is a DRM device unplugged --=20 2.49.0 From nobody Thu Oct 9 20:25:18 2025 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4C61D288C1B for ; Mon, 16 Jun 2025 18:15:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=213.97.179.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750097745; cv=none; b=CAXMcLoBpmwcygm9Yb0tnJSa3O+qhJwlEW8RSKU2ZVNVXuCEwZsGUwia2IPLlmRD0SYzy7JwB31EqVYi4c1wjlOoiTfUhk+Cgjl625yiH5tgDaBtys5W5pMmXFN31V8s9I/vOBDBrh93ELX73ShXBT/YnF1XtkUURZ3KKcqDOKI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750097745; c=relaxed/simple; bh=vDHKuEB52NE3FyntkV/XWuE9A1twU7QMvf0KaC9hsyI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=W4ljzcJPrc0lmnY1qjcWbivvAaBc31pIWnj3gfbOebc/a2+SYdaGLWh0tiNwd5bCpe6H4F+YLVeVfKqS1pZHTcPbSZ/7A2W0Y6j6vHxx/IDLpLNqh25XG+QAOqkMjtIW/I2ytgaUxovfFpWfjThQVlVNOBn+yg/yYZghY9kCNv0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=igalia.com; spf=pass smtp.mailfrom=igalia.com; dkim=pass (2048-bit key) header.d=igalia.com header.i=@igalia.com header.b=O1oOHuAd; arc=none smtp.client-ip=213.97.179.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=igalia.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=igalia.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=igalia.com header.i=@igalia.com header.b="O1oOHuAd" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=5bW97YWvE2WNp84/m7LouGNWv9aZZ0XfiU91U9Q+7L4=; b=O1oOHuAdwoWhd8ipuTenXU2jCh nr6trAeQZBp/3x1Hwz8eFju84zVLSmyTfcK2HB3FzCVRm6Nc72bTh91m5CKJ9Cdc+obIVYtPOJM3N Hnjwia6VoTmXXYwukEz5UZUmxiW6KesPTrVmDRR5nraAuVBK1GTpaUVEFY8nCF373ht5vTT3JsI/L UB25psvarFYPpTt/JDQdMG69gquiCGdXpnrWN19neZxztQuh7vpXLKOSobyBuLmS3/Z2gMdkQyOOu E+eDe5YVkByBquhC8yGBILb0tEiHnj2NdlFv8at+/li9IHjL1z2fJy6WMPCucCpQZWzWr1nDRZXok vc5w3F6A==; Received: from [191.204.192.64] (helo=localhost.localdomain) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1uREMl-004Eh6-VT; Mon, 16 Jun 2025 20:15:32 +0200 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= To: "Alex Deucher" , =?UTF-8?q?Christian=20K=C3=B6nig?= , siqueira@igalia.com, airlied@gmail.com, simona@ffwll.ch, "Raag Jadav" , rodrigo.vivi@intel.com, jani.nikula@linux.intel.com, Xaver Hugl , Krzysztof Karas Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, kernel-dev@igalia.com, amd-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, =?UTF-8?q?Andr=C3=A9=20Almeida?= Subject: [PATCH v8 4/6] drm/doc: Add a section about "Task information" for the wedge API Date: Mon, 16 Jun 2025 15:14:36 -0300 Message-ID: <20250616181438.2124656-5-andrealmeid@igalia.com> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250616181438.2124656-1-andrealmeid@igalia.com> References: <20250616181438.2124656-1-andrealmeid@igalia.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Add a section about "Task information" for the wedge API. Reviewed-by: Krzysztof Karas Reviewed-by: Raag Jadav Signed-off-by: Andr=C3=A9 Almeida --- v5: - Change app to task in the text as well v4: - Change APP to TASK v3: - Change "app that caused ..." to "app involved ..." - Clarify that devcoredump have more information about what happened - Update that PID and APP will be empty if there's no app info --- Documentation/gpu/drm-uapi.rst | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst index 4863a4deb0ee..263e5a97c080 100644 --- a/Documentation/gpu/drm-uapi.rst +++ b/Documentation/gpu/drm-uapi.rst @@ -446,6 +446,23 @@ telemetry information (devcoredump, syslog). This is u= seful because the first hang is usually the most critical one which can result in consequential ha= ngs or complete wedging. =20 +Task information +--------------- + +The information about which application (if any) was involved in the device +wedging is useful for userspace if they want to notify the user about what +happened (e.g. the compositor display a message to the user "The +caused a graphical error and the system recovered") or to implement polici= es +(e.g. the daemon may "ban" an task that keeps resetting the device). If th= e task +information is available, the uevent will display as ``PID=3D`` and +``TASK=3D``. Otherwise, ``PID`` and ``TASK`` will not appear in= the +event string. + +The reliability of this information is driver and hardware specific, and s= hould +be taken with a caution regarding it's precision. To have a big picture of= what +really happened, the devcoredump file provides should have much more detai= led +information about the device state and about the event. + Consumer prerequisites ---------------------- =20 --=20 2.49.0 From nobody Thu Oct 9 20:25:18 2025 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8840828A1C7 for ; Mon, 16 Jun 2025 18:15:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=213.97.179.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750097747; cv=none; b=I+eVzx5mmb3jWqV9WI3UP31srqHc9QPoAjaGQQafejazQEya9w7+0B+QMSfTc9UrBecMfKICyCT/DB2BQ7ylAHWqVPVzHi6iWepjKu0nKhnxwtZNLPNwzu9ODgLx/9MW8qhHvF9ZwVfKJol1mMgynT9uRuNnvzbmLBpADGJ+MxM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750097747; c=relaxed/simple; bh=G9oyoOAJuWWpPfwooLY35IeoqhhyDBGyfRkqcancUMM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=BobUn+U61nwnkWLPR1TIqO0xo9RIB3RRdnr7N0n61UPqXWrllX9vgZSpapzFv/uTbJ7qu2OG98Zu/USi3vChmi7nVunq3iE7aK9RBNS9ymrbGWDd1r1/3pICmJ9aga3FUExhCzF0+iG15Ts6AboENSQk3xweH0c360jlFEP++FM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=igalia.com; spf=pass smtp.mailfrom=igalia.com; dkim=pass (2048-bit key) header.d=igalia.com header.i=@igalia.com header.b=aqpsUcWQ; arc=none smtp.client-ip=213.97.179.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=igalia.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=igalia.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=igalia.com header.i=@igalia.com header.b="aqpsUcWQ" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=NYH3P+o7dLXcYIY/W8InmFCvnsUko+8af5VEYYapQWY=; b=aqpsUcWQ7via5UPtn4QDc0pU9j aYX1wCqJ0fHiP3RDO1kXtLh9DpcqRzDmKGgXy/JKYGHn1TDo3wn4aCzT1Pj3UOnAmnI4zQ+W9y+3A qFBfz5IZWIBvCM2AfjYkhYiGTPFGd2j4WpV9w29Rk/NrZVLsDbrzEuWmxg+Qxz2ZL4eRN+xOz9Qk9 NAJjB6CSbX6PWWaA/geMgZxxI661nEJRshpuvEmhQ7MLSBu4RKiPMTB5fyXNi+tWuCi58jCAYvVz2 Mjb64MHC9QM8R0/YDxd46VsJmyCbeLCl51tV0XuhDKiBeymCxTswhQhvG69FlYZ7EEWAh5x81+kZ4 IF9KpBvQ==; Received: from [191.204.192.64] (helo=localhost.localdomain) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1uREMp-004Eh6-Tp; Mon, 16 Jun 2025 20:15:36 +0200 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= To: "Alex Deucher" , =?UTF-8?q?Christian=20K=C3=B6nig?= , siqueira@igalia.com, airlied@gmail.com, simona@ffwll.ch, "Raag Jadav" , rodrigo.vivi@intel.com, jani.nikula@linux.intel.com, Xaver Hugl , Krzysztof Karas Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, kernel-dev@igalia.com, amd-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, =?UTF-8?q?Andr=C3=A9=20Almeida?= Subject: [PATCH v8 5/6] drm: amdgpu: Use struct drm_wedge_task_info inside of struct amdgpu_task_info Date: Mon, 16 Jun 2025 15:14:37 -0300 Message-ID: <20250616181438.2124656-6-andrealmeid@igalia.com> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250616181438.2124656-1-andrealmeid@igalia.com> References: <20250616181438.2124656-1-andrealmeid@igalia.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable To avoid a cast when calling drm_dev_wedged_event(), replace pid and task name inside of struct amdgpu_task_info with struct drm_wedge_task_info. Signed-off-by: Andr=C3=A9 Almeida Reviewed-by: Christian K=C3=B6nig --- v7: New patch --- drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 12 ++++++------ drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 3 +-- drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +- drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 8 ++++---- 9 files changed, 18 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/= amd/amdgpu/amdgpu_debugfs.c index 8e626f50b362..dac4b926e7be 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c @@ -1786,7 +1786,7 @@ static int amdgpu_debugfs_vm_info_show(struct seq_fil= e *m, void *unused) =20 ti =3D amdgpu_vm_get_task_info_vm(vm); if (ti) { - seq_printf(m, "pid:%d\tProcess:%s ----------\n", ti->pid, ti->process_n= ame); + seq_printf(m, "pid:%d\tProcess:%s ----------\n", ti->task.pid, ti->proc= ess_name); amdgpu_vm_put_task_info(ti); } =20 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c b/drivers/gpu= /drm/amd/amdgpu/amdgpu_dev_coredump.c index 7b50741dc097..8a026bc9ea44 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c @@ -220,10 +220,10 @@ amdgpu_devcoredump_read(char *buffer, loff_t offset, = size_t count, drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec); =20 - if (coredump->reset_task_info.pid) + if (coredump->reset_task_info.task.pid) drm_printf(&p, "process_name: %s PID: %d\n", coredump->reset_task_info.process_name, - coredump->reset_task_info.pid); + coredump->reset_task_info.task.pid); =20 /* SOC Information */ drm_printf(&p, "\nSOC Information\n"); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/= amdgpu/amdgpu_gem.c index 0ecc88df7208..e5e33a68d935 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -329,7 +329,7 @@ static int amdgpu_gem_object_open(struct drm_gem_object= *obj, =20 dev_warn(adev->dev, "validate_and_fence failed: %d\n", r); if (ti) { - dev_warn(adev->dev, "pid %d\n", ti->pid); + dev_warn(adev->dev, "pid %d\n", ti->task.pid); amdgpu_vm_put_task_info(ti); } } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/a= mdgpu/amdgpu_vm.c index 347a55376291..563be579fe09 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -622,7 +622,7 @@ int amdgpu_vm_validate(struct amdgpu_device *adev, stru= ct amdgpu_vm *vm, =20 pr_warn_ratelimited("Evicted user BO is not reserved\n"); if (ti) { - pr_warn_ratelimited("pid %d\n", ti->pid); + pr_warn_ratelimited("pid %d\n", ti->task.pid); amdgpu_vm_put_task_info(ti); } =20 @@ -2510,11 +2510,11 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm) if (!vm->task_info) return; =20 - if (vm->task_info->pid =3D=3D current->pid) + if (vm->task_info->task.pid =3D=3D current->pid) return; =20 - vm->task_info->pid =3D current->pid; - get_task_comm(vm->task_info->task_name, current); + vm->task_info->task.pid =3D current->pid; + get_task_comm(vm->task_info->task.comm, current); =20 if (current->group_leader->mm !=3D current->mm) return; @@ -2777,7 +2777,7 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struc= t amdgpu_vm *vm) =20 dev_warn(adev->dev, "VM memory stats for proc %s(%d) task %s(%d) is non-zero when fini\n", - ti->process_name, ti->pid, ti->task_name, ti->tgid); + ti->process_name, ti->task.pid, ti->task.comm, ti->tgid); } =20 amdgpu_vm_put_task_info(vm->task_info); @@ -3166,5 +3166,5 @@ void amdgpu_vm_print_task_info(struct amdgpu_device *= adev, dev_err(adev->dev, " Process %s pid %d thread %s pid %d\n", task_info->process_name, task_info->tgid, - task_info->task_name, task_info->pid); + task_info->task.comm, task_info->task.pid); } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/a= mdgpu/amdgpu_vm.h index 9ec5d94200aa..fd086efd8457 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h @@ -236,9 +236,8 @@ struct amdgpu_vm_pte_funcs { }; =20 struct amdgpu_task_info { + struct drm_wedge_task_info task; char process_name[TASK_COMM_LEN]; - char task_name[TASK_COMM_LEN]; - pid_t pid; pid_t tgid; struct kref refcount; }; diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/a= mdgpu/sdma_v4_0.c index 33ed2b158fcd..f38004e6064e 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c @@ -2187,7 +2187,7 @@ static int sdma_v4_0_print_iv_entry(struct amdgpu_dev= ice *adev, dev_dbg_ratelimited(adev->dev, " for process %s pid %d thread %s pid %d\n", task_info->process_name, task_info->tgid, - task_info->task_name, task_info->pid); + task_info->task.comm, task_info->task.pid); amdgpu_vm_put_task_info(task_info); } =20 diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd= /amdgpu/sdma_v4_4_2.c index 9c169112a5e7..bcde34e4e0a1 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c @@ -1884,7 +1884,7 @@ static int sdma_v4_4_2_print_iv_entry(struct amdgpu_d= evice *adev, if (task_info) { dev_dbg_ratelimited(adev->dev, " for process %s pid %d thread %s pid %d\= n", task_info->process_name, task_info->tgid, - task_info->task_name, task_info->pid); + task_info->task.comm, task_info->task.pid); amdgpu_vm_put_task_info(task_info); } =20 diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/= amdkfd/kfd_events.c index 2b294ada3ec0..82905f3e54dd 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c @@ -1302,7 +1302,7 @@ void kfd_signal_reset_event(struct kfd_node *dev) if (ti) { dev_err(dev->adev->dev, "Queues reset on process %s tid %d thread %s pid %d\n", - ti->process_name, ti->tgid, ti->task_name, ti->pid); + ti->process_name, ti->tgid, ti->task.comm, ti->task.pid); amdgpu_vm_put_task_info(ti); } } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c b/drivers/gpu/drm/= amd/amdkfd/kfd_smi_events.c index 83d9384ac815..a499449fcb06 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c @@ -253,9 +253,9 @@ void kfd_smi_event_update_vmfault(struct kfd_node *dev,= uint16_t pasid) task_info =3D amdgpu_vm_get_task_info_pasid(dev->adev, pasid); if (task_info) { /* Report VM faults from user applications, not retry from kernel */ - if (task_info->pid) + if (task_info->task.pid) kfd_smi_event_add(0, dev, KFD_SMI_EVENT_VMFAULT, KFD_EVENT_FMT_VMFAULT( - task_info->pid, task_info->task_name)); + task_info->task.pid, task_info->task.comm)); amdgpu_vm_put_task_info(task_info); } } @@ -359,8 +359,8 @@ void kfd_smi_event_process(struct kfd_process_device *p= dd, bool start) kfd_smi_event_add(0, pdd->dev, start ? KFD_SMI_EVENT_PROCESS_START : KFD_SMI_EVENT_PROCESS_END, - KFD_EVENT_FMT_PROCESS(task_info->pid, - task_info->task_name)); + KFD_EVENT_FMT_PROCESS(task_info->task.pid, + task_info->task.comm)); amdgpu_vm_put_task_info(task_info); } } --=20 2.49.0 From nobody Thu Oct 9 20:25:18 2025 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 13BCC28A1EA for ; Mon, 16 Jun 2025 18:15:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=213.97.179.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750097752; cv=none; b=ALhWqotHMDPJ22iNtxWC0d7HKPw8X9XoerlbBaRbRLaE+afJGZ6dZucWP74ABJ25ChTKu4OBq9a0pHBTk+joyiezG/ZhO+D+Og1e77tnBDDbtp3d5uitf9mw0l6I5Iv1sTrNM4SRWq5pTWbN9fxoMx1uuXTlbOli1A+U4gWXQJs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750097752; c=relaxed/simple; bh=MBXvbiv09WFTedILbln+GxGD4d0ugksV6zXN5X3pQqg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=HtOoo5l6QZT0EZ7YvcU/5nxpjbEb8uyV5Tq6llOkgHk8Q+MX8hHfiJem12/JbS/ICMX+cqR8Tbhr95hl4UlaxG+nN9ewKDk7b38q7FkEbQDoDh/LHlZXRVR5nqSnZ6LBJjMaT7QkhCh4S4LhdOjaOshhLXw0tBAfS7H2FMIl8og= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=igalia.com; spf=pass smtp.mailfrom=igalia.com; dkim=pass (2048-bit key) header.d=igalia.com header.i=@igalia.com header.b=S6vJATHD; arc=none smtp.client-ip=213.97.179.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=igalia.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=igalia.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=igalia.com header.i=@igalia.com header.b="S6vJATHD" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=HoiH7l3fENLBTE2ibK2tEdCTvd0Zntd1kdiJj4+rKlE=; b=S6vJATHDbgMv1X0XOriLBSRSPr 1c+rADVNGTiXsY7wmFw8PvxO87yFr5t/U/aye+LWMIzMBIrEAsDzB5iCrkEpA+DunEZFW2L4WPbX8 GXFxUFyLOfockJgonTV2GxnCwo0sADSI+s7vJnyU9sXk/EizLoObNB0juT4nADlpEhElf9tF7AqPV hezfQdsdS463Sydd0eipLI+UHqUwC8j7k8kAZ3EtpJWkyocuGwULDAxAaR3SRCXHL5TZlymJi1Bw4 F2v48woVuzz97qroNmp+LaAKu2jNMnezO/KkIds51IjHwxgDH78ka8oKEr4NT/Iq+McZ2sD/Jdgi+ ecEoyphg==; Received: from [191.204.192.64] (helo=localhost.localdomain) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1uREMt-004Eh6-Rx; Mon, 16 Jun 2025 20:15:40 +0200 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= To: "Alex Deucher" , =?UTF-8?q?Christian=20K=C3=B6nig?= , siqueira@igalia.com, airlied@gmail.com, simona@ffwll.ch, "Raag Jadav" , rodrigo.vivi@intel.com, jani.nikula@linux.intel.com, Xaver Hugl , Krzysztof Karas Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, kernel-dev@igalia.com, amd-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, =?UTF-8?q?Andr=C3=A9=20Almeida?= Subject: [PATCH v8 6/6] drm/amdgpu: Make use of drm_wedge_task_info Date: Mon, 16 Jun 2025 15:14:38 -0300 Message-ID: <20250616181438.2124656-7-andrealmeid@igalia.com> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250616181438.2124656-1-andrealmeid@igalia.com> References: <20250616181438.2124656-1-andrealmeid@igalia.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable To notify userspace about which task (if any) made the device get in a wedge state, make use of drm_wedge_task_info parameter, filling it with the task PID and name. Signed-off-by: Andr=C3=A9 Almeida --- v8: - Drop check before calling amdgpu_vm_put_task_info() - Drop local variable `info` v7: - Remove struct cast, now we can use `info =3D &ti->task` - Fix struct lifetime, move amdgpu_vm_put_task_info() after drm_dev_wedged_event() call --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 13 +++++++++++-- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 7 +++++-- 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/a= md/amdgpu/amdgpu_device.c index 8a0f36f33f13..a59f194e3360 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -6363,8 +6363,17 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *= adev, =20 atomic_set(&adev->reset_domain->reset_res, r); =20 - if (!r) - drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, NULL); + if (!r) { + struct amdgpu_task_info *ti =3D NULL; + + if (job) + ti =3D amdgpu_vm_get_task_info_pasid(adev, job->pasid); + + drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, + ti ? &ti->task : NULL); + + amdgpu_vm_put_task_info(ti); + } =20 return r; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/= amdgpu/amdgpu_job.c index 0c1381b527fe..1e24590ae144 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -89,6 +89,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct= drm_sched_job *s_job) { struct amdgpu_ring *ring =3D to_amdgpu_ring(s_job->sched); struct amdgpu_job *job =3D to_amdgpu_job(s_job); + struct drm_wedge_task_info *info =3D NULL; struct amdgpu_task_info *ti; struct amdgpu_device *adev =3D ring->adev; int idx; @@ -125,7 +126,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(stru= ct drm_sched_job *s_job) ti =3D amdgpu_vm_get_task_info_pasid(ring->adev, job->pasid); if (ti) { amdgpu_vm_print_task_info(adev, ti); - amdgpu_vm_put_task_info(ti); + info =3D &ti->task; } =20 /* attempt a per ring reset */ @@ -164,13 +165,15 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(st= ruct drm_sched_job *s_job) if (amdgpu_ring_sched_ready(ring)) drm_sched_start(&ring->sched, 0); dev_err(adev->dev, "Ring %s reset succeeded\n", ring->sched.name); - drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, NULL); + drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, info); goto exit; } dev_err(adev->dev, "Ring %s reset failure\n", ring->sched.name); } dma_fence_set_error(&s_job->s_fence->finished, -ETIME); =20 + amdgpu_vm_put_task_info(ti); + if (amdgpu_device_should_recover_gpu(ring->adev)) { struct amdgpu_reset_context reset_context; memset(&reset_context, 0, sizeof(reset_context)); --=20 2.49.0