From nobody Sun Feb 8 23:52:20 2026
From: André Almeida
To: "Alex Deucher", Christian König, siqueira@igalia.com, airlied@gmail.com,
 simona@ffwll.ch, "Raag Jadav", rodrigo.vivi@intel.com,
 jani.nikula@linux.intel.com, Xaver Hugl, Krzysztof Karas
Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
 kernel-dev@igalia.com, amd-gfx@lists.freedesktop.org,
 intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
 André Almeida
Subject: [PATCH v4 1/3] drm: Create an app info option for wedge events
Date: Mon, 19 May 2025 19:03:30 -0300
Message-ID: <20250519220333.101355-2-andrealmeid@igalia.com>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250519220333.101355-1-andrealmeid@igalia.com>
References: <20250519220333.101355-1-andrealmeid@igalia.com>

When a device gets wedged, it might be caused by a guilty application.
For userspace, knowing which app was the cause can be useful in some
situations, such as implementing a policy, logging, or giving the
compositor a chance to let the user know which app caused the problem.
This is an optional argument; when the app info is not available, the
PID and TASK strings won't appear in the event string.

The PID alone is sometimes not enough, given that the app might already
be dead by the time userspace tries to look up the name behind that PID.
To make life easier, also report the app's name in the uevent.

Acked-by: Rodrigo Vivi (for i915 and xe)
Reviewed-by: Krzysztof Karas
Signed-off-by: André Almeida
---
v4: s/APP/TASK
v3: Make comm_string and pid_string empty when there's no app info
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  2 +-
 drivers/gpu/drm/drm_drv.c                  | 16 ++++++++++++----
 drivers/gpu/drm/i915/gt/intel_reset.c      |  3 ++-
 drivers/gpu/drm/xe/xe_device.c             |  3 ++-
 include/drm/drm_device.h                   | 11 +++++++++++
 include/drm/drm_drv.h                      |  3 ++-
 7 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 4d1b54f58495..d27091d5929c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -6363,7 +6363,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 	atomic_set(&adev->reset_domain->reset_res, r);
 
 	if (!r)
-		drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE);
+		drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, NULL);
 
 	return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index acb21fc8b3ce..a47b2eb301e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -166,7 +166,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 			if (amdgpu_ring_sched_ready(ring))
 				drm_sched_start(&ring->sched, 0);
 			dev_err(adev->dev, "Ring %s reset succeeded\n", ring->sched.name);
-			drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE);
+			drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, NULL);
 			goto exit;
 		}
 		dev_err(adev->dev, "Ring %s reset failure\n", ring->sched.name);
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 3dc7acd56b1d..c428d05a8e7c 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -542,6 +542,7 @@ static const char *drm_get_wedge_recovery(unsigned int opt)
  * drm_dev_wedged_event - generate a device wedged uevent
  * @dev: DRM device
  * @method: method(s) to be used for recovery
+ * @info: optional information about the guilty app
  *
  * This generates a device wedged uevent for the DRM device specified by @dev.
  * Recovery @method\(s) of choice will be sent in the uevent environment as
@@ -554,13 +555,13 @@ static const char *drm_get_wedge_recovery(unsigned int opt)
  *
  * Returns: 0 on success, negative error code otherwise.
  */
-int drm_dev_wedged_event(struct drm_device *dev, unsigned long method)
+int drm_dev_wedged_event(struct drm_device *dev, unsigned long method,
+			 struct drm_wedge_app_info *info)
 {
 	const char *recovery = NULL;
 	unsigned int len, opt;
-	/* Event string length up to 28+ characters with available methods */
-	char event_string[32];
-	char *envp[] = { event_string, NULL };
+	char event_string[WEDGE_STR_LEN], pid_string[PID_LEN] = "", comm_string[TASK_COMM_LEN] = "";
+	char *envp[] = { event_string, NULL, NULL, NULL };
 
 	len = scnprintf(event_string, sizeof(event_string), "%s", "WEDGED=");
 
@@ -582,6 +583,13 @@ int drm_dev_wedged_event(struct drm_device *dev, unsigned long method)
 	drm_info(dev, "device wedged, %s\n", method == DRM_WEDGE_RECOVERY_NONE ?
 		 "but recovered through reset" : "needs recovery");
 
+	if (info) {
+		snprintf(pid_string, sizeof(pid_string), "PID=%u", info->pid);
+		snprintf(comm_string, sizeof(comm_string), "TASK=%s", info->comm);
+		envp[1] = pid_string;
+		envp[2] = comm_string;
+	}
+
 	return kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
 }
 EXPORT_SYMBOL(drm_dev_wedged_event);
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index dbdcfe130ad4..ba1d8fdc3c7b 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -1448,7 +1448,8 @@ static void intel_gt_reset_global(struct intel_gt *gt,
 		kobject_uevent_env(kobj, KOBJ_CHANGE, reset_done_event);
 	else
 		drm_dev_wedged_event(&gt->i915->drm,
-				     DRM_WEDGE_RECOVERY_REBIND | DRM_WEDGE_RECOVERY_BUS_RESET);
+				     DRM_WEDGE_RECOVERY_REBIND | DRM_WEDGE_RECOVERY_BUS_RESET,
+				     NULL);
 }
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index c02c4c4e9412..f329613e061f 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -1168,7 +1168,8 @@ void xe_device_declare_wedged(struct xe_device *xe)
 
 		/* Notify userspace of wedged device */
 		drm_dev_wedged_event(&xe->drm,
-				     DRM_WEDGE_RECOVERY_REBIND | DRM_WEDGE_RECOVERY_BUS_RESET);
+				     DRM_WEDGE_RECOVERY_REBIND | DRM_WEDGE_RECOVERY_BUS_RESET,
+				     NULL);
 	}
 
 	for_each_gt(gt, xe, id)
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index e2f894f1b90a..8a3368dfc7f0 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -30,6 +30,17 @@ struct pci_controller;
 #define DRM_WEDGE_RECOVERY_REBIND	BIT(1)	/* unbind + bind driver */
 #define DRM_WEDGE_RECOVERY_BUS_RESET	BIT(2)	/* unbind + reset bus device + bind */
 
+#define WEDGE_STR_LEN 32
+#define PID_LEN 15
+
+/**
+ * struct drm_wedge_app_info - information about the guilty app of a wedge dev
+ */
+struct drm_wedge_app_info {
+	pid_t pid;
+	char *comm;
+};
+
 /**
  * enum switch_power_state - power state of drm device
  */
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index a43d707b5f36..8fc6412a6345 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -482,7 +482,8 @@ void drm_put_dev(struct drm_device *dev);
 bool drm_dev_enter(struct drm_device *dev, int *idx);
 void drm_dev_exit(int idx);
 void drm_dev_unplug(struct drm_device *dev);
-int drm_dev_wedged_event(struct drm_device *dev, unsigned long method);
+int drm_dev_wedged_event(struct drm_device *dev, unsigned long method,
+			 struct drm_wedge_app_info *info);
 
 /**
  * drm_dev_is_unplugged - is a DRM device unplugged
-- 
2.49.0

From nobody Sun Feb 8 23:52:20 2026
From: André Almeida
To: "Alex Deucher", Christian König, siqueira@igalia.com, airlied@gmail.com,
 simona@ffwll.ch, "Raag Jadav", rodrigo.vivi@intel.com,
 jani.nikula@linux.intel.com, Xaver Hugl, Krzysztof Karas
Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
 kernel-dev@igalia.com, amd-gfx@lists.freedesktop.org,
 intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
 André Almeida
Subject: [PATCH v4 2/3] drm/doc: Add a section about "App information" for the wedge API
Date: Mon, 19 May 2025 19:03:31 -0300
Message-ID: <20250519220333.101355-3-andrealmeid@igalia.com>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250519220333.101355-1-andrealmeid@igalia.com>
References: <20250519220333.101355-1-andrealmeid@igalia.com>

Add a section about "App information" for the wedge API.

Reviewed-by: Krzysztof Karas
Reviewed-by: Raag Jadav
Signed-off-by: André Almeida
---
v4:
- Change APP to TASK
v3:
- Change "app that caused ..." to "app involved ..."
- Clarify that devcoredump has more information about what happened
- Update that PID and APP will be empty if there's no app info
---
 Documentation/gpu/drm-uapi.rst | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index 69f72e71a96e..37ae4beab160 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -446,6 +446,23 @@ telemetry information (devcoredump, syslog). This is useful because the first
 hang is usually the most critical one which can result in consequential hangs or
 complete wedging.
 
+App information
+---------------
+
+The information about which application (if any) was involved in the device
+wedging is useful for userspace if it wants to notify the user about what
+happened (e.g. the compositor displays a message to the user "The
+caused a graphical error and the system recovered") or to implement policies
+(e.g. the daemon may "ban" an app that keeps resetting the device). If the app
+information is available, the uevent will include it as ``PID=`` and
+``TASK=``. Otherwise, ``PID`` and ``TASK`` will not appear in the
+event string.
+
+The reliability of this information is driver and hardware specific, and should
+be treated with caution regarding its precision. To get the big picture of what
+really happened, the devcoredump file should provide much more detailed
+information about the device state and about the event.
+
 Consumer prerequisites
 ----------------------
 
-- 
2.49.0

From nobody Sun Feb 8 23:52:20 2026
From: André Almeida
To: "Alex Deucher", Christian König, siqueira@igalia.com, airlied@gmail.com,
 simona@ffwll.ch, "Raag Jadav", rodrigo.vivi@intel.com,
 jani.nikula@linux.intel.com, Xaver Hugl, Krzysztof Karas
Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
 kernel-dev@igalia.com, amd-gfx@lists.freedesktop.org,
 intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
 André Almeida
Subject: [PATCH v4 3/3] drm/amdgpu: Make use of drm_wedge_app_info
Date: Mon, 19 May 2025 19:03:32 -0300
Message-ID: <20250519220333.101355-4-andrealmeid@igalia.com>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250519220333.101355-1-andrealmeid@igalia.com>
References: <20250519220333.101355-1-andrealmeid@igalia.com>

To notify userspace
about which app (if any) caused the device to enter a wedged state, make
use of the drm_wedge_app_info parameter, filling it with the app PID and
name.

Signed-off-by: André Almeida
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +++++++++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  6 +++++-
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index d27091d5929c..81bd3b1db129 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -6362,8 +6362,23 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 
 	atomic_set(&adev->reset_domain->reset_res, r);
 
-	if (!r)
-		drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, NULL);
+	if (!r) {
+		struct drm_wedge_app_info aux, *info = NULL;
+
+		if (job) {
+			struct amdgpu_task_info *ti;
+
+			ti = amdgpu_vm_get_task_info_pasid(adev, job->pasid);
+			if (ti) {
+				aux.pid = ti->pid;
+				aux.comm = ti->process_name;
+				info = &aux;
+				amdgpu_vm_put_task_info(ti);
+			}
+		}
+
+		drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, info);
+	}
 
 	return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index a47b2eb301e5..98efa3318ddb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -89,6 +89,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 {
 	struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
 	struct amdgpu_job *job = to_amdgpu_job(s_job);
+	struct drm_wedge_app_info aux, *info = NULL;
 	struct amdgpu_task_info *ti;
 	struct amdgpu_device *adev = ring->adev;
 	int idx;
@@ -127,6 +128,9 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 		dev_err(adev->dev,
 			"Process information: process %s pid %d thread %s pid %d\n",
 			ti->process_name, ti->tgid, ti->task_name, ti->pid);
+		aux.pid = ti->pid;
+		aux.comm = ti->process_name;
+		info = &aux;
 		amdgpu_vm_put_task_info(ti);
 	}
 
@@ -166,7 +170,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 			if (amdgpu_ring_sched_ready(ring))
 				drm_sched_start(&ring->sched, 0);
 			dev_err(adev->dev, "Ring %s reset succeeded\n", ring->sched.name);
-			drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, NULL);
+			drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, info);
 			goto exit;
 		}
 		dev_err(adev->dev, "Ring %s reset failure\n", ring->sched.name);
-- 
2.49.0
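
For illustration only, and not part of this series: a minimal userspace
sketch of how a consumer might read the optional PID and TASK fields that
sit next to WEDGED in the uevent. It assumes the uevent environment has
already been split into a NULL-terminated array of "KEY=VALUE" strings
(how that is obtained, e.g. via udev or a netlink socket, is left out).
struct wedge_event, value_of() and parse_wedge_event() are hypothetical
names, and the buffer sizes are arbitrary choices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace-side view of one wedge uevent. */
struct wedge_event {
	char recovery[32];  /* raw value of WEDGED= (sizes here are arbitrary) */
	bool has_task_info; /* true only if both PID= and TASK= were present */
	long pid;
	char task[64];
};

/* Return a pointer to the value if entry starts with key, else NULL. */
static const char *value_of(const char *entry, const char *key)
{
	size_t n = strlen(key);

	return strncmp(entry, key, n) == 0 ? entry + n : NULL;
}

/*
 * Parse a NULL-terminated list of "KEY=VALUE" strings.
 * Returns true if a WEDGED= entry was found. PID= and TASK= are
 * treated as optional, since the kernel omits them when no app
 * info is available.
 */
static bool parse_wedge_event(const char *const *env, struct wedge_event *ev)
{
	bool wedged = false, has_pid = false, has_task = false;
	const char *v;

	assert(env != NULL && ev != NULL);
	memset(ev, 0, sizeof(*ev));

	for (; *env; env++) {
		if ((v = value_of(*env, "WEDGED="))) {
			snprintf(ev->recovery, sizeof(ev->recovery), "%s", v);
			wedged = true;
		} else if ((v = value_of(*env, "PID="))) {
			char *end;

			ev->pid = strtol(v, &end, 10);
			/* Accept only a non-empty, fully numeric value. */
			has_pid = end != v && *end == '\0';
		} else if ((v = value_of(*env, "TASK="))) {
			snprintf(ev->task, sizeof(ev->task), "%s", v);
			has_task = true;
		}
	}

	ev->has_task_info = has_pid && has_task;
	return wedged;
}
```

Given an environment of { "WEDGED=none", "PID=1234", "TASK=glxgears", NULL }
(illustrative values), parse_wedge_event() returns true and sets
has_task_info with pid 1234. With only a WEDGED entry, has_task_info stays
false, which mirrors the "optional argument" behaviour described in patch
1/3. Since the documentation in patch 2/3 warns that this information is
driver and hardware specific, a consumer would probably treat it as a hint
rather than as proof of which task was at fault.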