From: André Almeida
To: Alex Deucher, Christian König, siqueira@igalia.com, airlied@gmail.com, simona@ffwll.ch, Raag Jadav, rodrigo.vivi@intel.com, jani.nikula@linux.intel.com, Xaver Hugl, Krzysztof Karas
Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, kernel-dev@igalia.com, amd-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, André Almeida
Subject: [PATCH v8 3/6] drm: Create a task info option for wedge events
Date: Mon, 16 Jun 2025 15:14:35 -0300
Message-ID: <20250616181438.2124656-4-andrealmeid@igalia.com>
In-Reply-To: <20250616181438.2124656-1-andrealmeid@igalia.com>
References: <20250616181438.2124656-1-andrealmeid@igalia.com>

When a device gets wedged, it might be caused by a guilty application.
For userspace, knowing which task was involved can be useful in some
situations, such as implementing a policy, logging, or giving the
compositor a chance to let the user know which task was involved in the
problem. This is an optional argument: when the task info is not
available, the PID and TASK strings won't appear in the event string.

Sometimes the PID alone isn't enough, given that the task might already
be dead by the time userspace tries to look up the name behind that PID.
To make life easier, also report the task's name in the uevent.

Acked-by: Rodrigo Vivi (for i915 and xe)
Reviewed-by: Krzysztof Karas
Reviewed-by: Raag Jadav
Signed-off-by: André Almeida
---
v8: Code style changes (Raag)
v7:
 - Change `char *comm` to `char comm[TASK_COMM_LEN]`
v6:
 - s/cause/involved
 - drop string initialization
v5:
 - s/app/task for struct and commit message as well
 - move defines to drm_drv.c
 - validates if comm is not NULL and it's not empty
v4: s/APP/TASK
v3: Make comm_string and pid_string empty when there's no app info
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  2 +-
 drivers/gpu/drm/drm_drv.c                  | 21 +++++++++++++++++----
 drivers/gpu/drm/i915/gt/intel_reset.c      |  3 ++-
 drivers/gpu/drm/xe/xe_device.c             |  3 ++-
 include/drm/drm_device.h                   |  9 +++++++++
 include/drm/drm_drv.h                      |  3 ++-
 7 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e1bab6a96cb6..8a0f36f33f13 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -6364,7 +6364,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 	atomic_set(&adev->reset_domain->reset_res, r);
 
 	if (!r)
-		drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE);
+		drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, NULL);
 
 	return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 3d887428ca2b..0c1381b527fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -164,7 +164,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 			if (amdgpu_ring_sched_ready(ring))
 				drm_sched_start(&ring->sched, 0);
 			dev_err(adev->dev, "Ring %s reset succeeded\n", ring->sched.name);
-			drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE);
+			drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, NULL);
 			goto exit;
 		}
 		dev_err(adev->dev, "Ring %s reset failure\n", ring->sched.name);
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 56dd61f8e05a..a994da9d9233 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -34,6 +34,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -538,10 +539,15 @@ static const char *drm_get_wedge_recovery(unsigned int opt)
 	}
 }
 
+#define WEDGE_STR_LEN 32
+#define PID_STR_LEN 15
+#define COMM_STR_LEN (TASK_COMM_LEN + 5)
+
 /**
  * drm_dev_wedged_event - generate a device wedged uevent
  * @dev: DRM device
  * @method: method(s) to be used for recovery
+ * @info: optional information about the guilty task
  *
  * This generates a device wedged uevent for the DRM device specified by @dev.
  * Recovery @method\(s) of choice will be sent in the uevent environment as
@@ -554,13 +560,13 @@ static const char *drm_get_wedge_recovery(unsigned int opt)
  *
  * Returns: 0 on success, negative error code otherwise.
  */
-int drm_dev_wedged_event(struct drm_device *dev, unsigned long method)
+int drm_dev_wedged_event(struct drm_device *dev, unsigned long method,
+			 struct drm_wedge_task_info *info)
 {
+	char event_string[WEDGE_STR_LEN], pid_string[PID_STR_LEN], comm_string[COMM_STR_LEN];
+	char *envp[] = { event_string, NULL, NULL, NULL };
 	const char *recovery = NULL;
 	unsigned int len, opt;
-	/* Event string length up to 28+ characters with available methods */
-	char event_string[32];
-	char *envp[] = { event_string, NULL };
 
 	len = scnprintf(event_string, sizeof(event_string), "%s", "WEDGED=");
 
@@ -582,6 +588,13 @@ int drm_dev_wedged_event(struct drm_device *dev, unsigned long method)
 	drm_info(dev, "device wedged, %s\n", method == DRM_WEDGE_RECOVERY_NONE ?
 		 "but recovered through reset" : "needs recovery");
 
+	if (info && (info->comm[0] != '\0') && (info->pid >= 0)) {
+		snprintf(pid_string, sizeof(pid_string), "PID=%u", info->pid);
+		snprintf(comm_string, sizeof(comm_string), "TASK=%s", info->comm);
+		envp[1] = pid_string;
+		envp[2] = comm_string;
+	}
+
 	return kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
 }
 EXPORT_SYMBOL(drm_dev_wedged_event);
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index dbdcfe130ad4..ba1d8fdc3c7b 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -1448,7 +1448,8 @@ static void intel_gt_reset_global(struct intel_gt *gt,
 		kobject_uevent_env(kobj, KOBJ_CHANGE, reset_done_event);
 	else
 		drm_dev_wedged_event(&gt->i915->drm,
-				     DRM_WEDGE_RECOVERY_REBIND | DRM_WEDGE_RECOVERY_BUS_RESET);
+				     DRM_WEDGE_RECOVERY_REBIND | DRM_WEDGE_RECOVERY_BUS_RESET,
+				     NULL);
 }
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index c02c4c4e9412..f329613e061f 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -1168,7 +1168,8 @@ void xe_device_declare_wedged(struct xe_device *xe)
 
 		/* Notify userspace of wedged device */
 		drm_dev_wedged_event(&xe->drm,
-				     DRM_WEDGE_RECOVERY_REBIND | DRM_WEDGE_RECOVERY_BUS_RESET);
+				     DRM_WEDGE_RECOVERY_REBIND | DRM_WEDGE_RECOVERY_BUS_RESET,
+				     NULL);
 	}
 
 	for_each_gt(gt, xe, id)
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index e2f894f1b90a..08b3b2467c4c 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -5,6 +5,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -30,6 +31,14 @@ struct pci_controller;
 #define DRM_WEDGE_RECOVERY_REBIND	BIT(1)	/* unbind + bind driver */
 #define DRM_WEDGE_RECOVERY_BUS_RESET	BIT(2)	/* unbind + reset bus device + bind */
 
+/**
+ * struct drm_wedge_task_info - information about the guilty task of a wedge dev
+ */
+struct drm_wedge_task_info {
+	pid_t pid;
+	char comm[TASK_COMM_LEN];
+};
+
 /**
  * enum switch_power_state - power state of drm device
  */
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index 63b51942d606..3f76a32d6b84 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -487,7 +487,8 @@ void drm_put_dev(struct drm_device *dev);
 bool drm_dev_enter(struct drm_device *dev, int *idx);
 void drm_dev_exit(int idx);
 void drm_dev_unplug(struct drm_device *dev);
-int drm_dev_wedged_event(struct drm_device *dev, unsigned long method);
+int drm_dev_wedged_event(struct drm_device *dev, unsigned long method,
+			 struct drm_wedge_task_info *info);
 
 /**
  * drm_dev_is_unplugged - is a DRM device unplugged
-- 
2.49.0
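
Illustration only, not part of the patch: a minimal userspace sketch of how a
consumer might pick up the optional PID= and TASK= entries that this change
adds next to WEDGED=. It assumes the uevent payload is available as a buffer
of NUL-terminated KEY=value strings (the layout seen on the kernel uevent
netlink socket). struct wedge_event, parse_wedge_event() and TASK_NAME_MAX are
hypothetical names made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TASK_NAME_MAX 16 /* assumption: mirrors the kernel's TASK_COMM_LEN */

/* Hypothetical container for the fields of interest; not a kernel type. */
struct wedge_event {
	char recovery[32];        /* value of WEDGED=, e.g. "rebind,bus-reset" */
	long pid;                 /* -1 when no PID= entry was present */
	char task[TASK_NAME_MAX]; /* empty when no TASK= entry was present */
};

/*
 * Walk a buffer of NUL-terminated "KEY=value" strings spanning len bytes.
 * Returns 0 if a WEDGED= entry was found, -1 otherwise. PID= and TASK= are
 * treated as optional, matching the behaviour described in the commit message.
 */
static int parse_wedge_event(const char *buf, size_t len, struct wedge_event *ev)
{
	const char *p = buf, *end = buf + len;
	int found = 0;

	ev->recovery[0] = '\0';
	ev->task[0] = '\0';
	ev->pid = -1;

	while (p < end) {
		const char *nul = memchr(p, '\0', (size_t)(end - p));

		if (!nul)
			break; /* ignore a trailing unterminated fragment */

		if (strncmp(p, "WEDGED=", 7) == 0) {
			snprintf(ev->recovery, sizeof(ev->recovery), "%s", p + 7);
			found = 1;
		} else if (strncmp(p, "PID=", 4) == 0) {
			ev->pid = strtol(p + 4, NULL, 10);
		} else if (strncmp(p, "TASK=", 5) == 0) {
			snprintf(ev->task, sizeof(ev->task), "%s", p + 5);
		}

		p = nul + 1;
	}

	return found ? 0 : -1;
}
```

A caller would treat pid == -1 and an empty task as "no task info supplied",
which is what a driver passing NULL for the new info argument produces.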