From nobody Sun Feb 8 14:59:38 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E8223C001B0 for ; Fri, 14 Jul 2023 16:11:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236104AbjGNQLs (ORCPT ); Fri, 14 Jul 2023 12:11:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44082 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235849AbjGNQLo (ORCPT ); Fri, 14 Jul 2023 12:11:44 -0400 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 94A4A35AF for ; Fri, 14 Jul 2023 09:11:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=KtDjAFCYmByWLVDPHSoFeYlAFDkrSIvmPO75H9eHTbY=; b=rJnKXAw1+3q0gl0C3d2t6P2Oht mDcQ2CfCmX5gtDkRmE3JCXvNdNJ0jOiYl+n1VapFUJ1uwYgocYVpHexb9RgVo5Nt9pEsQisOG/gF3 PV8vzun43MciS18ey5TP5ovNR0l7VAq/5dsxtj8ccZk2YCbrEIseWl/FOyHUho8CqgkQ0jiOHnBum 3MyGwPaLnERhwWWkDlr92GA/hO7nK8wJougbw4jUyMOfdppbiD/SajrwAX1A4QjTWqxd0YeoWyIy/ un9ORvYtS/PGPnMqjV6rkmViqe1Qcf2Sfu0hZUd4HMVS2PoinOyXKHB45vgTNNDBSerSJ7avGDlAa nk8IkHYg==; Received: from [187.74.70.209] (helo=steammachine.lan) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1qKLOL-00Eaot-8q; Fri, 14 Jul 2023 18:11:37 +0200 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com, =?UTF-8?q?=27Marek=20Ol=C5=A1=C3=A1k=27?= , Samuel Pitoiset , Bas Nieuwenhuizen , =?UTF-8?q?Timur=20Krist=C3=B3f?= , michel.daenzer@mailbox.org, =?UTF-8?q?Andr=C3=A9=20Almeida?= Subject: [PATCH v3 1/5] drm/amdgpu: Create a module param to disable soft recovery Date: Fri, 14 Jul 2023 13:11:24 -0300 Message-ID: <20230714161128.69545-2-andrealmeid@igalia.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230714161128.69545-1-andrealmeid@igalia.com> References: <20230714161128.69545-1-andrealmeid@igalia.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Create a module parameter to disable soft recoveries on amdgpu, making every recovery go through the device reset path. This option makes easier to force device resets for testing and debugging purposes. Signed-off-by: Andr=C3=A9 Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 6 +++++- 3 files changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdg= pu/amdgpu.h index a84bd4a0c421..dbe062a087c5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -189,6 +189,7 @@ extern uint amdgpu_force_long_training; extern int amdgpu_lbpw; extern int amdgpu_compute_multipipe; extern int amdgpu_gpu_recovery; +extern bool amdgpu_soft_recovery; extern int amdgpu_emu_mode; extern uint amdgpu_smu_memory_pool_size; extern int amdgpu_smu_pptable_id; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/= amdgpu/amdgpu_drv.c index 3b711babd4e2..7c69f3169aa6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -163,6 +163,7 @@ uint amdgpu_force_long_training; int amdgpu_lbpw =3D -1; int amdgpu_compute_multipipe =3D -1; int amdgpu_gpu_recovery =3D -1; /* auto */ +bool amdgpu_soft_recovery =3D true; int amdgpu_emu_mode; uint amdgpu_smu_memory_pool_size; int amdgpu_smu_pptable_id =3D -1; @@ -540,6 +541,14 @@ module_param_named(compute_multipipe, amdgpu_compute_m= ultipipe, int, 0444); MODULE_PARM_DESC(gpu_recovery, "Enable GPU recovery mechanism, (1 =3D enab= le, 0 =3D disable, -1 =3D auto)"); module_param_named(gpu_recovery, amdgpu_gpu_recovery, int, 0444); =20 +/** + * DOC: gpu_soft_recovery (bool) + * Set true to allow the driver to try soft recoveries if a job get stuck.= Set + * to false to always force a GPU reset during recovery. + */ +MODULE_PARM_DESC(gpu_soft_recovery, "Enable GPU soft recovery mechanism (d= efault: true)"); +module_param_named(gpu_soft_recovery, amdgpu_soft_recovery, bool, 0644); + /** * DOC: emu_mode (int) * Set value 1 to enable emulation mode. This is only needed when running = on an emulator. The default is 0 (disabled). diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd= /amdgpu/amdgpu_ring.c index 80d6e132e409..40678d9fb17e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c @@ -434,8 +434,12 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *rin= g, unsigned int vmid, struct dma_fence *fence) { unsigned long flags; + ktime_t deadline; =20 - ktime_t deadline =3D ktime_add_us(ktime_get(), 10000); + if (!amdgpu_soft_recovery) + return false; + + deadline =3D ktime_add_us(ktime_get(), 10000); =20 if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || !fence) return false; --=20 2.41.0 From nobody Sun Feb 8 14:59:38 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BAFCCC001DE for ; Fri, 14 Jul 2023 16:11:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236223AbjGNQL4 (ORCPT ); Fri, 14 Jul 2023 12:11:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44100 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236113AbjGNQLp (ORCPT ); Fri, 14 Jul 2023 12:11:45 -0400 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 115F635B0 for ; Fri, 14 Jul 2023 09:11:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=zIgp80+eQBdnIXRtmt80Wa1T4Fgh2Xesu535mvlv8DU=; b=PDiktgmjGmtw6OSOZXfulTKykc xdQ3jsP7z7Dzm/1e0LRtHjTaMQGCZBUeyt6RpWtqv+w6cLAyqkpPChY6E28V1osqIFgBkxFIGz5zP b/kLHpoASocNpL2ckKWFjfcJsZV9iF3rA54QT7qL7O+c0rO7Squd7+WzcL8N35UTYyKOkQPoE+OBx tgkWYRf31Lv/+BYhIqbpvIodSRWho4ixm8+OVABxz85BBTPkKh2Qyf4VI4Ca0kKdzWIq8vZecWt8j TmyG2FvPTjy7udp6w7Uiu1t04wfFt94O6Vv6gx7AnBchHAmPkcevhDicvJwJwgEPMnHeDECcJ/xIa vGBu6gKA==; Received: from [187.74.70.209] (helo=steammachine.lan) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1qKLOO-00Eaot-HJ; Fri, 14 Jul 2023 18:11:40 +0200 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com, =?UTF-8?q?=27Marek=20Ol=C5=A1=C3=A1k=27?= , Samuel Pitoiset , Bas Nieuwenhuizen , =?UTF-8?q?Timur=20Krist=C3=B3f?= , michel.daenzer@mailbox.org, =?UTF-8?q?Andr=C3=A9=20Almeida?= Subject: [PATCH v3 2/5] drm/amdgpu: Allocate coredump memory in a nonblocking way Date: Fri, 14 Jul 2023 13:11:25 -0300 Message-ID: <20230714161128.69545-3-andrealmeid@igalia.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230714161128.69545-1-andrealmeid@igalia.com> References: <20230714161128.69545-1-andrealmeid@igalia.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org During a GPU reset, a normal memory reclaim could block to reclaim memory. Giving that coredump is a best effort mechanism, it shouldn't disturb the reset path. Change its memory allocation flag to a nonblocking one. Signed-off-by: Andr=C3=A9 Almeida Reviewed-by: Christian K=C3=B6nig --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/a= md/amdgpu/amdgpu_device.c index e25f085ee886..a824f844a984 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5011,7 +5011,7 @@ static void amdgpu_reset_capture_coredumpm(struct amd= gpu_device *adev) struct drm_device *dev =3D adev_to_drm(adev); =20 ktime_get_ts64(&adev->reset_time); - dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_KERNEL, + dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_NOWAIT, amdgpu_devcoredump_read, amdgpu_devcoredump_free); } #endif --=20 2.41.0 From nobody Sun Feb 8 14:59:38 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D3BEAC001E0 for ; Fri, 14 Jul 2023 16:12:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236160AbjGNQMA (ORCPT ); Fri, 14 Jul 2023 12:12:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44100 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236086AbjGNQLr (ORCPT ); Fri, 14 Jul 2023 12:11:47 -0400 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 08FF235A0 for ; Fri, 14 Jul 2023 09:11:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=FYm525OzqEwsfjbYhxH+ScM7pgsJfEGWNcZAh5wLS3k=; b=O0Pz+D1zGlAo70EzsAnYx7DHd1 cxrGiUTWWZFzj2ZBWtwd7vplocp/e4vKH9eOrSPfbdu/BICMX9MFXigPvEERENvwuGsWhw9jYIguy OyTaYkC6Q67CNHB1o/5PrdYJB60vWx9m+LbBDYCZ2RI88Ur3OCO4z/YABKEgbNipANOkoUvjYEClk +iBy4UKdaEnolq4B6CvJNtk4aN5Tf9ex/aqdtyhfLX3YqZ8sPBpJytFYgsielLyJ4jvHFxelFXFq3 rd5yhCGzXQssZdC0eDAM/ajma6hAuo/5wNuoS1l+oie495N2Apwd1cyDbqrEsujuRE4J5P0orA0MK J1DGHVEA==; Received: from [187.74.70.209] (helo=steammachine.lan) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1qKLOR-00Eaot-Q3; Fri, 14 Jul 2023 18:11:44 +0200 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com, =?UTF-8?q?=27Marek=20Ol=C5=A1=C3=A1k=27?= , Samuel Pitoiset , Bas Nieuwenhuizen , =?UTF-8?q?Timur=20Krist=C3=B3f?= , michel.daenzer@mailbox.org, =?UTF-8?q?Andr=C3=A9=20Almeida?= Subject: [PATCH v3 3/5] drm/amdgpu: Rework coredump to use memory dynamically Date: Fri, 14 Jul 2023 13:11:26 -0300 Message-ID: <20230714161128.69545-4-andrealmeid@igalia.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230714161128.69545-1-andrealmeid@igalia.com> References: <20230714161128.69545-1-andrealmeid@igalia.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Instead of storing coredump information inside amdgpu_device struct, move if to a proper separated struct and allocate it dynamically. This will make it easier to further expand the logged information. Signed-off-by: Andr=C3=A9 Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 14 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 65 ++++++++++++++-------- 2 files changed, 51 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdg= pu/amdgpu.h index dbe062a087c5..e1cc83a89d46 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1068,11 +1068,6 @@ struct amdgpu_device { uint32_t *reset_dump_reg_list; uint32_t *reset_dump_reg_value; int num_regs; -#ifdef CONFIG_DEV_COREDUMP - struct amdgpu_task_info reset_task_info; - bool reset_vram_lost; - struct timespec64 reset_time; -#endif =20 bool scpm_enabled; uint32_t scpm_status; @@ -1085,6 +1080,15 @@ struct amdgpu_device { uint32_t aid_mask; }; =20 +#ifdef CONFIG_DEV_COREDUMP +struct amdgpu_coredump_info { + struct amdgpu_device *adev; + struct amdgpu_task_info reset_task_info; + struct timespec64 reset_time; + bool reset_vram_lost; +}; +#endif + static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev) { return container_of(ddev, struct amdgpu_device, ddev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/a= md/amdgpu/amdgpu_device.c index a824f844a984..e80670420586 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4963,12 +4963,17 @@ static int amdgpu_reset_reg_dumps(struct amdgpu_dev= ice *adev) return 0; } =20 -#ifdef CONFIG_DEV_COREDUMP +#ifndef CONFIG_DEV_COREDUMP +static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost, + struct amdgpu_reset_context *reset_context) +{ +} +#else static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset, size_t count, void *data, size_t datalen) { struct drm_printer p; - struct amdgpu_device *adev =3D data; + struct amdgpu_coredump_info *coredump =3D data; struct drm_print_iterator iter; int i; =20 @@ -4982,21 +4987,21 @@ static ssize_t amdgpu_devcoredump_read(char *buffer= , loff_t offset, drm_printf(&p, "**** AMDGPU Device Coredump ****\n"); drm_printf(&p, "kernel: " UTS_RELEASE "\n"); drm_printf(&p, "module: " KBUILD_MODNAME "\n"); - drm_printf(&p, "time: %lld.%09ld\n", adev->reset_time.tv_sec, adev->reset= _time.tv_nsec); - if (adev->reset_task_info.pid) + drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredum= p->reset_time.tv_nsec); + if (coredump->reset_task_info.pid) drm_printf(&p, "process_name: %s PID: %d\n", - adev->reset_task_info.process_name, - adev->reset_task_info.pid); + coredump->reset_task_info.process_name, + coredump->reset_task_info.pid); =20 - if (adev->reset_vram_lost) + if (coredump->reset_vram_lost) drm_printf(&p, "VRAM is lost due to GPU reset!\n"); - if (adev->num_regs) { + if (coredump->adev->num_regs) { drm_printf(&p, "AMDGPU register dumps:\nOffset: Value:\n"); =20 - for (i =3D 0; i < adev->num_regs; i++) + for (i =3D 0; i < coredump->adev->num_regs; i++) drm_printf(&p, "0x%08x: 0x%08x\n", - adev->reset_dump_reg_list[i], - adev->reset_dump_reg_value[i]); + coredump->adev->reset_dump_reg_list[i], + coredump->adev->reset_dump_reg_value[i]); } =20 return count - iter.remain; @@ -5004,14 +5009,34 @@ static ssize_t amdgpu_devcoredump_read(char *buffer= , loff_t offset, =20 static void amdgpu_devcoredump_free(void *data) { + kfree(data); } =20 -static void amdgpu_reset_capture_coredumpm(struct amdgpu_device *adev) +static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost, + struct amdgpu_reset_context *reset_context) { + struct amdgpu_coredump_info *coredump; struct drm_device *dev =3D adev_to_drm(adev); =20 - ktime_get_ts64(&adev->reset_time); - dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_NOWAIT, + coredump =3D kmalloc(sizeof(*coredump), GFP_NOWAIT); + + if (!coredump) { + DRM_ERROR("%s: failed to allocate memory for coredump\n", __func__); + return; + } + + memset(coredump, 0, sizeof(*coredump)); + + coredump->reset_vram_lost =3D vram_lost; + + if (reset_context->job && reset_context->job->vm) + coredump->reset_task_info =3D reset_context->job->vm->task_info; + + coredump->adev =3D adev; + + ktime_get_ts64(&coredump->reset_time); + + dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_NOWAIT, amdgpu_devcoredump_read, amdgpu_devcoredump_free); } #endif @@ -5119,15 +5144,9 @@ int amdgpu_do_asic_reset(struct list_head *device_li= st_handle, goto out; =20 vram_lost =3D amdgpu_device_check_vram_lost(tmp_adev); -#ifdef CONFIG_DEV_COREDUMP - tmp_adev->reset_vram_lost =3D vram_lost; - memset(&tmp_adev->reset_task_info, 0, - sizeof(tmp_adev->reset_task_info)); - if (reset_context->job && reset_context->job->vm) - tmp_adev->reset_task_info =3D - reset_context->job->vm->task_info; - amdgpu_reset_capture_coredumpm(tmp_adev); -#endif + + amdgpu_coredump(tmp_adev, vram_lost, reset_context); + if (vram_lost) { DRM_INFO("VRAM is lost due to GPU reset!\n"); amdgpu_inc_vram_lost(tmp_adev); --=20 2.41.0 From nobody Sun Feb 8 14:59:38 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 38D92C001DC for ; Fri, 14 Jul 2023 16:12:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236287AbjGNQME (ORCPT ); Fri, 14 Jul 2023 12:12:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44166 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235965AbjGNQLu (ORCPT ); Fri, 14 Jul 2023 12:11:50 -0400 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 63F393585 for ; Fri, 14 Jul 2023 09:11:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=v6Si+/vYSjh6Zgr29qvXCcz14aY2TCvPaaAg6UKiqIk=; b=oz8NRg3ICEXUYCEwpGn4fBV5vJ ewk1bPk3cduBY7P5jA61RLA91E2jl9ajpL9AttAsV7fJeAPPrkY2AWBAsXdRUQLYkgRXGcHUfOhra iUBuLUYMFqIqANxgv7q/vl5vm0oxwKomzt0DIjPpPDJXVkMDPs3WV8NZqWEtvQlA6DjUp8eEoRK8F +Q/wmEuLLPpx5EtrIAsWh+gQyR5QMpiO2i2LfG/2tZmV30m2VHdpXDA0jBSS+WJVlxu+v2pX+Pj/r nRgWwsYJOIxsqKiANwTQM+ON+UB34RfYAguQPlscinY3qOFvrjsIE03vd9EAm7MFxY27bLy1vbzjN glupqFWw==; Received: from [187.74.70.209] (helo=steammachine.lan) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1qKLOV-00Eaot-56; Fri, 14 Jul 2023 18:11:47 +0200 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com, =?UTF-8?q?=27Marek=20Ol=C5=A1=C3=A1k=27?= , Samuel Pitoiset , Bas Nieuwenhuizen , =?UTF-8?q?Timur=20Krist=C3=B3f?= , michel.daenzer@mailbox.org, =?UTF-8?q?Andr=C3=A9=20Almeida?= Subject: [PATCH v3 4/5] drm/amdgpu: Move coredump code to amdgpu_reset file Date: Fri, 14 Jul 2023 13:11:27 -0300 Message-ID: <20230714161128.69545-5-andrealmeid@igalia.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230714161128.69545-1-andrealmeid@igalia.com> References: <20230714161128.69545-1-andrealmeid@igalia.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Giving that we use codedump just for device resets, move it's functions and structs to a more semantic file, the amdgpu_reset.{c, h}. Signed-off-by: Andr=C3=A9 Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 9 --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 80 ---------------------- drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 78 +++++++++++++++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 11 +++ 4 files changed, 89 insertions(+), 89 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdg= pu/amdgpu.h index e1cc83a89d46..1e76cb38a554 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1080,15 +1080,6 @@ struct amdgpu_device { uint32_t aid_mask; }; =20 -#ifdef CONFIG_DEV_COREDUMP -struct amdgpu_coredump_info { - struct amdgpu_device *adev; - struct amdgpu_task_info reset_task_info; - struct timespec64 reset_time; - bool reset_vram_lost; -}; -#endif - static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev) { return container_of(ddev, struct amdgpu_device, ddev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/a= md/amdgpu/amdgpu_device.c index e80670420586..e84d499aaf4a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -32,8 +32,6 @@ #include #include #include -#include -#include #include #include =20 @@ -4963,84 +4961,6 @@ static int amdgpu_reset_reg_dumps(struct amdgpu_devi= ce *adev) return 0; } =20 -#ifndef CONFIG_DEV_COREDUMP -static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost, - struct amdgpu_reset_context *reset_context) -{ -} -#else -static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset, - size_t count, void *data, size_t datalen) -{ - struct drm_printer p; - struct amdgpu_coredump_info *coredump =3D data; - struct drm_print_iterator iter; - int i; - - iter.data =3D buffer; - iter.offset =3D 0; - iter.start =3D offset; - iter.remain =3D count; - - p =3D drm_coredump_printer(&iter); - - drm_printf(&p, "**** AMDGPU Device Coredump ****\n"); - drm_printf(&p, "kernel: " UTS_RELEASE "\n"); - drm_printf(&p, "module: " KBUILD_MODNAME "\n"); - drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredum= p->reset_time.tv_nsec); - if (coredump->reset_task_info.pid) - drm_printf(&p, "process_name: %s PID: %d\n", - coredump->reset_task_info.process_name, - coredump->reset_task_info.pid); - - if (coredump->reset_vram_lost) - drm_printf(&p, "VRAM is lost due to GPU reset!\n"); - if (coredump->adev->num_regs) { - drm_printf(&p, "AMDGPU register dumps:\nOffset: Value:\n"); - - for (i =3D 0; i < coredump->adev->num_regs; i++) - drm_printf(&p, "0x%08x: 0x%08x\n", - coredump->adev->reset_dump_reg_list[i], - coredump->adev->reset_dump_reg_value[i]); - } - - return count - iter.remain; -} - -static void amdgpu_devcoredump_free(void *data) -{ - kfree(data); -} - -static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost, - struct amdgpu_reset_context *reset_context) -{ - struct amdgpu_coredump_info *coredump; - struct drm_device *dev =3D adev_to_drm(adev); - - coredump =3D kmalloc(sizeof(*coredump), GFP_NOWAIT); - - if (!coredump) { - DRM_ERROR("%s: failed to allocate memory for coredump\n", __func__); - return; - } - - memset(coredump, 0, sizeof(*coredump)); - - coredump->reset_vram_lost =3D vram_lost; - - if (reset_context->job && reset_context->job->vm) - coredump->reset_task_info =3D reset_context->job->vm->task_info; - - coredump->adev =3D adev; - - ktime_get_ts64(&coredump->reset_time); - - dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_NOWAIT, - amdgpu_devcoredump_read, amdgpu_devcoredump_free); -} -#endif - int amdgpu_do_asic_reset(struct list_head *device_list_handle, struct amdgpu_reset_context *reset_context) { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/am= d/amdgpu/amdgpu_reset.c index eec41ad30406..081cdf3bc267 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c @@ -21,6 +21,9 @@ * */ =20 +#include +#include + #include "amdgpu_reset.h" #include "aldebaran.h" #include "sienna_cichlid.h" @@ -167,5 +170,80 @@ void amdgpu_device_unlock_reset_domain(struct amdgpu_r= eset_domain *reset_domain) up_write(&reset_domain->sem); } =20 +#ifndef CONFIG_DEV_COREDUMP +void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost, + struct amdgpu_reset_context *reset_context) +{ +} +#else +static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset, + size_t count, void *data, size_t datalen) +{ + struct drm_printer p; + struct amdgpu_coredump_info *coredump =3D data; + struct drm_print_iterator iter; + int i; + + iter.data =3D buffer; + iter.offset =3D 0; + iter.start =3D offset; + iter.remain =3D count; + + p =3D drm_coredump_printer(&iter); + + drm_printf(&p, "**** AMDGPU Device Coredump ****\n"); + drm_printf(&p, "kernel: " UTS_RELEASE "\n"); + drm_printf(&p, "module: " KBUILD_MODNAME "\n"); + drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredum= p->reset_time.tv_nsec); + if (coredump->reset_task_info.pid) + drm_printf(&p, "process_name: %s PID: %d\n", + coredump->reset_task_info.process_name, + coredump->reset_task_info.pid); + + if (coredump->reset_vram_lost) + drm_printf(&p, "VRAM is lost due to GPU reset!\n"); + if (coredump->adev->num_regs) { + drm_printf(&p, "AMDGPU register dumps:\nOffset: Value:\n"); + + for (i =3D 0; i < coredump->adev->num_regs; i++) + drm_printf(&p, "0x%08x: 0x%08x\n", + coredump->adev->reset_dump_reg_list[i], + coredump->adev->reset_dump_reg_value[i]); + } + + return count - iter.remain; +} + +static void amdgpu_devcoredump_free(void *data) +{ + kfree(data); +} + +void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost, + struct amdgpu_reset_context *reset_context) +{ + struct amdgpu_coredump_info *coredump; + struct drm_device *dev =3D adev_to_drm(adev); + + coredump =3D kmalloc(sizeof(*coredump), GFP_NOWAIT); + + if (!coredump) { + DRM_ERROR("%s: failed to allocate memory for coredump\n", __func__); + return; + } + + memset(coredump, 0, sizeof(*coredump)); + + coredump->reset_vram_lost =3D vram_lost; + + if (reset_context->job && reset_context->job->vm) + coredump->reset_task_info =3D reset_context->job->vm->task_info; =20 + coredump->adev =3D adev; =20 + ktime_get_ts64(&coredump->reset_time); + + dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_NOWAIT, + amdgpu_devcoredump_read, amdgpu_devcoredump_free); +} +#endif diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h b/drivers/gpu/drm/am= d/amdgpu/amdgpu_reset.h index f4a501ff87d9..362954521721 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h @@ -87,6 +87,15 @@ struct amdgpu_reset_domain { atomic_t reset_res; }; =20 +#ifdef CONFIG_DEV_COREDUMP +struct amdgpu_coredump_info { + struct amdgpu_device *adev; + struct amdgpu_task_info reset_task_info; + struct timespec64 reset_time; + bool reset_vram_lost; +}; +#endif + =20 int amdgpu_reset_init(struct amdgpu_device *adev); int amdgpu_reset_fini(struct amdgpu_device *adev); @@ -126,4 +135,6 @@ void amdgpu_device_lock_reset_domain(struct amdgpu_rese= t_domain *reset_domain); =20 void amdgpu_device_unlock_reset_domain(struct amdgpu_reset_domain *reset_d= omain); =20 +void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost, + struct amdgpu_reset_context *reset_context); #endif --=20 2.41.0 From nobody Sun Feb 8 14:59:38 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4B233EB64DC for ; Fri, 14 Jul 2023 16:12:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236113AbjGNQMH (ORCPT ); Fri, 14 Jul 2023 12:12:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44238 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236203AbjGNQLy (ORCPT ); Fri, 14 Jul 2023 12:11:54 -0400 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9AD132D64 for ; Fri, 14 Jul 2023 09:11:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=nhGZVSWSn6mtbEKzKTGtlDNdAdMZQ2RiJK0D2JyHWac=; b=aw2mqb3UjXpyfRVOdST/PkQaM5 5XCYIhALq+vuob1227YOvJY1fwX53CPTbpRCcDcFwdH/vqKyLvOVM7igY/SxbCIQLJiuXVzsSgeVw p0VVW96j83R96Iu3Es8w1xmWyzhlJBeNSKA/pwlTsiKu/WO4U52vw0+Psgood5EDxlCHBN0WBToV6 6PlaoIMr8wqli8W7/uBoQ4afSC0+btg5n0LFG8MwwRdpsUZnv4bmFnz+ZT4DdlFp+NPL7E27l8EKo VVpXVasoY6KlTZCklisRLKyqjL2RNblCV27obQHZv1cOih2jIDaaJpQ77UvCk7Pdv0AzAvbxopE6X CBhO6Ogw==; Received: from [187.74.70.209] (helo=steammachine.lan) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1qKLOY-00Eaot-Co; Fri, 14 Jul 2023 18:11:50 +0200 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com, =?UTF-8?q?=27Marek=20Ol=C5=A1=C3=A1k=27?= , Samuel Pitoiset , Bas Nieuwenhuizen , =?UTF-8?q?Timur=20Krist=C3=B3f?= , michel.daenzer@mailbox.org, =?UTF-8?q?Andr=C3=A9=20Almeida?= Subject: [PATCH v3 5/5] drm/amdgpu: Create version number for coredumps Date: Fri, 14 Jul 2023 13:11:28 -0300 Message-ID: <20230714161128.69545-6-andrealmeid@igalia.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230714161128.69545-1-andrealmeid@igalia.com> References: <20230714161128.69545-1-andrealmeid@igalia.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Even if there's nothing currently parsing amdgpu's coredump files, if we eventually have such tools they will be glad to find a version field to properly read the file. Create a version number to be displayed on top of coredump file, to be incremented when the file format or content get changed. Signed-off-by: Andr=C3=A9 Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 3 +++ 2 files changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/am= d/amdgpu/amdgpu_reset.c index 081cdf3bc267..dab385f9dd80 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c @@ -192,6 +192,7 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, lo= ff_t offset, p =3D drm_coredump_printer(&iter); =20 drm_printf(&p, "**** AMDGPU Device Coredump ****\n"); + drm_printf(&p, "version: " AMDGPU_COREDUMP_VERSION "\n"); drm_printf(&p, "kernel: " UTS_RELEASE "\n"); drm_printf(&p, "module: " KBUILD_MODNAME "\n"); drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredum= p->reset_time.tv_nsec); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h b/drivers/gpu/drm/am= d/amdgpu/amdgpu_reset.h index 362954521721..7b6767ca8127 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h @@ -88,6 +88,9 @@ struct amdgpu_reset_domain { }; =20 #ifdef CONFIG_DEV_COREDUMP + +#define AMDGPU_COREDUMP_VERSION "1" + struct amdgpu_coredump_info { struct amdgpu_device *adev; struct amdgpu_task_info reset_task_info; --=20 2.41.0