From: André Almeida
To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
    linux-kernel@vger.kernel.org
Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com,
    pierre-eric.pelloux-prayer@amd.com, 'Marek Olšák', André Almeida
Subject: [PATCH v2 1/2] drm/amdgpu: Merge debug module parameters
Date: Wed, 30 Aug 2023 19:08:07 -0300
Message-ID: <20230830220808.421935-2-andrealmeid@igalia.com>
In-Reply-To: <20230830220808.421935-1-andrealmeid@igalia.com>
References: <20230830220808.421935-1-andrealmeid@igalia.com>

Merge all developer debug options, previously available as separate module
parameters, into a single one, making it obvious that they are meant for
developers. Drop the obsolete module options in favor of the new bitmask.

Signed-off-by: André Almeida
Acked-by: Felix Kuehling
---
v2:
- drop old module params
- use BIT() macros
- replace global var with adev-> vars
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h      |  4 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 48 ++++++++++++++----------
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c    |  2 +-
 drivers/gpu/drm/amd/include/amd_shared.h |  8 ++++
 8 files changed, 45 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 4de074243c4d..82eaccfce347 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1101,6 +1101,10 @@ struct amdgpu_device {
 	bool dc_enabled;
 	/* Mask of active clusters */
 	uint32_t aid_mask;
+
+	/* Debug */
+	bool debug_vm;
+	bool debug_largebar;
 };
 
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index fb78a8f47587..8a26bed76505 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1191,7 +1191,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
 		job->vm_pd_addr = amdgpu_gmc_pd_addr(vm->root.bo);
 	}
 
-	if (amdgpu_vm_debug) {
+	if (adev->debug_vm) {
 		/* Invalidate all BOs to test for userspace bugs */
 		amdgpu_bo_list_for_each_entry(e, p->bo_list) {
 			struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index f5856b82605e..0cd48c025433 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -140,7 +140,6 @@ int amdgpu_vm_size = -1;
 int amdgpu_vm_fragment_size = -1;
 int amdgpu_vm_block_size = -1;
 int amdgpu_vm_fault_stop;
-int amdgpu_vm_debug;
 int amdgpu_vm_update_mode = -1;
 int amdgpu_exp_hw_support;
 int amdgpu_dc = -1;
@@ -194,6 +193,7 @@ int amdgpu_use_xgmi_p2p = 1;
 int amdgpu_vcnfw_log;
 int amdgpu_sg_display = -1; /* auto */
 int amdgpu_user_partt_mode = AMDGPU_AUTO_COMPUTE_PARTITION_MODE;
+uint amdgpu_debug_mask;
 
 static void amdgpu_drv_delayed_reset_work_handler(struct work_struct *work);
 
@@ -405,13 +405,6 @@ module_param_named(vm_block_size, amdgpu_vm_block_size, int, 0444);
 MODULE_PARM_DESC(vm_fault_stop, "Stop on VM fault (0 = never (default), 1 = print first, 2 = always)");
 module_param_named(vm_fault_stop, amdgpu_vm_fault_stop, int, 0444);
 
-/**
- * DOC: vm_debug (int)
- * Debug VM handling (0 = disabled, 1 = enabled). The default is 0 (Disabled).
- */
-MODULE_PARM_DESC(vm_debug, "Debug VM handling (0 = disabled (default), 1 = enabled)");
-module_param_named(vm_debug, amdgpu_vm_debug, int, 0644);
-
 /**
  * DOC: vm_update_mode (int)
  * Override VM update mode. VM updated by using CPU (0 = never, 1 = Graphics only, 2 = Compute only, 3 = Both). The default
@@ -743,18 +736,6 @@ module_param(send_sigterm, int, 0444);
 MODULE_PARM_DESC(send_sigterm,
 	"Send sigterm to HSA process on unhandled exception (0 = disable, 1 = enable)");
 
-/**
- * DOC: debug_largebar (int)
- * Set debug_largebar as 1 to enable simulating large-bar capability on non-large bar
- * system. This limits the VRAM size reported to ROCm applications to the visible
- * size, usually 256MB.
- * Default value is 0, diabled.
- */
-int debug_largebar;
-module_param(debug_largebar, int, 0444);
-MODULE_PARM_DESC(debug_largebar,
-	"Debug large-bar flag used to simulate large-bar capability on non-large bar machine (0 = disable, 1 = enable)");
-
 /**
  * DOC: halt_if_hws_hang (int)
  * Halt if HWS hang is detected. Default value, 0, disables the halt on hang.
@@ -938,6 +919,18 @@ module_param_named(user_partt_mode, amdgpu_user_partt_mode, uint, 0444);
 module_param(enforce_isolation, bool, 0444);
 MODULE_PARM_DESC(enforce_isolation, "enforce process isolation between graphics and compute . enforce_isolation = on");
 
+/**
+ * DOC: debug_mask (uint)
+ * Debug options for amdgpu, working as a bitmask with the following options:
+ *
+ * - 0x1: Debug VM handling
+ * - 0x2: Enable simulating large-bar capability on non-large bar system. This
+ *   limits the VRAM size reported to ROCm applications to the visible
+ *   size, usually 256MB.
+ */
+MODULE_PARM_DESC(debug_mask, "debug options for amdgpu, disabled by default");
+module_param_named(debug_mask, amdgpu_debug_mask, uint, 0444);
+
 /* These devices are not supported by amdgpu.
  * They are supported by the mach64, r128, radeon drivers
  */
@@ -2042,6 +2035,19 @@ static void amdgpu_get_secondary_funcs(struct amdgpu_device *adev)
 	}
 }
 
+static void amdgpu_init_debug_options(struct amdgpu_device *adev)
+{
+	if (amdgpu_debug_mask & AMDGPU_DEBUG_VM) {
+		pr_info("debug: VM handling debug enabled\n");
+		adev->debug_vm = true;
+	}
+
+	if (amdgpu_debug_mask & AMDGPU_DEBUG_LARGEBAR) {
+		pr_info("debug: enabled simulating large-bar capability on non-large bar system\n");
+		adev->debug_largebar = true;
+	}
+}
+
 static int amdgpu_pci_probe(struct pci_dev *pdev,
 			    const struct pci_device_id *ent)
 {
@@ -2220,6 +2226,8 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
 		amdgpu_get_secondary_funcs(adev);
 	}
 
+	amdgpu_init_debug_options(adev);
+
 	return 0;
 
 err_pci:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 09203e22b026..548e65f2db5f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -794,7 +794,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 	default:
 		break;
 	}
-	if (!r && !(args->flags & AMDGPU_VM_DELAY_UPDATE) && !amdgpu_vm_debug)
+	if (!r && !(args->flags & AMDGPU_VM_DELAY_UPDATE) && !adev->debug_vm)
 		amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,
 					args->operation);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 74380b21e7a5..d483cd9c612a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1407,7 +1407,7 @@ int amdgpu_vm_handle_moved(struct amdgpu_device *adev,
 		spin_unlock(&vm->status_lock);
 
 		/* Try to reserve the BO to avoid clearing its ptes */
-		if (!amdgpu_vm_debug && dma_resv_trylock(resv))
+		if (!adev->debug_vm && dma_resv_trylock(resv))
 			clear = false;
 		/* Somebody else is using the BO right now */
 		else
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 3b8f592384fa..41ac2ec936c3 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1021,7 +1021,7 @@ static int kfd_ioctl_acquire_vm(struct file *filep, struct kfd_process *p,
 
 bool kfd_dev_is_large_bar(struct kfd_node *dev)
 {
-	if (debug_largebar) {
+	if (dev->kfd->adev->debug_largebar) {
 		pr_debug("Simulate large-bar allocation on non large-bar machine\n");
 		return true;
 	}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 2e9612cf56ae..b05e06f89814 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -2115,7 +2115,7 @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image,
 	sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
 			sub_type_hdr->length);
 
-	if (debug_largebar)
+	if (kdev->adev->debug_largebar)
 		local_mem_info.local_mem_size_private = 0;
 
 	if (local_mem_info.local_mem_size_private == 0)
diff --git a/drivers/gpu/drm/amd/include/amd_shared.h b/drivers/gpu/drm/amd/include/amd_shared.h
index 67d7b7ee8a2a..2fd6af2183cc 100644
--- a/drivers/gpu/drm/amd/include/amd_shared.h
+++ b/drivers/gpu/drm/amd/include/amd_shared.h
@@ -257,6 +257,14 @@ enum DC_DEBUG_MASK {
 
 enum amd_dpm_forced_level;
 
+/*
+ * amdgpu.debug_mask module options. All are disabled by default.
+ */
+enum AMDGPU_DEBUG_MASK {
+	AMDGPU_DEBUG_VM = BIT(0),
+	AMDGPU_DEBUG_LARGEBAR = BIT(1),
+};
+
 /**
  * struct amd_ip_funcs - general hooks for managing amdgpu IP Blocks
  * @name: Name of IP block
-- 
2.41.0

From: André Almeida
To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
    linux-kernel@vger.kernel.org
Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com,
    pierre-eric.pelloux-prayer@amd.com, 'Marek Olšák', André Almeida
Subject: [PATCH v2 2/2] drm/amdgpu: Create an option to disable soft recovery
Date: Wed, 30 Aug 2023 19:08:08 -0300
Message-ID: <20230830220808.421935-3-andrealmeid@igalia.com>
In-Reply-To: <20230830220808.421935-1-andrealmeid@igalia.com>
References: <20230830220808.421935-1-andrealmeid@igalia.com>

Add an amdgpu.debug_mask option (0x4) to disable soft recoveries on amdgpu,
making every recovery go through the device reset path. This option makes it
easier to force device resets for testing and debugging purposes.
Signed-off-by: André Almeida
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h      | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 6 ++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 6 +++++-
 drivers/gpu/drm/amd/include/amd_shared.h | 1 +
 4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 82eaccfce347..5f49e2c0ae7a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1105,6 +1105,7 @@ struct amdgpu_device {
 	/* Debug */
 	bool debug_vm;
 	bool debug_largebar;
+	bool debug_disable_soft_recovery;
 };
 
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 0cd48c025433..59e9fe594b51 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -927,6 +927,7 @@ MODULE_PARM_DESC(enforce_isolation, "enforce process isolation between graphics
  * - 0x2: Enable simulating large-bar capability on non-large bar system. This
  *   limits the VRAM size reported to ROCm applications to the visible
  *   size, usually 256MB.
+ * - 0x4: Disable GPU soft recovery
  */
 MODULE_PARM_DESC(debug_mask, "debug options for amdgpu, disabled by default");
 module_param_named(debug_mask, amdgpu_debug_mask, uint, 0444);
@@ -2046,6 +2047,11 @@ static void amdgpu_init_debug_options(struct amdgpu_device *adev)
 		pr_info("debug: enabled simulating large-bar capability on non-large bar system\n");
 		adev->debug_largebar = true;
 	}
+
+	if (amdgpu_debug_mask & AMDGPU_DEBUG_DISABLE_GPU_SOFT_RECOVERY) {
+		pr_info("debug: soft reset for GPU recovery disabled\n");
+		adev->debug_disable_soft_recovery = true;
+	}
 }
 
 static int amdgpu_pci_probe(struct pci_dev *pdev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 80d6e132e409..6a80d3ec887e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -434,8 +434,12 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid,
 			       struct dma_fence *fence)
 {
 	unsigned long flags;
+	ktime_t deadline;
 
-	ktime_t deadline = ktime_add_us(ktime_get(), 10000);
+	if (unlikely(ring->adev->debug_disable_soft_recovery))
+		return false;
+
+	deadline = ktime_add_us(ktime_get(), 10000);
 
 	if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || !fence)
 		return false;
diff --git a/drivers/gpu/drm/amd/include/amd_shared.h b/drivers/gpu/drm/amd/include/amd_shared.h
index 2fd6af2183cc..32ee982be99e 100644
--- a/drivers/gpu/drm/amd/include/amd_shared.h
+++ b/drivers/gpu/drm/amd/include/amd_shared.h
@@ -263,6 +263,7 @@ enum amd_dpm_forced_level;
 enum AMDGPU_DEBUG_MASK {
 	AMDGPU_DEBUG_VM = BIT(0),
 	AMDGPU_DEBUG_LARGEBAR = BIT(1),
+	AMDGPU_DEBUG_DISABLE_GPU_SOFT_RECOVERY = BIT(2),
 };
 
 /**
-- 
2.41.0
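Taken together, the two patches turn amdgpu.debug_mask into a bitmask that amdgpu_init_debug_options() decodes once, at probe time, into per-device booleans. A rough userspace sketch of that decoding follows. The enum values are copied from amd_shared.h as modified by this series; struct debug_flags and decode_debug_mask() are hypothetical names, BIT() is a local stand-in for the kernel macro, and the pr_info() logging is omitted. In this model a mask of 0x5 enables VM debugging and disables soft recovery while leaving large-bar simulation off.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the kernel's BIT() macro from <linux/bits.h>. */
#define BIT(n) (1U << (n))

/* Bit assignments as they stand in amd_shared.h after both patches. */
enum AMDGPU_DEBUG_MASK {
	AMDGPU_DEBUG_VM = BIT(0),
	AMDGPU_DEBUG_LARGEBAR = BIT(1),
	AMDGPU_DEBUG_DISABLE_GPU_SOFT_RECOVERY = BIT(2),
};

/* The debug_mask DOC comment documents these bits as 0x1, 0x2 and 0x4. */
static_assert(AMDGPU_DEBUG_VM == 0x1, "VM debug bit");
static_assert(AMDGPU_DEBUG_LARGEBAR == 0x2, "large-bar bit");
static_assert(AMDGPU_DEBUG_DISABLE_GPU_SOFT_RECOVERY == 0x4, "soft-recovery bit");

/* Hypothetical holder for the three bool fields added to struct amdgpu_device. */
struct debug_flags {
	bool debug_vm;
	bool debug_largebar;
	bool debug_disable_soft_recovery;
};

/* Userspace model of amdgpu_init_debug_options(), minus the pr_info() calls. */
static struct debug_flags decode_debug_mask(unsigned int mask)
{
	struct debug_flags f = { false, false, false };

	if (mask & AMDGPU_DEBUG_VM)
		f.debug_vm = true;
	if (mask & AMDGPU_DEBUG_LARGEBAR)
		f.debug_largebar = true;
	if (mask & AMDGPU_DEBUG_DISABLE_GPU_SOFT_RECOVERY)
		f.debug_disable_soft_recovery = true;
	/* Bits with no defined meaning are ignored, as in the patch. */
	return f;
}
```

With the series applied, the mask would be set like any other module parameter, for example amdgpu.debug_mask=0x5 on the kernel command line or modprobe amdgpu debug_mask=0x5. Because the parameter is registered with mode 0444 and only read during probe, it can be inspected under /sys/module/amdgpu/parameters/debug_mask but not changed at runtime, unlike the removed vm_debug parameter, which was registered as 0644.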