From: André Almeida
To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
	linux-kernel@vger.kernel.org
Cc: kernel-dev@igalia.com, alexander.deucher@amd.com,
	christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com,
	'Marek Olšák', André Almeida
Subject: [PATCH v2 2/2] drm/amdgpu: Create an option to disable soft recovery
Date: Wed, 30 Aug 2023 19:08:08 -0300
Message-ID: <20230830220808.421935-3-andrealmeid@igalia.com>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20230830220808.421935-1-andrealmeid@igalia.com>
References: <20230830220808.421935-1-andrealmeid@igalia.com>

Create a module option to disable soft recoveries on amdgpu, making
every recovery go through the device reset path. This option makes it
easier to force device resets for testing and debugging purposes.

Signed-off-by: André Almeida
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h      | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 6 ++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 6 +++++-
 drivers/gpu/drm/amd/include/amd_shared.h | 1 +
 4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 82eaccfce347..5f49e2c0ae7a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1105,6 +1105,7 @@ struct amdgpu_device {
 	/* Debug */
 	bool debug_vm;
 	bool debug_largebar;
+	bool debug_disable_soft_recovery;
 };
 
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 0cd48c025433..59e9fe594b51 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -927,6 +927,7 @@ MODULE_PARM_DESC(enforce_isolation, "enforce process isolation between graphics
  * - 0x2: Enable simulating large-bar capability on non-large bar system. This
  *   limits the VRAM size reported to ROCm applications to the visible
  *   size, usually 256MB.
+ * - 0x4: Disable GPU soft recovery
  */
 MODULE_PARM_DESC(debug_mask, "debug options for amdgpu, disabled by default");
 module_param_named(debug_mask, amdgpu_debug_mask, uint, 0444);
@@ -2046,6 +2047,11 @@ static void amdgpu_init_debug_options(struct amdgpu_device *adev)
 		pr_info("debug: enabled simulating large-bar capability on non-large bar system\n");
 		adev->debug_largebar = true;
 	}
+
+	if (amdgpu_debug_mask & AMDGPU_DEBUG_DISABLE_GPU_SOFT_RECOVERY) {
+		pr_info("debug: soft reset for GPU recovery disabled\n");
+		adev->debug_disable_soft_recovery = true;
+	}
 }
 
 static int amdgpu_pci_probe(struct pci_dev *pdev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 80d6e132e409..6a80d3ec887e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -434,8 +434,12 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid,
 			       struct dma_fence *fence)
 {
 	unsigned long flags;
+	ktime_t deadline;
 
-	ktime_t deadline = ktime_add_us(ktime_get(), 10000);
+	if (unlikely(ring->adev->debug_disable_soft_recovery))
+		return false;
+
+	deadline = ktime_add_us(ktime_get(), 10000);
 
 	if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || !fence)
 		return false;
diff --git a/drivers/gpu/drm/amd/include/amd_shared.h b/drivers/gpu/drm/amd/include/amd_shared.h
index 2fd6af2183cc..32ee982be99e 100644
--- a/drivers/gpu/drm/amd/include/amd_shared.h
+++ b/drivers/gpu/drm/amd/include/amd_shared.h
@@ -263,6 +263,7 @@ enum amd_dpm_forced_level;
 enum AMDGPU_DEBUG_MASK {
 	AMDGPU_DEBUG_VM = BIT(0),
 	AMDGPU_DEBUG_LARGEBAR = BIT(1),
+	AMDGPU_DEBUG_DISABLE_GPU_SOFT_RECOVERY = BIT(2),
 };
 
 /**
-- 
2.41.0
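
For readers who want to see the mask decoding in isolation, here is a minimal userspace sketch of the logic this patch adds. It is not kernel code and not part of the patch: `struct model_device`, `model_init_debug_options()` and `model_soft_recovery_allowed()` are hypothetical names invented for illustration, and only the bit assignments (0x1, 0x2, 0x4) are taken from the diff above.

```c
/*
 * Userspace model of the debug_mask decoding shown in the patch.
 * All names prefixed with "model_" or "MODEL_" are hypothetical;
 * only the bit positions mirror enum AMDGPU_DEBUG_MASK.
 */
#include <stdbool.h>

#define MODEL_BIT(n) (1u << (n))

enum model_debug_mask {
	MODEL_DEBUG_VM = MODEL_BIT(0),
	MODEL_DEBUG_LARGEBAR = MODEL_BIT(1),
	MODEL_DEBUG_DISABLE_GPU_SOFT_RECOVERY = MODEL_BIT(2),
};

/* Stand-in for the three debug flags in struct amdgpu_device. */
struct model_device {
	bool debug_vm;
	bool debug_largebar;
	bool debug_disable_soft_recovery;
};

/* Translate the mask into per-device flags, one flag per bit. */
static void model_init_debug_options(struct model_device *dev,
				     unsigned int mask)
{
	dev->debug_vm = (mask & MODEL_DEBUG_VM) != 0;
	dev->debug_largebar = (mask & MODEL_DEBUG_LARGEBAR) != 0;
	dev->debug_disable_soft_recovery =
		(mask & MODEL_DEBUG_DISABLE_GPU_SOFT_RECOVERY) != 0;
}

/*
 * Models only the early return added to amdgpu_ring_soft_recovery():
 * when the flag is set, soft recovery is refused, so the caller has to
 * fall back to the device reset path.
 */
static bool model_soft_recovery_allowed(const struct model_device *dev)
{
	return !dev->debug_disable_soft_recovery;
}
```

In this model, a mask of 0x4 sets only `debug_disable_soft_recovery`, while 0x3 leaves soft recovery allowed. On a real system the parameter is registered with mode 0444, so it is read-only through sysfs and would have to be given at load time, for example as `amdgpu.debug_mask=0x4` on the kernel command line or `debug_mask=0x4` as a modprobe option.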