From: André Almeida
To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
	linux-kernel@vger.kernel.org
Cc: kernel-dev@igalia.com, alexander.deucher@amd.com,
	christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com,
	Shashank Sharma, hamza.mahfooz@amd.com, André Almeida
Subject: [PATCH v4 2/2] drm/amdgpu: Create an option to disable soft recovery
Date: Mon, 11 Sep 2023 14:12:55 -0300
Message-ID: <20230911171255.143992-3-andrealmeid@igalia.com>
In-Reply-To: <20230911171255.143992-1-andrealmeid@igalia.com>
References: <20230911171255.143992-1-andrealmeid@igalia.com>

Create a module option to disable soft recoveries on amdgpu, making
every recovery go through the device reset path. This option makes it
easier to force device resets for testing and debugging purposes.

Signed-off-by: André Almeida
Reviewed-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h      | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 7 +++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 7 ++++++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 37eb9b3790a0..f30490abb3fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1103,6 +1103,7 @@ struct amdgpu_device {
 	/* Debug */
 	bool debug_vm;
 	bool debug_largebar;
+	bool debug_disable_soft_recovery;
 };
 
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 830146bd61c0..3ab7eac131e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -124,6 +124,7 @@
 enum AMDGPU_DEBUG_MASK {
 	AMDGPU_DEBUG_VM = BIT(0),
 	AMDGPU_DEBUG_LARGEBAR = BIT(1),
+	AMDGPU_DEBUG_DISABLE_GPU_SOFT_RECOVERY = BIT(2),
 };
 
 unsigned int amdgpu_vram_limit = UINT_MAX;
@@ -945,6 +946,7 @@ MODULE_PARM_DESC(enforce_isolation, "enforce process isolation between graphics
  * - 0x2: Enable simulating large-bar capability on non-large bar system. This
  *   limits the VRAM size reported to ROCm applications to the visible
  *   size, usually 256MB.
+ * - 0x4: Disable GPU soft recovery, always do a full reset
  */
 MODULE_PARM_DESC(debug_mask, "debug options for amdgpu, disabled by default");
 module_param_named(debug_mask, amdgpu_debug_mask, uint, 0444);
@@ -2064,6 +2066,11 @@ static void amdgpu_init_debug_options(struct amdgpu_device *adev)
 		pr_info("debug: enabled simulating large-bar capability on non-large bar system\n");
 		adev->debug_largebar = true;
 	}
+
+	if (amdgpu_debug_mask & AMDGPU_DEBUG_DISABLE_GPU_SOFT_RECOVERY) {
+		pr_info("debug: soft reset for GPU recovery disabled\n");
+		adev->debug_disable_soft_recovery = true;
+	}
 }
 
 static int amdgpu_pci_probe(struct pci_dev *pdev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index da26c555af24..231d49132a56 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -433,7 +433,12 @@ void amdgpu_ring_emit_reg_write_reg_wait_helper(struct amdgpu_ring *ring,
 bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid,
 			       struct dma_fence *fence)
 {
-	ktime_t deadline = ktime_add_us(ktime_get(), 10000);
+	ktime_t deadline;
+
+	if (unlikely(ring->adev->debug_disable_soft_recovery))
+		return false;
+
+	deadline = ktime_add_us(ktime_get(), 10000);
 
 	if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || !fence)
 		return false;
-- 
2.42.0
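
For readers following the flag from the module parameter to the early return, here is a minimal userspace sketch of that flow. It is not driver code: struct debug_flags, decode_debug_mask() and soft_recovery_allowed() are hypothetical names made up for illustration, and BIT() is redefined locally as a stand-in for the kernel macro. Only the bit positions (0x1, 0x2, 0x4) are taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's BIT() macro. */
#define BIT(n) (1U << (n))

/* Same bit layout as enum AMDGPU_DEBUG_MASK in the patch. */
enum debug_mask_bits {
	DEBUG_VM                    = BIT(0),
	DEBUG_LARGEBAR              = BIT(1),
	DEBUG_DISABLE_SOFT_RECOVERY = BIT(2),
};

/* Hypothetical, trimmed-down model of the debug fields in struct amdgpu_device. */
struct debug_flags {
	bool debug_vm;
	bool debug_largebar;
	bool debug_disable_soft_recovery;
};

/* Decode the mask once, in the spirit of amdgpu_init_debug_options(). */
static struct debug_flags decode_debug_mask(uint32_t mask)
{
	struct debug_flags f = {
		.debug_vm = (mask & DEBUG_VM) != 0,
		.debug_largebar = (mask & DEBUG_LARGEBAR) != 0,
		.debug_disable_soft_recovery =
			(mask & DEBUG_DISABLE_SOFT_RECOVERY) != 0,
	};

	return f;
}

/*
 * Models the early return added to amdgpu_ring_soft_recovery(): when the
 * debug flag is set, soft recovery is refused and the caller is expected
 * to fall back to the device reset path.
 */
static bool soft_recovery_allowed(const struct debug_flags *f)
{
	return !f->debug_disable_soft_recovery;
}
```

With this model, a mask of 0x4 makes soft_recovery_allowed() return false while leaving the other two flags clear. On a real system the value would presumably be passed as the debug_mask module parameter at load time (for example debug_mask=0x4), since the parameter is registered with mode 0444 and so is read-only after load.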