From: Ruidong Tian
To: amd-gfx@lists.freedesktop.org
Cc: alexander.deucher@amd.com, christian.koenig@amd.com, Xinhui.Pan@amd.com, linux-kernel@vger.kernel.org, Ruidong Tian
Subject: [PATCH] drm/amdgpu: add tracepoint while dump mca bank
Date: Wed, 18 Dec 2024 15:03:37 +0800
Message-Id: <20241218070337.70381-1-tianruidong@linux.alibaba.com>

Add a tracepoint that records the MCA bank registers each time a bank
is dumped. A user-space program such as rasdaemon can capture the
tracepoint to decode MCA errors, much as it does for AMD SMCA errors.

Signed-off-by: Ruidong Tian
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c   |  3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 31 +++++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c
index 3ca03b5e0f91..9daa95365457 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c
@@ -23,6 +23,7 @@
 #include "amdgpu_ras.h"
 #include "amdgpu.h"
 #include "amdgpu_mca.h"
+#include "amdgpu_trace.h"
 
 #include "umc/umc_6_7_0_offset.h"
 #include "umc/umc_6_7_0_sh_mask.h"
@@ -287,6 +288,8 @@ static void amdgpu_mca_smu_mca_bank_dump(struct amdgpu_device *adev, int idx, st
 		      idx, entry->regs[MCA_REG_IDX_IPID]);
 	RAS_EVENT_LOG(adev, event_id, HW_ERR "aca entry[%02d].SYND=0x%016llx\n",
 		      idx, entry->regs[MCA_REG_IDX_SYND]);
+
+	trace_amdgpu_mca_bank_dumps(event_id, idx, entry);
 }
 
 static int amdgpu_mca_smu_get_valid_mca_count(struct amdgpu_device *adev, enum amdgpu_mca_error_type type, uint32_t *count)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
index 383fce40d4dd..3dee028b3138 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -554,6 +554,37 @@ TRACE_EVENT(amdgpu_reset_reg_dumps,
 		      __entry->value)
 );
 
+TRACE_EVENT(amdgpu_mca_bank_dumps,
+	    TP_PROTO(uint64_t event_id, int idx, struct mca_bank_entry *e),
+	    TP_ARGS(event_id, idx, e),
+	    TP_STRUCT__entry(
+			     __field(uint64_t, event_id)
+			     __field(int, idx)
+			     __field(uint64_t, status)
+			     __field(uint64_t, addr)
+			     __field(uint64_t, misc0)
+			     __field(uint64_t, ipid)
+			     __field(uint64_t, synd)
+			     ),
+	    TP_fast_assign(
+			   __entry->event_id = event_id;
+			   __entry->idx = idx;
+			   __entry->status = e->regs[MCA_REG_IDX_STATUS];
+			   __entry->addr = e->regs[MCA_REG_IDX_ADDR];
+			   __entry->misc0 = e->regs[MCA_REG_IDX_MISC0];
+			   __entry->ipid = e->regs[MCA_REG_IDX_IPID];
+			   __entry->synd = e->regs[MCA_REG_IDX_SYND];
+			   ),
+	    TP_printk("amdgpu mca bank dump: event_id: %lld, idx: %d, STATUS: %016llx, ADDR: %016llx, MISC0: %016llx, IPID: %016llx, SYND: %016llx",
+		      __entry->event_id,
+		      __entry->idx,
+		      __entry->status,
+		      __entry->addr,
+		      __entry->misc0,
+		      __entry->ipid,
+		      __entry->synd)
+);
+
 #undef AMDGPU_JOB_GET_TIMELINE_NAME
 #endif
 
-- 
2.43.5
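
For illustration only, here is a minimal sketch of how a user-space
consumer might recover the fields from the formatted event text. It
assumes the text carries exactly the TP_printk format string from this
patch. The struct mca_bank_dump type and parse_mca_bank_dump() helper
are hypothetical names and are not part of the patch or of rasdaemon. A
real consumer would more likely read the binary event fields through a
trace-parsing library than scan the text.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space mirror of the fields the tracepoint records. */
struct mca_bank_dump {
	uint64_t event_id;
	int idx;
	uint64_t status;
	uint64_t addr;
	uint64_t misc0;
	uint64_t ipid;
	uint64_t synd;
};

/*
 * Parse one line of formatted trace text (assumed to follow the
 * TP_printk format in this patch). Anything before the fixed
 * "amdgpu mca bank dump: " marker, such as the task name, CPU and
 * timestamp that the trace file prepends, is skipped.
 *
 * Returns 0 on success, -1 if the line does not match.
 */
static int parse_mca_bank_dump(const char *line, struct mca_bank_dump *out)
{
	static const char marker[] = "amdgpu mca bank dump: ";
	const char *p = strstr(line, marker);
	long long event_id; /* the event is printed with %lld */
	int matched;

	if (!p)
		return -1;

	matched = sscanf(p + strlen(marker),
			 "event_id: %lld, idx: %d, STATUS: %" SCNx64
			 ", ADDR: %" SCNx64 ", MISC0: %" SCNx64
			 ", IPID: %" SCNx64 ", SYND: %" SCNx64,
			 &event_id, &out->idx, &out->status, &out->addr,
			 &out->misc0, &out->ipid, &out->synd);
	if (matched != 7)
		return -1;

	out->event_id = (uint64_t)event_id;
	return 0;
}
```

A caller would pass each line read from the trace output to
parse_mca_bank_dump() and decode the STATUS, IPID and SYND values only
when it returns 0.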