From nobody Tue Jun 30 08:09:13 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C85E4C433EF for ; Fri, 21 Jan 2022 18:26:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1382076AbiAUS0H (ORCPT ); Fri, 21 Jan 2022 13:26:07 -0500 Received: from smtp-fw-9102.amazon.com ([207.171.184.29]:20331 "EHLO smtp-fw-9102.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235596AbiAUS0F (ORCPT ); Fri, 21 Jan 2022 13:26:05 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1642789566; x=1674325566; h=from:to:cc:subject:date:message-id:mime-version: content-transfer-encoding; bh=GtfC2adnN8e0FlU+Pzw38v4HLYKFGTj1LqZMspVP5xI=; b=MiKoOjKm3alUFSmy1bUo8lAHP3rkr4rFZQb2jr6q2GXayyQGIUAh4/Fx NDAct3SQccs3HPxSEoDZIk5I6Pcy2/m+x+XnygM1LVK31AEBxagOw4J8r N/ycPcHIep/jBDN2MfOD4U7RaypXa0CCVueluTNU0M2WmFkc3ES77K6yL I=; X-IronPort-AV: E=Sophos;i="5.88,306,1635206400"; d="scan'208";a="189114961" Received: from pdx4-co-svc-p1-lb2-vlan3.amazon.com (HELO email-inbound-relay-pdx-2b-5a09360d.us-west-2.amazon.com) ([10.25.36.214]) by smtp-border-fw-9102.sea19.amazon.com with ESMTP; 21 Jan 2022 18:26:06 +0000 Received: from EX13MTAUWB001.ant.amazon.com (pdx1-ws-svc-p6-lb9-vlan3.pdx.amazon.com [10.236.137.198]) by email-inbound-relay-pdx-2b-5a09360d.us-west-2.amazon.com (Postfix) with ESMTPS id CE756418E7; Fri, 21 Jan 2022 18:26:04 +0000 (UTC) Received: from EX13D02UWB003.ant.amazon.com (10.43.161.48) by EX13MTAUWB001.ant.amazon.com (10.43.161.249) with Microsoft SMTP Server (TLS) id 15.0.1497.28; Fri, 21 Jan 2022 18:26:04 +0000 Received: from EX13MTAUEE002.ant.amazon.com (10.43.62.24) by EX13D02UWB003.ant.amazon.com (10.43.161.48) with Microsoft SMTP Server (TLS) id 15.0.1497.28; Fri, 21 Jan 2022 18:26:04 +0000 Received: from dev-dsk-alisaidi-i31e-9f3421fe.us-east-1.amazon.com (10.200.138.153) by mail-relay.amazon.com (10.43.62.224) with Microsoft SMTP Server id 15.0.1497.28 via Frontend Transport; Fri, 21 Jan 2022 18:26:03 +0000 Received: by dev-dsk-alisaidi-i31e-9f3421fe.us-east-1.amazon.com (Postfix, from userid 5131138) id 9E46D21BB4; Fri, 21 Jan 2022 18:26:03 +0000 (UTC) From: Ali Saidi To: , , CC: , , Peter Zijlstra , Ingo Molnar , "Arnaldo Carvalho de Melo" , Mark Rutland , "Alexander Shishkin" , Jiri Olsa , Namhyung Kim , John Garry , "Will Deacon" , Mathieu Poirier , "Leo Yan" , James Clark , German Gomez , Andrew Kilroy Subject: [PATCH] perf arm-spe: Use SPE data source for neoverse cores Date: Fri, 21 Jan 2022 18:24:49 +0000 Message-ID: <20220121182456.13538-1-alisaidi@amazon.com> X-Mailer: git-send-email 2.24.4.AMZN MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" When synthesizing data from SPE, augment the type with source information for Arm Neoverse cores. The field is IMPLDEF but the Neoverse cores all use the same encoding. I can't find encoding information for any other SPE implementations to unify their choices with Arm's thus that is left for fut= ure work. This changes enables the expected behavior of perf c2c on a system with SPE= where lines that are shared among multiple cores show up in perf c2c output.=20 Signed-off-by: Ali Saidi =20 --- .../util/arm-spe-decoder/arm-spe-decoder.c | 1 + .../util/arm-spe-decoder/arm-spe-decoder.h | 12 +++++ tools/perf/util/arm-spe.c | 48 ++++++++++++++----- 3 files changed, 49 insertions(+), 12 deletions(-) diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf= /util/arm-spe-decoder/arm-spe-decoder.c index 5e390a1a79ab..091987dd3966 100644 --- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c @@ -220,6 +220,7 @@ static int arm_spe_read_record(struct arm_spe_decoder *= decoder) =20 break; case ARM_SPE_DATA_SOURCE: + decoder->record.source =3D payload; break; case ARM_SPE_BAD: break; diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf= /util/arm-spe-decoder/arm-spe-decoder.h index 69b31084d6be..1ecf4ee99415 100644 --- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h @@ -29,10 +29,22 @@ enum arm_spe_op_type { ARM_SPE_ST =3D 1 << 1, }; =20 +enum arm_spe_neoverse_data_source { + ARM_SPE_NV_L1D =3D 0x0, + ARM_SPE_NV_L2 =3D 0x8, + ARM_SPE_NV_PEER_CORE =3D 0x9, + ARM_SPE_NV_LCL_CLSTR =3D 0xa, + ARM_SPE_NV_SYS_CACHE =3D 0xb, + ARM_SPE_NV_PEER_CLSTR =3D 0xc, + ARM_SPE_NV_REMOTE =3D 0xd, + ARM_SPE_NV_DRAM =3D 0xe, +}; + struct arm_spe_record { enum arm_spe_sample_type type; int err; u32 op; + u16 source; u32 latency; u64 from_ip; u64 to_ip; diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c index d2b64e3f588b..d025af13f5e4 100644 --- a/tools/perf/util/arm-spe.c +++ b/tools/perf/util/arm-spe.c @@ -34,6 +34,7 @@ #include "arm-spe-decoder/arm-spe-decoder.h" #include "arm-spe-decoder/arm-spe-pkt-decoder.h" =20 +#include <../../../arch/arm64/include/asm/cputype.h> #define MAX_TIMESTAMP (~0ULL) =20 struct arm_spe { @@ -45,6 +46,7 @@ struct arm_spe { struct perf_session *session; struct machine *machine; u32 pmu_type; + u64 midr; =20 struct perf_tsc_conversion tc; =20 @@ -399,9 +401,16 @@ static bool arm_spe__is_memory_event(enum arm_spe_samp= le_type type) return false; } =20 -static u64 arm_spe__synth_data_source(const struct arm_spe_record *record) +static const struct midr_range neoverse_spe[] =3D { + MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1), + MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2), + {}, +}; + +static u64 arm_spe__synth_data_source(const struct arm_spe_record *record,= u64 midr) { union perf_mem_data_src data_src =3D { 0 }; + bool is_neoverse =3D is_midr_in_range(midr, neoverse_spe); =20 if (record->op =3D=3D ARM_SPE_LD) data_src.mem_op =3D PERF_MEM_OP_LOAD; @@ -409,19 +418,30 @@ static u64 arm_spe__synth_data_source(const struct ar= m_spe_record *record) data_src.mem_op =3D PERF_MEM_OP_STORE; =20 if (record->type & (ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS)) { - data_src.mem_lvl =3D PERF_MEM_LVL_L3; + if (is_neoverse && record->source =3D=3D ARM_SPE_NV_DRAM) { + data_src.mem_lvl =3D PERF_MEM_LVL_LOC_RAM | PERF_MEM_LVL_HIT; + } else if (is_neoverse && record->source =3D=3D ARM_SPE_NV_PEER_CLSTR) { + data_src.mem_snoop =3D PERF_MEM_SNOOP_HITM; + data_src.mem_lvl =3D PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT; + } else { + data_src.mem_lvl =3D PERF_MEM_LVL_L3; =20 - if (record->type & ARM_SPE_LLC_MISS) - data_src.mem_lvl |=3D PERF_MEM_LVL_MISS; - else - data_src.mem_lvl |=3D PERF_MEM_LVL_HIT; + if (record->type & ARM_SPE_LLC_MISS) + data_src.mem_lvl |=3D PERF_MEM_LVL_MISS; + else + data_src.mem_lvl |=3D PERF_MEM_LVL_HIT; + } } else if (record->type & (ARM_SPE_L1D_ACCESS | ARM_SPE_L1D_MISS)) { - data_src.mem_lvl =3D PERF_MEM_LVL_L1; + if (is_neoverse && record->source =3D=3D ARM_SPE_NV_L2) { + data_src.mem_lvl =3D PERF_MEM_LVL_L2 | PERF_MEM_LVL_HIT; + } else { + data_src.mem_lvl =3D PERF_MEM_LVL_L1; =20 - if (record->type & ARM_SPE_L1D_MISS) - data_src.mem_lvl |=3D PERF_MEM_LVL_MISS; - else - data_src.mem_lvl |=3D PERF_MEM_LVL_HIT; + if (record->type & ARM_SPE_L1D_MISS) + data_src.mem_lvl |=3D PERF_MEM_LVL_MISS; + else + data_src.mem_lvl |=3D PERF_MEM_LVL_HIT; + } } =20 if (record->type & ARM_SPE_REMOTE_ACCESS) @@ -446,7 +466,7 @@ static int arm_spe_sample(struct arm_spe_queue *speq) u64 data_src; int err; =20 - data_src =3D arm_spe__synth_data_source(record); + data_src =3D arm_spe__synth_data_source(record, spe->midr); =20 if (spe->sample_flc) { if (record->type & ARM_SPE_L1D_MISS) { @@ -796,6 +816,10 @@ static int arm_spe_process_event(struct perf_session *= session, u64 timestamp; struct arm_spe *spe =3D container_of(session->auxtrace, struct arm_spe, auxtrace); + const char *cpuid =3D perf_env__cpuid(session->evlist->env); + u64 midr =3D strtol(cpuid, NULL, 16); + + spe->midr =3D midr; =20 if (dump_trace) return 0; --=20 2.33.1