From: Yangyu Chen
To: linux-perf-users@vger.kernel.org
Cc: John Garry, Will Deacon, James Clark, Mike Leach, Leo Yan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter, Liang Kan, Yoshihiro Furudera, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, Yangyu Chen
Subject: [PATCH 1/2] perf vendor events arm64: Add Cortex-A720 events/metrics
Date: Thu, 13 Feb 2025 23:12:25 +0800

Add JSON files for Cortex-A720 events and metrics. Using the existing
Neoverse N3 JSON files as a template, I manually checked the missing and
extra events/metrics using my script [1] and modified them according to
the Arm Cortex-A720 Core Technical Reference Manual [2].

[1] https://github.com/cyyself/arm-pmu-check/tree/1075bebeb3f1441067448251a387df35af15bf16
[2] https://developer.arm.com/documentation/102530/0002/Performance-Monitors-Extension-support-/Performance-monitors-events

Signed-off-by: Yangyu Chen
---
 .../arch/arm64/arm/cortex-a720/bus.json       |  18 +
 .../arch/arm64/arm/cortex-a720/exception.json |  62 +++
 .../arm64/arm/cortex-a720/fp_operation.json   |  22 +
 .../arch/arm64/arm/cortex-a720/general.json   |  10 +
 .../arch/arm64/arm/cortex-a720/l1d_cache.json |  50 ++
 .../arch/arm64/arm/cortex-a720/l1i_cache.json |  14 +
 .../arch/arm64/arm/cortex-a720/l2_cache.json  |  62 +++
 .../arch/arm64/arm/cortex-a720/l3_cache.json  |  22 +
 .../arch/arm64/arm/cortex-a720/ll_cache.json  |  10 +
 .../arch/arm64/arm/cortex-a720/memory.json    |  54 +++
 .../arch/arm64/arm/cortex-a720/metrics.json   | 436 ++++++++++++++++++
 .../arch/arm64/arm/cortex-a720/pmu.json       |   8 +
 .../arch/arm64/arm/cortex-a720/retired.json   |  90 ++++
 .../arch/arm64/arm/cortex-a720/spe.json       |  42 ++
 .../arm64/arm/cortex-a720/spec_operation.json |  90 ++++
 .../arch/arm64/arm/cortex-a720/stall.json     |  82 ++++
 .../arch/arm64/arm/cortex-a720/sve.json       |  50 ++
 .../arch/arm64/arm/cortex-a720/tlb.json       |  74 +++
 .../arch/arm64/arm/cortex-a720/trace.json     |  32 ++
 tools/perf/pmu-events/arch/arm64/mapfile.csv  |   1 +
 20 files changed, 1229 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/bus.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/exception.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/fp_operation.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/general.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1d_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1i_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l2_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l3_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/ll_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/memory.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/metrics.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/pmu.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/retired.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spe.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spec_operation.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/stall.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/sve.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/tlb.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/trace.json

diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/bus.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/bus.json
new file mode 100644
index 000000000000..2e11a8c4a484
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/bus.json
@@ -0,0 +1,18 @@
+[
+    {
+        "ArchStdEvent": "BUS_ACCESS",
+        "PublicDescription": "Counts memory transactions issued by the CPU to the external bus, including snoop requests and snoop responses. Each beat of data is counted individually."
+    },
+    {
+        "ArchStdEvent": "BUS_CYCLES",
+        "PublicDescription": "Counts bus cycles in the CPU. Bus cycles represent a clock cycle in which a transaction could be sent or received on the interface from the CPU to the external bus. Since that interface is driven at the same clock speed as the CPU, this event is a duplicate of CPU_CYCLES."
+    },
+    {
+        "ArchStdEvent": "BUS_ACCESS_RD",
+        "PublicDescription": "Counts memory read transactions seen on the external bus. Each beat of data is counted individually."
+    },
+    {
+        "ArchStdEvent": "BUS_ACCESS_WR",
+        "PublicDescription": "Counts memory write transactions seen on the external bus. Each beat of data is counted individually."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/exception.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/exception.json
new file mode 100644
index 000000000000..7126fbf292e0
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/exception.json
@@ -0,0 +1,62 @@
+[
+    {
+        "ArchStdEvent": "EXC_TAKEN",
+        "PublicDescription": "Counts any taken architecturally visible exceptions such as IRQ, FIQ, SError, and other synchronous exceptions. Exceptions are counted whether or not they are taken locally."
+    },
+    {
+        "ArchStdEvent": "EXC_RETURN",
+        "PublicDescription": "Counts any architecturally executed exception return instructions. For example: AArch64: ERET"
+    },
+    {
+        "ArchStdEvent": "EXC_UNDEF",
+        "PublicDescription": "Counts the number of synchronous exceptions which are taken locally that are due to attempting to execute an instruction that is UNDEFINED. Attempting to execute instruction bit patterns that have not been allocated. Attempting to execute instructions when they are disabled. Attempting to execute instructions at an inappropriate Exception level. Attempting to execute an instruction when the value of PSTATE.IL is 1."
+    },
+    {
+        "ArchStdEvent": "EXC_SVC",
+        "PublicDescription": "Counts SVC exceptions taken locally."
+    },
+    {
+        "ArchStdEvent": "EXC_PABORT",
+        "PublicDescription": "Counts synchronous exceptions that are taken locally and caused by Instruction Aborts."
+    },
+    {
+        "ArchStdEvent": "EXC_DABORT",
+        "PublicDescription": "Counts exceptions that are taken locally and are caused by data aborts or SErrors. Conditions that could cause those exceptions are attempting to read or write memory where the MMU generates a fault, attempting to read or write memory with a misaligned address, interrupts from the nSEI inputs and internally generated SErrors."
+    },
+    {
+        "ArchStdEvent": "EXC_IRQ",
+        "PublicDescription": "Counts IRQ exceptions including the virtual IRQs that are taken locally."
+    },
+    {
+        "ArchStdEvent": "EXC_FIQ",
+        "PublicDescription": "Counts FIQ exceptions including the virtual FIQs that are taken locally."
+    },
+    {
+        "ArchStdEvent": "EXC_SMC",
+        "PublicDescription": "Counts SMC exceptions taken to EL3."
+    },
+    {
+        "ArchStdEvent": "EXC_HVC",
+        "PublicDescription": "Counts HVC exceptions taken to EL2."
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_PABORT",
+        "PublicDescription": "Counts exceptions which are traps not taken locally and are caused by Instruction Aborts. For example, attempting to execute an instruction with a misaligned PC."
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_DABORT",
+        "PublicDescription": "Counts exceptions which are traps not taken locally and are caused by Data Aborts or SError interrupts. Conditions that could cause those exceptions are:\n\n1. Attempting to read or write memory where the MMU generates a fault,\n2. Attempting to read or write memory with a misaligned address,\n3. Interrupts from the SEI input,\n4. Internally generated SErrors."
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_OTHER",
+        "PublicDescription": "Counts the number of synchronous trap exceptions which are not taken locally and are not SVC, SMC, HVC, data aborts, Instruction Aborts, or interrupts."
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_IRQ",
+        "PublicDescription": "Counts IRQ exceptions including the virtual IRQs that are not taken locally."
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_FIQ",
+        "PublicDescription": "Counts FIQs which are not taken locally but taken from EL0, EL1,\n or EL2 to EL3 (which would be the normal behavior for FIQs when not executing\n in EL3)."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/fp_operation.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/fp_operation.json
new file mode 100644
index 000000000000..cec3435ac766
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/fp_operation.json
@@ -0,0 +1,22 @@
+[
+    {
+        "ArchStdEvent": "FP_HP_SPEC",
+        "PublicDescription": "Counts speculatively executed half precision floating point operations."
+    },
+    {
+        "ArchStdEvent": "FP_SP_SPEC",
+        "PublicDescription": "Counts speculatively executed single precision floating point operations."
+    },
+    {
+        "ArchStdEvent": "FP_DP_SPEC",
+        "PublicDescription": "Counts speculatively executed double precision floating point operations."
+    },
+    {
+        "ArchStdEvent": "FP_SCALE_OPS_SPEC",
+        "PublicDescription": "Counts speculatively executed scalable single precision floating point operations."
+    },
+    {
+        "ArchStdEvent": "FP_FIXED_OPS_SPEC",
+        "PublicDescription": "Counts speculatively executed non-scalable single precision floating point operations."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/general.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/general.json
new file mode 100644
index 000000000000..c5dcdcf43c58
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/general.json
@@ -0,0 +1,10 @@
+[
+    {
+        "ArchStdEvent": "CPU_CYCLES",
+        "PublicDescription": "Counts CPU clock cycles (not timer cycles). The clock measured by this event is defined as the physical clock driving the CPU logic."
+    },
+    {
+        "ArchStdEvent": "CNT_CYCLES",
+        "PublicDescription": "Increments at a constant frequency equal to the rate of increment of the System Counter, CNTPCT_EL0."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1d_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1d_cache.json
new file mode 100644
index 000000000000..a6fee569f4c6
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1d_cache.json
@@ -0,0 +1,50 @@
+[
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL",
+        "PublicDescription": "Counts level 1 data cache refills caused by speculatively executed load or store operations that missed in the level 1 data cache. This event only counts one event per cache line."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE",
+        "PublicDescription": "Counts level 1 data cache accesses from any load/store operations. Atomic operations that resolve in the CPU's caches (near atomic operations) count as both a write access and read access. Each access to a cache line is counted including the multiple accesses caused by single instructions such as LDM or STM. Each access to other level 1 data or unified memory structures, for example refill buffers, write buffers, and write-back buffers, is also counted."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_WB",
+        "PublicDescription": "Counts write-backs of dirty data from the L1 data cache to the L2 cache. This occurs when either a dirty cache line is evicted from L1 data cache and allocated in the L2 cache or dirty data is written to the L2 and possibly to the next level of cache. This event counts both victim cache line evictions and cache write-backs from snoops or cache maintenance operations. The following cache operations are not counted:\n\n1. Invalidations which do not result in data being transferred out of the L1 (such as evictions of clean data),\n2. Full line writes which write to L2 without writing L1, such as write streaming mode."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_LMISS_RD",
+        "PublicDescription": "Counts cache line refills into the level 1 data cache from any memory read operations, that incurred additional latency."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_RD",
+        "PublicDescription": "Counts level 1 data cache accesses from any load operation. Atomic load operations that resolve in the CPU's caches count as both a write access and read access."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_WR",
+        "PublicDescription": "Counts level 1 data cache accesses generated by store operations. This event also counts accesses caused by a DC ZVA (data cache zero, specified by virtual address) instruction. Near atomic operations that resolve in the CPU's caches count as a write access and read access."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_INNER",
+        "PublicDescription": "Counts level 1 data cache refills where the cache line data came from caches inside the immediate cluster of the core."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_OUTER",
+        "PublicDescription": "Counts level 1 data cache refills for which the cache line data came from outside the immediate cluster of the core, like an SLC in the system interconnect or DRAM."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_INVAL",
+        "PublicDescription": "Counts each explicit invalidation of a cache line in the level 1 data cache caused by:\n\n- Cache Maintenance Operations (CMO) that operate by a virtual address.\n- Broadcast cache coherency operations from another CPU in the system.\n\nThis event does not count for the following conditions:\n\n1. A cache refill invalidates a cache line.\n2. A CMO which is executed on that CPU and invalidates a cache line specified by set/way.\n\nNote that CMOs that operate by set/way cannot be broadcast from one CPU to another."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_RW",
+        "PublicDescription": "Counts level 1 data demand cache accesses from any load or store operation. Near atomic operations that resolve in the CPU's caches count as both a write access and read access."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_PRF",
+        "BriefDescription": "This event counts each fetch counted by either Level 1 data hardware prefetch or Level 1 data software prefetch."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_PRF",
+        "BriefDescription": "This event counts each hardware prefetch counted by L1D_CACHE_PRF that causes a refill of the Level 1 data cache from outside of the Level 1 data cache."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1i_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1i_cache.json
new file mode 100644
index 000000000000..633f1030359d
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1i_cache.json
@@ -0,0 +1,14 @@
+[
+    {
+        "ArchStdEvent": "L1I_CACHE_REFILL",
+        "PublicDescription": "Counts cache line refills in the level 1 instruction cache caused by a missed instruction fetch. Instruction fetches may include accessing multiple instructions, but the single cache line allocation is counted once."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE",
+        "PublicDescription": "Counts instruction fetches which access the level 1 instruction cache. Instruction cache accesses caused by cache maintenance operations are not counted."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_LMISS",
+        "PublicDescription": "Counts cache line refills into the level 1 instruction cache, that incurred additional latency."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l2_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l2_cache.json
new file mode 100644
index 000000000000..3806fef42b30
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l2_cache.json
@@ -0,0 +1,62 @@
+[
+    {
+        "ArchStdEvent": "L2D_CACHE",
+        "PublicDescription": "Counts accesses to the level 2 cache due to data accesses. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the first level data cache or translation resolutions due to accesses. This event also counts write back of dirty data from level 1 data cache to the L2 cache."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL",
+        "PublicDescription": "Counts cache line refills into the level 2 cache. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WB",
+        "PublicDescription": "Counts write-backs of data from the L2 cache to outside the CPU. This includes snoops to the L2 (from other CPUs) which return data even if the snoops cause an invalidation. L2 cache line invalidations which do not write data outside the CPU and snoops which return data from an L1 cache are not counted. Data would not be written outside the cache when invalidating a clean cache line."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_ALLOCATE",
+        "PublicDescription": "Counts level 2 cache line allocates that do not fetch data from outside the level 2 data or unified cache."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_RD",
+        "PublicDescription": "Counts level 2 data cache accesses due to memory read operations. Level 2 cache is a unified cache for data and instruction accesses, accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WR",
+        "PublicDescription": "Counts level 2 cache accesses due to memory write operations. Level 2 cache is a unified cache for data and instruction accesses, accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL_RD",
+        "PublicDescription": "Counts refills for memory accesses due to memory read operation counted by L2D_CACHE_RD. Level 2 cache is a unified cache for data and instruction accesses, accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL_WR",
+        "PublicDescription": "Counts refills for memory accesses due to memory write operation counted by L2D_CACHE_WR. Level 2 cache is a unified cache for data and instruction accesses, accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WB_VICTIM",
+        "PublicDescription": "Counts evictions from the level 2 cache because of a line being allocated into the L2 cache."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WB_CLEAN",
+        "PublicDescription": "Counts write-backs from the level 2 cache that are a result of either:\n\n1. Cache maintenance operations,\n\n2. Snoop responses or,\n\n3. Direct cache transfers to another CPU due to a forwarding snoop request."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_INVAL",
+        "PublicDescription": "Counts each explicit invalidation of a cache line in the level 2 cache by cache maintenance operations that operate by a virtual address, or by external coherency operations. This event does not count if either:\n\n1. A cache refill invalidates a cache line or,\n2. A Cache Maintenance Operation (CMO), which invalidates a cache line specified by set/way, is executed on that CPU.\n\nCMOs that operate by set/way cannot be broadcast from one CPU to another."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_LMISS_RD",
+        "PublicDescription": "Counts cache line refills into the level 2 unified cache from any memory read operations that incurred additional latency."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_RW",
+        "PublicDescription": "Counts level 2 cache demand accesses from any load/store operations. Level 2 cache is a unified cache for data and instruction accesses, accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_PRF",
+        "PublicDescription": "Counts level 2 data cache accesses from software preload or prefetch instructions or hardware prefetcher."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL_PRF",
+        "PublicDescription": "Counts refills due to accesses generated as a result of prefetches."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l3_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l3_cache.json
new file mode 100644
index 000000000000..4a2e72fc5ada
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l3_cache.json
@@ -0,0 +1,22 @@
+[
+    {
+        "ArchStdEvent": "L3D_CACHE_ALLOCATE",
+        "PublicDescription": "Counts level 3 cache line allocates that do not fetch data from outside the level 3 data or unified cache. For example, allocates due to streaming stores."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_REFILL",
+        "PublicDescription": "Counts level 3 accesses that receive data from outside the L3 cache."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE",
+        "PublicDescription": "Counts level 3 cache accesses. Level 3 cache is a unified cache for data and instruction accesses. Accesses are for misses in the lower level caches or translation resolutions due to accesses."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_RD",
+        "PublicDescription": "Counts level 3 cache accesses caused by any memory read operation. Level 3 cache is a unified cache for data and instruction accesses. Accesses are for misses in the lower level caches or translation resolutions due to accesses."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_LMISS_RD",
+        "PublicDescription": "Counts any cache line refill into the level 3 cache from memory read operations that incurred additional latency."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/ll_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/ll_cache.json
new file mode 100644
index 000000000000..fd5a2e0099b8
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/ll_cache.json
@@ -0,0 +1,10 @@
+[
+    {
+        "ArchStdEvent": "LL_CACHE_RD",
+        "PublicDescription": "Counts read transactions that were returned from outside the core cluster. This event counts for external last level cache when the system register CPUECTLR.EXTLLC bit is set, otherwise it counts for the L3 cache. This event counts read transactions returned from outside the core if those transactions are either hit in the system level cache or missed in the SLC and are returned from any other external sources."
+    },
+    {
+        "ArchStdEvent": "LL_CACHE_MISS_RD",
+        "PublicDescription": "Counts read transactions that were returned from outside the core cluster but missed in the system level cache. This event counts for external last level cache when the system register CPUECTLR.EXTLLC bit is set, otherwise it counts for L3 cache. This event counts read transactions returned from outside the core if those transactions are missed in the System level Cache. The data source of the transaction is indicated by a field in the CHI transaction returning to the CPU. This event does not count reads caused by cache maintenance operations."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/memory.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/memory.json
new file mode 100644
index 000000000000..f19204a5faae
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/memory.json
@@ -0,0 +1,54 @@
+[
+    {
+        "ArchStdEvent": "MEM_ACCESS",
+        "PublicDescription": "Counts memory accesses issued by the CPU load store unit, where those accesses are issued due to load or store operations. This event counts memory accesses no matter whether the data is received from any level of cache hierarchy or external memory. If memory accesses are broken up into smaller transactions than what were specified in the load or store instructions, then the event counts those smaller memory transactions."
+    },
+    {
+        "ArchStdEvent": "REMOTE_ACCESS",
+        "PublicDescription": "Counts accesses to another chip, which is implemented as a different CMN mesh in the system. If the CHI bus response back to the core indicates that the data source is from another chip (mesh), then the counter is updated. If no data is returned, even if the system snoops another chip/mesh, then the counter is not updated."
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_RD",
+        "PublicDescription": "Counts memory accesses issued by the CPU due to load operations. The event counts any memory load access, no matter whether the data is received from any level of cache hierarchy or external memory. The event also counts atomic load operations. If memory accesses are broken up by the load/store unit into smaller transactions that are issued by the bus interface, then the event counts those smaller transactions."
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_WR",
+        "PublicDescription": "Counts memory accesses issued by the CPU due to store operations. The event counts any memory store access, no matter whether the data is located in any level of cache or external memory. The event also counts atomic load and store operations. If memory accesses are broken up by the load/store unit into smaller transactions that are issued by the bus interface, then the event counts those smaller transactions."
+    },
+    {
+        "ArchStdEvent": "LDST_ALIGN_LAT",
+        "PublicDescription": "Counts the number of memory read and write accesses in a cycle that incurred additional latency, due to the alignment of the address and the size of data being accessed, which results in store crossing a single cache line."
+    },
+    {
+        "ArchStdEvent": "LD_ALIGN_LAT",
+        "PublicDescription": "Counts the number of memory read accesses in a cycle that incurred additional latency, due to the alignment of the address and size of data being accessed, which results in load crossing a single cache line."
+    },
+    {
+        "ArchStdEvent": "ST_ALIGN_LAT",
+        "PublicDescription": "Counts the number of memory write accesses in a cycle that incurred additional latency due to the alignment of the address and size of data being accessed."
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_CHECKED",
+        "PublicDescription": "Counts the number of memory read and write accesses counted by MEM_ACCESS that are tag checked by the Memory Tagging Extension (MTE). This event is implemented as the sum of MEM_ACCESS_CHECKED_RD and MEM_ACCESS_CHECKED_WR."
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_CHECKED_RD",
+        "PublicDescription": "Counts the number of memory read accesses in a cycle that are tag checked by the Memory Tagging Extension (MTE)."
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_CHECKED_WR",
+        "PublicDescription": "Counts the number of memory write accesses in a cycle that are tag checked by the Memory Tagging Extension (MTE)."
+    },
+    {
+        "ArchStdEvent": "INST_FETCH_PERCYC",
+        "PublicDescription": "Counts number of instruction fetches outstanding per cycle, which will provide an average latency of instruction fetch."
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_RD_PERCYC",
+        "PublicDescription": "Counts the number of outstanding loads or memory read accesses per cycle."
+    },
+    {
+        "ArchStdEvent": "INST_FETCH",
+        "PublicDescription": "Counts Instruction memory accesses that the PE makes."
+ } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/metrics.json new file mode 100644 index 000000000000..d8e8b5155cfa --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/metrics.json @@ -0,0 +1,436 @@ +[ + { + "ArchStdEvent": "backend_bound" + }, + { + "MetricName": "backend_busy_bound", + "MetricExpr": "STALL_BACKEND_BUSY / STALL_BACKEND * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to issue queues being full to accept operations for execution.", + "MetricGroup": "Topdown_Backend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "backend_cache_l1d_bound", + "MetricExpr": "STALL_BACKEND_L1D / (STALL_BACKEND_L1D + STALL_BACKEND_MEM) * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by level 1 data cache misses.", + "MetricGroup": "Topdown_Backend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "backend_cache_l2d_bound", + "MetricExpr": "STALL_BACKEND_MEM / (STALL_BACKEND_L1D + STALL_BACKEND_MEM) * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by level 2 data cache misses.", + "MetricGroup": "Topdown_Backend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "backend_core_bound", + "MetricExpr": "STALL_BACKEND_CPUBOUND / STALL_BACKEND * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to backend core resource constraints not related to instruction fetch latency issues caused by memory access components.", + "MetricGroup": "Topdown_Backend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "backend_core_rename_bound", + "MetricExpr": "STALL_BACKEND_RENAME / STALL_BACKEND_CPUBOUND * 100", + "BriefDescription":
"This metric is the percentage of total cycles stalled in the backend as the rename unit registers are unavailable.", + "MetricGroup": "Topdown_Backend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "backend_mem_bound", + "MetricExpr": "STALL_BACKEND_MEMBOUND / STALL_BACKEND * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to backend core resource constraints related to memory access latency issues caused by memory access components.", + "MetricGroup": "Topdown_Backend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "backend_mem_cache_bound", + "MetricExpr": "(STALL_BACKEND_L1D + STALL_BACKEND_MEM) / STALL_BACKEND_MEMBOUND * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory latency issues caused by data cache misses.", + "MetricGroup": "Topdown_Backend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "backend_mem_store_bound", + "MetricExpr": "STALL_BACKEND_ST / STALL_BACKEND_MEMBOUND * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory write pending caused by stores stalled in the pre-commit stage.", + "MetricGroup": "Topdown_Backend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "backend_mem_tlb_bound", + "MetricExpr": "STALL_BACKEND_TLB / STALL_BACKEND_MEMBOUND * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by data TLB misses.", + "MetricGroup": "Topdown_Backend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "backend_stalled_cycles", + "MetricExpr": "STALL_BACKEND / CPU_CYCLES * 100", + "BriefDescription": "This metric is the percentage of cycles that were stalled due to resource constraints in the backend unit of the processor.", + "MetricGroup": "Cycle_Accounting", + "ScaleUnit": "1percent of
cycles" + }, + { + "ArchStdEvent": "bad_speculation", + "MetricExpr": "(1 - STALL_SLOT / (10 * CPU_CYCLES)) * (1 - OP_RETIRED / OP_SPEC) * 100 + STALL_FRONTEND_FLUSH / CPU_CYCLES * 100" + }, + { + "MetricName": "barrier_percentage", + "MetricExpr": "(ISB_SPEC + DSB_SPEC + DMB_SPEC) / INST_SPEC * 100", + "BriefDescription": "This metric measures instruction and data barrier operations as a percentage of operations speculatively executed.", + "MetricGroup": "Operation_Mix", + "ScaleUnit": "1percent of operations" + }, + { + "MetricName": "branch_direct_ratio", + "MetricExpr": "BR_IMMED_RETIRED / BR_RETIRED", + "BriefDescription": "This metric measures the ratio of direct branches retired to the total number of branches architecturally executed.", + "MetricGroup": "Branch_Effectiveness", + "ScaleUnit": "1per branch" + }, + { + "MetricName": "branch_indirect_ratio", + "MetricExpr": "BR_IND_RETIRED / BR_RETIRED", + "BriefDescription": "This metric measures the ratio of indirect branches retired, including function returns, to the total number of branches architecturally executed.", + "MetricGroup": "Branch_Effectiveness", + "ScaleUnit": "1per branch" + }, + { + "MetricName": "branch_misprediction_ratio", + "MetricExpr": "BR_MIS_PRED_RETIRED / BR_RETIRED", + "BriefDescription": "This metric measures the ratio of branches mispredicted to the total number of branches architecturally executed.
This gives an indication of the effectiveness of the branch prediction unit.", + "MetricGroup": "Miss_Ratio;Branch_Effectiveness", + "ScaleUnit": "100percent of branches" + }, + { + "MetricName": "branch_mpki", + "MetricExpr": "BR_MIS_PRED_RETIRED / INST_RETIRED * 1000", + "BriefDescription": "This metric measures the number of branch mispredictions per thousand instructions executed.", + "MetricGroup": "MPKI;Branch_Effectiveness", + "ScaleUnit": "1MPKI" + }, + { + "MetricName": "branch_return_ratio", + "MetricExpr": "BR_RETURN_RETIRED / BR_RETIRED", + "BriefDescription": "This metric measures the ratio of branches retired that are function returns to the total number of branches architecturally executed.", + "MetricGroup": "Branch_Effectiveness", + "ScaleUnit": "1per branch" + }, + { + "MetricName": "crypto_percentage", + "MetricExpr": "CRYPTO_SPEC / INST_SPEC * 100", + "BriefDescription": "This metric measures crypto operations as a percentage of operations speculatively executed.", + "MetricGroup": "Operation_Mix", + "ScaleUnit": "1percent of operations" + }, + { + "MetricName": "dtlb_mpki", + "MetricExpr": "DTLB_WALK / INST_RETIRED * 1000", + "BriefDescription": "This metric measures the number of data TLB Walks per thousand instructions executed.", + "MetricGroup": "MPKI;DTLB_Effectiveness", + "ScaleUnit": "1MPKI" + }, + { + "MetricName": "dtlb_walk_ratio", + "MetricExpr": "DTLB_WALK / L1D_TLB", + "BriefDescription": "This metric measures the ratio of data TLB Walks to the total number of data TLB accesses.
This gives an indication of the effectiveness of the data TLB accesses.", + "MetricGroup": "Miss_Ratio;DTLB_Effectiveness", + "ScaleUnit": "100percent of TLB accesses" + }, + { + "MetricName": "fp16_percentage", + "MetricExpr": "FP_HP_SPEC / INST_SPEC * 100", + "BriefDescription": "This metric measures half-precision floating point operations as a percentage of operations speculatively executed.", + "MetricGroup": "FP_Precision_Mix", + "ScaleUnit": "1percent of operations" + }, + { + "MetricName": "fp32_percentage", + "MetricExpr": "FP_SP_SPEC / INST_SPEC * 100", + "BriefDescription": "This metric measures single-precision floating point operations as a percentage of operations speculatively executed.", + "MetricGroup": "FP_Precision_Mix", + "ScaleUnit": "1percent of operations" + }, + { + "MetricName": "fp64_percentage", + "MetricExpr": "FP_DP_SPEC / INST_SPEC * 100", + "BriefDescription": "This metric measures double-precision floating point operations as a percentage of operations speculatively executed.", + "MetricGroup": "FP_Precision_Mix", + "ScaleUnit": "1percent of operations" + }, + { + "MetricName": "fp_ops_per_cycle", + "MetricExpr": "(FP_SCALE_OPS_SPEC + FP_FIXED_OPS_SPEC) / CPU_CYCLES", + "BriefDescription": "This metric measures floating point operations per cycle in any precision performed by any instruction.
Operations are counted by computation and by vector lanes, fused computations such as multiply-add count as twice per vector lane for example.", + "MetricGroup": "FP_Arithmetic_Intensity", + "ScaleUnit": "1operations per cycle" + }, + { + "MetricName": "frontend_cache_l1i_bound", + "MetricExpr": "STALL_FRONTEND_L1I / (STALL_FRONTEND_L1I + STALL_FRONTEND_MEM) * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to memory access latency issues caused by level 1 instruction cache misses.", + "MetricGroup": "Topdown_Frontend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "frontend_cache_l2i_bound", + "MetricExpr": "STALL_FRONTEND_MEM / (STALL_FRONTEND_L1I + STALL_FRONTEND_MEM) * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to memory access latency issues caused by level 2 instruction cache misses.", + "MetricGroup": "Topdown_Frontend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "frontend_core_bound", + "MetricExpr": "STALL_FRONTEND_CPUBOUND / STALL_FRONTEND * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to frontend core resource constraints not related to instruction fetch latency issues caused by memory access components.", + "MetricGroup": "Topdown_Frontend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "frontend_core_flush_bound", + "MetricExpr": "STALL_FRONTEND_FLUSH / STALL_FRONTEND_CPUBOUND * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend as the processor is recovering from a pipeline flush caused by bad speculation or other machine resteers.", + "MetricGroup": "Topdown_Frontend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "frontend_mem_bound", + "MetricExpr": "STALL_FRONTEND_MEMBOUND / STALL_FRONTEND * 100", + "BriefDescription": "This metric is the
percentage of total cycles stalled in the frontend due to frontend core resource constraints related to the instruction fetch latency issues caused by memory access components.", + "MetricGroup": "Topdown_Frontend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "frontend_mem_cache_bound", + "MetricExpr": "(STALL_FRONTEND_L1I + STALL_FRONTEND_MEM) / STALL_FRONTEND_MEMBOUND * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to instruction fetch latency issues caused by instruction cache misses.", + "MetricGroup": "Topdown_Frontend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "frontend_mem_tlb_bound", + "MetricExpr": "STALL_FRONTEND_TLB / STALL_FRONTEND_MEMBOUND * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to instruction fetch latency issues caused by instruction TLB misses.", + "MetricGroup": "Topdown_Frontend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "frontend_stalled_cycles", + "MetricExpr": "STALL_FRONTEND / CPU_CYCLES * 100", + "BriefDescription": "This metric is the percentage of cycles that were stalled due to resource constraints in the frontend unit of the processor.", + "MetricGroup": "Cycle_Accounting", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "integer_dp_percentage", + "MetricExpr": "DP_SPEC / INST_SPEC * 100", + "BriefDescription": "This metric measures scalar integer operations as a percentage of operations speculatively executed.", + "MetricGroup": "Operation_Mix", + "ScaleUnit": "1percent of operations" + }, + { + "MetricName": "ipc", + "MetricExpr": "INST_RETIRED / CPU_CYCLES", + "BriefDescription": "This metric measures the number of instructions retired per cycle.", + "MetricGroup": "General", + "ScaleUnit": "1per cycle" + }, + { + "MetricName": "itlb_mpki", + "MetricExpr": "ITLB_WALK / INST_RETIRED * 1000", + "BriefDescription": "This metric
measures the number of instruction TLB Walks per thousand instructions executed.", + "MetricGroup": "MPKI;ITLB_Effectiveness", + "ScaleUnit": "1MPKI" + }, + { + "MetricName": "itlb_walk_ratio", + "MetricExpr": "ITLB_WALK / L1I_TLB", + "BriefDescription": "This metric measures the ratio of instruction TLB Walks to the total number of instruction TLB accesses. This gives an indication of the effectiveness of the instruction TLB accesses.", + "MetricGroup": "Miss_Ratio;ITLB_Effectiveness", + "ScaleUnit": "100percent of TLB accesses" + }, + { + "MetricName": "l1d_cache_miss_ratio", + "MetricExpr": "L1D_CACHE_REFILL / L1D_CACHE", + "BriefDescription": "This metric measures the ratio of level 1 data cache accesses missed to the total number of level 1 data cache accesses. This gives an indication of the effectiveness of the level 1 data cache.", + "MetricGroup": "Miss_Ratio;L1D_Cache_Effectiveness", + "ScaleUnit": "100percent of cache accesses" + }, + { + "MetricName": "l1d_cache_mpki", + "MetricExpr": "L1D_CACHE_REFILL / INST_RETIRED * 1000", + "BriefDescription": "This metric measures the number of level 1 data cache accesses missed per thousand instructions executed.", + "MetricGroup": "MPKI;L1D_Cache_Effectiveness", + "ScaleUnit": "1MPKI" + }, + { + "MetricName": "l1d_tlb_miss_ratio", + "MetricExpr": "L1D_TLB_REFILL / L1D_TLB", + "BriefDescription": "This metric measures the ratio of level 1 data TLB accesses missed to the total number of level 1 data TLB accesses.
This gives an indication of the effectiveness of the level 1 data TLB.", + "MetricGroup": "Miss_Ratio;DTLB_Effectiveness", + "ScaleUnit": "100percent of TLB accesses" + }, + { + "MetricName": "l1d_tlb_mpki", + "MetricExpr": "L1D_TLB_REFILL / INST_RETIRED * 1000", + "BriefDescription": "This metric measures the number of level 1 data TLB accesses missed per thousand instructions executed.", + "MetricGroup": "MPKI;DTLB_Effectiveness", + "ScaleUnit": "1MPKI" + }, + { + "MetricName": "l1i_cache_miss_ratio", + "MetricExpr": "L1I_CACHE_REFILL / L1I_CACHE", + "BriefDescription": "This metric measures the ratio of level 1 instruction cache accesses missed to the total number of level 1 instruction cache accesses. This gives an indication of the effectiveness of the level 1 instruction cache.", + "MetricGroup": "Miss_Ratio;L1I_Cache_Effectiveness", + "ScaleUnit": "100percent of cache accesses" + }, + { + "MetricName": "l1i_cache_mpki", + "MetricExpr": "L1I_CACHE_REFILL / INST_RETIRED * 1000", + "BriefDescription": "This metric measures the number of level 1 instruction cache accesses missed per thousand instructions executed.", + "MetricGroup": "MPKI;L1I_Cache_Effectiveness", + "ScaleUnit": "1MPKI" + }, + { + "MetricName": "l1i_tlb_miss_ratio", + "MetricExpr": "L1I_TLB_REFILL / L1I_TLB", + "BriefDescription": "This metric measures the ratio of level 1 instruction TLB accesses missed to the total number of level 1 instruction TLB accesses.
This gives an indication of the effectiveness of the level 1 instruction TLB.", + "MetricGroup": "Miss_Ratio;ITLB_Effectiveness", + "ScaleUnit": "100percent of TLB accesses" + }, + { + "MetricName": "l1i_tlb_mpki", + "MetricExpr": "L1I_TLB_REFILL / INST_RETIRED * 1000", + "BriefDescription": "This metric measures the number of level 1 instruction TLB accesses missed per thousand instructions executed.", + "MetricGroup": "MPKI;ITLB_Effectiveness", + "ScaleUnit": "1MPKI" + }, + { + "MetricName": "l2_cache_miss_ratio", + "MetricExpr": "L2D_CACHE_REFILL / L2D_CACHE", + "BriefDescription": "This metric measures the ratio of level 2 cache accesses missed to the total number of level 2 cache accesses. This gives an indication of the effectiveness of the level 2 cache, which is a unified cache that stores both data and instruction. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a unified cache.", + "MetricGroup": "Miss_Ratio;L2_Cache_Effectiveness", + "ScaleUnit": "100percent of cache accesses" + }, + { + "MetricName": "l2_cache_mpki", + "MetricExpr": "L2D_CACHE_REFILL / INST_RETIRED * 1000", + "BriefDescription": "This metric measures the number of level 2 unified cache accesses missed per thousand instructions executed. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a unified cache.", + "MetricGroup": "MPKI;L2_Cache_Effectiveness", + "ScaleUnit": "1MPKI" + }, + { + "MetricName": "l2_tlb_miss_ratio", + "MetricExpr": "L2D_TLB_REFILL / L2D_TLB", + "BriefDescription": "This metric measures the ratio of level 2 unified TLB accesses missed to the total number of level 2 unified TLB accesses.
This gives an indication of the effectiveness of the level 2 TLB.", + "MetricGroup": "Miss_Ratio;ITLB_Effectiveness;DTLB_Effectiveness", + "ScaleUnit": "100percent of TLB accesses" + }, + { + "MetricName": "l2_tlb_mpki", + "MetricExpr": "L2D_TLB_REFILL / INST_RETIRED * 1000", + "BriefDescription": "This metric measures the number of level 2 unified TLB accesses missed per thousand instructions executed.", + "MetricGroup": "MPKI;ITLB_Effectiveness;DTLB_Effectiveness", + "ScaleUnit": "1MPKI" + }, + { + "MetricName": "ll_cache_read_hit_ratio", + "MetricExpr": "(LL_CACHE_RD - LL_CACHE_MISS_RD) / LL_CACHE_RD", + "BriefDescription": "This metric measures the ratio of last level cache read accesses hit in the cache to the total number of last level cache accesses. This gives an indication of the effectiveness of the last level cache for read traffic. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a system level cache.", + "MetricGroup": "LL_Cache_Effectiveness", + "ScaleUnit": "100percent of cache accesses" + }, + { + "MetricName": "ll_cache_read_miss_ratio", + "MetricExpr": "LL_CACHE_MISS_RD / LL_CACHE_RD", + "BriefDescription": "This metric measures the ratio of last level cache read accesses missed to the total number of last level cache accesses. This gives an indication of the effectiveness of the last level cache for read traffic.
Note that cache accesses in this cache are either data memory access or instruction fetch as this is a system level cache.", + "MetricGroup": "Miss_Ratio;LL_Cache_Effectiveness", + "ScaleUnit": "100percent of cache accesses" + }, + { + "MetricName": "ll_cache_read_mpki", + "MetricExpr": "LL_CACHE_MISS_RD / INST_RETIRED * 1000", + "BriefDescription": "This metric measures the number of last level cache read accesses missed per thousand instructions executed.", + "MetricGroup": "MPKI;LL_Cache_Effectiveness", + "ScaleUnit": "1MPKI" + }, + { + "MetricName": "load_percentage", + "MetricExpr": "LD_SPEC / INST_SPEC * 100", + "BriefDescription": "This metric measures load operations as a percentage of operations speculatively executed.", + "MetricGroup": "Operation_Mix", + "ScaleUnit": "1percent of operations" + }, + { + "MetricName": "nonsve_fp_ops_per_cycle", + "MetricExpr": "FP_FIXED_OPS_SPEC / CPU_CYCLES", + "BriefDescription": "This metric measures floating point operations per cycle in any precision performed by an instruction that is not an SVE instruction.
Operations are counted by computation and by vector lanes, fused computations such as multiply-add count as twice per vector lane for example.", + "MetricGroup": "FP_Arithmetic_Intensity", + "ScaleUnit": "1operations per cycle" + }, + { + "MetricName": "scalar_fp_percentage", + "MetricExpr": "VFP_SPEC / INST_SPEC * 100", + "BriefDescription": "This metric measures scalar floating point operations as a percentage of operations speculatively executed.", + "MetricGroup": "Operation_Mix", + "ScaleUnit": "1percent of operations" + }, + { + "MetricName": "simd_percentage", + "MetricExpr": "ASE_SPEC / INST_SPEC * 100", + "BriefDescription": "This metric measures advanced SIMD operations as a percentage of total operations speculatively executed.", + "MetricGroup": "Operation_Mix", + "ScaleUnit": "1percent of operations" + }, + { + "MetricName": "store_percentage", + "MetricExpr": "ST_SPEC / INST_SPEC * 100", + "BriefDescription": "This metric measures store operations as a percentage of operations speculatively executed.", + "MetricGroup": "Operation_Mix", + "ScaleUnit": "1percent of operations" + }, + { + "MetricName": "sve_all_percentage", + "MetricExpr": "SVE_INST_SPEC / INST_SPEC * 100", + "BriefDescription": "This metric measures scalable vector operations, including loads and stores, as a percentage of operations speculatively executed.", + "MetricGroup": "Operation_Mix", + "ScaleUnit": "1percent of operations" + }, + { + "MetricName": "sve_fp_ops_per_cycle", + "MetricExpr": "FP_SCALE_OPS_SPEC / CPU_CYCLES", + "BriefDescription": "This metric measures floating point operations per cycle in any precision performed by SVE instructions.
Operations are counted by computation and by vector lanes, fused computations such as multiply-add count as twice per vector lane for example.", + "MetricGroup": "FP_Arithmetic_Intensity", + "ScaleUnit": "1operations per cycle" + }, + { + "MetricName": "sve_predicate_empty_percentage", + "MetricExpr": "SVE_PRED_EMPTY_SPEC / SVE_PRED_SPEC * 100", + "BriefDescription": "This metric measures scalable vector operations with no active predicates as a percentage of sve predicated operations speculatively executed.", + "MetricGroup": "SVE_Effectiveness", + "ScaleUnit": "1percent of operations" + }, + { + "MetricName": "sve_predicate_full_percentage", + "MetricExpr": "SVE_PRED_FULL_SPEC / SVE_PRED_SPEC * 100", + "BriefDescription": "This metric measures scalable vector operations with all active predicates as a percentage of sve predicated operations speculatively executed.", + "MetricGroup": "SVE_Effectiveness", + "ScaleUnit": "1percent of operations" + }, + { + "MetricName": "sve_predicate_partial_percentage", + "MetricExpr": "SVE_PRED_PARTIAL_SPEC / SVE_PRED_SPEC * 100", + "BriefDescription": "This metric measures scalable vector operations with at least one active predicates as a percentage of sve predicated operations speculatively executed.", + "MetricGroup": "SVE_Effectiveness", + "ScaleUnit": "1percent of operations" + }, + { + "MetricName": "sve_predicate_percentage", + "MetricExpr": "SVE_PRED_SPEC / INST_SPEC * 100", + "BriefDescription": "This metric measures scalable vector operations with predicates as a percentage of operations speculatively executed.", + "MetricGroup": "SVE_Effectiveness", + "ScaleUnit": "1percent of operations" + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/pmu.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/pmu.json new file mode 100644 index 000000000000..d8b7b9f9e5fa --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/pmu.json @@ -0,0 +1,8 @@ +[ + { + "ArchStdEvent":
"PMU_OVFS" + }, + { + "ArchStdEvent": "PMU_HOVFS" + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/retired.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/retired.json new file mode 100644 index 000000000000..69f9a0b0c7ff --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/retired.json @@ -0,0 +1,90 @@ +[ + { + "ArchStdEvent": "SW_INCR", + "PublicDescription": "Counts software writes to the PMSWINC_EL0 (software PMU increment) register. The PMSWINC_EL0 register is a manually updated counter for use by application software.\n\nThis event could be used to measure any user program event, such as accesses to a particular data structure (by writing to the PMSWINC_EL0 register each time the data structure is accessed).\n\nTo use the PMSWINC_EL0 register and event, developers must insert instructions that write to the PMSWINC_EL0 register into the source code.\n\nSince the SW_INCR event records writes to the PMSWINC_EL0 register, there is no need to do a read/increment/write sequence to the PMSWINC_EL0 register." + }, + { + "ArchStdEvent": "INST_RETIRED", + "PublicDescription": "Counts instructions that have been architecturally executed." + }, + { + "ArchStdEvent": "CID_WRITE_RETIRED", + "PublicDescription": "Counts architecturally executed writes to the CONTEXTIDR_EL1 register, which usually contain the kernel PID and can be output with hardware trace." + }, + { + "ArchStdEvent": "PC_WRITE_RETIRED", + "PublicDescription": "Counts branch instructions that caused a change of Program Counter, which effectively causes a change in the control flow of the program." + }, + { + "ArchStdEvent": "BR_IMMED_RETIRED", + "PublicDescription": "Counts architecturally executed direct branches." + }, + { + "ArchStdEvent": "BR_RETURN_RETIRED", + "PublicDescription": "Counts architecturally executed procedure returns."
+ }, + { + "ArchStdEvent": "TTBR_WRITE_RETIRED", + "PublicDescription": "Counts architectural writes to TTBR0/1_EL1. If virtualization host extensions are enabled (by setting the HCR_EL2.E2H bit to 1), then accesses to TTBR0/1_EL1 that are redirected to TTBR0/1_EL2, or accesses to TTBR0/1_EL12, are counted. TTBRn registers are typically updated when the kernel is swapping user-space threads or applications." + }, + { + "ArchStdEvent": "BR_RETIRED", + "PublicDescription": "Counts architecturally executed branches, whether the branch is taken or not. Instructions that explicitly write to the PC are also counted. Note that exception generating instructions, exception return instructions and context synchronization instructions are not counted." + }, + { + "ArchStdEvent": "BR_MIS_PRED_RETIRED", + "PublicDescription": "Counts branches counted by BR_RETIRED which were mispredicted and caused a pipeline flush." + }, + { + "ArchStdEvent": "OP_RETIRED", + "PublicDescription": "Counts micro-operations that are architecturally executed. This is a count of number of micro-operations retired from the commit queue in a single cycle." + }, + { + "ArchStdEvent": "BR_IMMED_TAKEN_RETIRED", + "PublicDescription": "Counts architecturally executed immediate branches that were taken." + }, + { + "ArchStdEvent": "BR_INDNR_TAKEN_RETIRED", + "PublicDescription": "Counts architecturally executed indirect branches excluding procedure returns that were taken." + }, + { + "ArchStdEvent": "BR_IMMED_PRED_RETIRED", + "PublicDescription": "Counts architecturally executed direct branches that were correctly predicted." + }, + { + "ArchStdEvent": "BR_IMMED_MIS_PRED_RETIRED", + "PublicDescription": "Counts architecturally executed direct branches that were mispredicted and caused a pipeline flush."
+ }, + { + "ArchStdEvent": "BR_IND_PRED_RETIRED", + "PublicDescription": "Counts architecturally executed indirect branches including procedure returns that were correctly predicted." + }, + { + "ArchStdEvent": "BR_IND_MIS_PRED_RETIRED", + "PublicDescription": "Counts architecturally executed indirect branches including procedure returns that were mispredicted and caused a pipeline flush." + }, + { + "ArchStdEvent": "BR_RETURN_PRED_RETIRED", + "PublicDescription": "Counts architecturally executed procedure returns that were correctly predicted." + }, + { + "ArchStdEvent": "BR_RETURN_MIS_PRED_RETIRED", + "PublicDescription": "Counts architecturally executed procedure returns that were mispredicted and caused a pipeline flush." + }, + { + "ArchStdEvent": "BR_INDNR_PRED_RETIRED", + "PublicDescription": "Counts architecturally executed indirect branches excluding procedure returns that were correctly predicted." + }, + { + "ArchStdEvent": "BR_INDNR_MIS_PRED_RETIRED", + "PublicDescription": "Counts architecturally executed indirect branches excluding procedure returns that were mispredicted and caused a pipeline flush." + }, + { + "ArchStdEvent": "BR_PRED_RETIRED", + "PublicDescription": "Counts branch instructions counted by BR_RETIRED which were correctly predicted." + }, + { + "ArchStdEvent": "BR_IND_RETIRED", + "PublicDescription": "Counts architecturally executed indirect branches including procedure returns." + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spe.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spe.json new file mode 100644 index 000000000000..ca0217fa4681 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spe.json @@ -0,0 +1,42 @@ +[ + { + "ArchStdEvent": "SAMPLE_POP", + "PublicDescription": "Counts statistical profiling sample population, the count of all operations that could be sampled but may or may not be chosen for sampling."
+ }, + { + "ArchStdEvent": "SAMPLE_FEED", + "PublicDescription": "Counts statistical profiling samples taken for sampling." + }, + { + "ArchStdEvent": "SAMPLE_FILTRATE", + "PublicDescription": "Counts statistical profiling samples taken which are not removed by filtering." + }, + { + "ArchStdEvent": "SAMPLE_COLLISION", + "PublicDescription": "Counts statistical profiling samples that have collided with a previous sample and so therefore not taken." + }, + { + "ArchStdEvent": "SAMPLE_FEED_BR", + "PublicDescription": "Counts statistical profiling samples taken which are branches." + }, + { + "ArchStdEvent": "SAMPLE_FEED_LD", + "PublicDescription": "Counts statistical profiling samples taken which are loads or load atomic operations." + }, + { + "ArchStdEvent": "SAMPLE_FEED_ST", + "PublicDescription": "Counts statistical profiling samples taken which are stores or store atomic operations." + }, + { + "ArchStdEvent": "SAMPLE_FEED_OP", + "PublicDescription": "Counts statistical profiling samples taken which are matching any operation type filters supported." + }, + { + "ArchStdEvent": "SAMPLE_FEED_EVENT", + "PublicDescription": "Counts statistical profiling samples taken which are matching event packet filter constraints." + }, + { + "ArchStdEvent": "SAMPLE_FEED_LAT", + "PublicDescription": "Counts statistical profiling samples taken which are exceeding minimum latency set by operation latency filter constraints." + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spec_operation.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spec_operation.json new file mode 100644 index 000000000000..f91eb18d683c --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spec_operation.json @@ -0,0 +1,90 @@ +[ + { + "ArchStdEvent": "BR_MIS_PRED", + "PublicDescription": "Counts branches which are speculatively executed and mispredicted."
+ }, + { + "ArchStdEvent": "BR_PRED", + "PublicDescription": "Counts all speculatively executed branches." + }, + { + "ArchStdEvent": "INST_SPEC", + "PublicDescription": "Counts operations that have been speculatively executed." + }, + { + "ArchStdEvent": "OP_SPEC", + "PublicDescription": "Counts micro-operations speculatively executed. This is the count of the number of micro-operations dispatched in a cycle." + }, + { + "ArchStdEvent": "STREX_FAIL_SPEC", + "PublicDescription": "Counts store-exclusive operations that have been speculatively executed and have not successfully completed the store operation." + }, + { + "ArchStdEvent": "STREX_SPEC", + "PublicDescription": "Counts store-exclusive operations that have been speculatively executed." + }, + { + "ArchStdEvent": "LD_SPEC", + "PublicDescription": "Counts speculatively executed load operations including Single Instruction Multiple Data (SIMD) load operations." + }, + { + "ArchStdEvent": "ST_SPEC", + "PublicDescription": "Counts speculatively executed store operations including Single Instruction Multiple Data (SIMD) store operations." + }, + { + "ArchStdEvent": "DP_SPEC", + "PublicDescription": "Counts speculatively executed logical or arithmetic instructions such as MOV/MVN operations." + }, + { + "ArchStdEvent": "ASE_SPEC", + "PublicDescription": "Counts speculatively executed Advanced SIMD operations excluding load, store and move micro-operations that move data to or from SIMD (vector) registers." + }, + { + "ArchStdEvent": "VFP_SPEC", + "PublicDescription": "Counts speculatively executed floating point operations. This event does not count operations that move data to or from floating point (vector) registers." + }, + { + "ArchStdEvent": "PC_WRITE_SPEC", + "PublicDescription": "Counts speculatively executed operations which cause software changes of the PC. Those operations include all taken branch operations."
+ }, + { + "ArchStdEvent": "CRYPTO_SPEC", + "PublicDescription": "Counts speculatively executed cryptographic operations except for PMULL and VMULL operations." + }, + { + "ArchStdEvent": "ISB_SPEC", + "PublicDescription": "Counts ISB operations that are executed." + }, + { + "ArchStdEvent": "DSB_SPEC", + "PublicDescription": "Counts DSB operations that are speculatively issued to Load/Store unit in the CPU." + }, + { + "ArchStdEvent": "DMB_SPEC", + "PublicDescription": "Counts DMB operations that are speculatively issued to the Load/Store unit in the CPU. This event does not count implied barriers from load acquire/store release operations." + }, + { + "ArchStdEvent": "RC_LD_SPEC", + "PublicDescription": "Counts any load acquire operations that are speculatively executed. For example: LDAR, LDARH, LDARB" + }, + { + "ArchStdEvent": "RC_ST_SPEC", + "PublicDescription": "Counts any store release operations that are speculatively executed. For example: STLR, STLRH, STLRB" + }, + { + "ArchStdEvent": "ASE_INST_SPEC", + "PublicDescription": "Counts speculatively executed Advanced SIMD operations." + }, + { + "ArchStdEvent": "CAS_NEAR_PASS", + "PublicDescription": "Counts compare and swap instructions that executed locally to the PE and updated the location accessed." + }, + { + "ArchStdEvent": "CAS_NEAR_SPEC", + "PublicDescription": "Counts compare and swap instructions that executed locally to the PE." + }, + { + "ArchStdEvent": "CAS_FAR_SPEC", + "PublicDescription": "Counts compare and swap instructions that did not execute locally to the PE."
+ } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/stall.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/stall.json new file mode 100644 index 000000000000..b1eae21bac07 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/stall.json @@ -0,0 +1,82 @@ +[ + { + "ArchStdEvent": "STALL_FRONTEND", + "PublicDescription": "Counts cycles when frontend could not send any micro-operations to the rename stage because of frontend resource stalls caused by fetch memory latency or branch prediction flow stalls. STALL_FRONTEND_SLOTS counts SLOTS during the cycle when this event counts." + }, + { + "ArchStdEvent": "STALL_BACKEND", + "PublicDescription": "Counts cycles whenever the rename unit is unable to send any micro-operations to the backend of the pipeline because of backend resource constraints. Backend resource constraints can include issue stage fullness, execution stage fullness, or other internal pipeline resource fullness. All the backend slots were empty during the cycle when this event counts." + }, + { + "ArchStdEvent": "STALL", + "PublicDescription": "Counts cycles when no operations are sent to the rename unit from the frontend or from the rename unit to the backend for any reason (either frontend or backend stall). This event is the sum of STALL_FRONTEND and STALL_BACKEND" + }, + { + "ArchStdEvent": "STALL_SLOT_BACKEND", + "PublicDescription": "Counts slots per cycle in which no operations are sent from the rename unit to the backend due to backend resource constraints. STALL_BACKEND counts during the cycle when STALL_SLOT_BACKEND counts at least 1." + }, + { + "ArchStdEvent": "STALL_SLOT_FRONTEND", + "PublicDescription": "Counts slots per cycle in which no operations are sent to the rename unit from the frontend due to frontend resource constraints."
+ }, + { + "ArchStdEvent": "STALL_SLOT", + "PublicDescription": "Counts slots per cycle in which no operations are sent to the rename unit from the frontend or from the rename unit to the backend for any reason (either frontend or backend stall). STALL_SLOT is the sum of STALL_SLOT_FRONTEND and STALL_SLOT_BACKEND." + }, + { + "ArchStdEvent": "STALL_BACKEND_MEM", + "PublicDescription": "Counts cycles when the backend is stalled because there is a pending demand load request in progress in the last level core cache." + }, + { + "ArchStdEvent": "STALL_FRONTEND_MEMBOUND", + "PublicDescription": "Counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the memory resources." + }, + { + "ArchStdEvent": "STALL_FRONTEND_L1I", + "PublicDescription": "Counts cycles when the frontend is stalled because there is an instruction fetch request pending in the level 1 instruction cache." + }, + { + "ArchStdEvent": "STALL_FRONTEND_MEM", + "PublicDescription": "Counts cycles when the frontend is stalled because there is an instruction fetch request pending in the last level core cache." + }, + { + "ArchStdEvent": "STALL_FRONTEND_TLB", + "PublicDescription": "Counts when the frontend is stalled on any TLB misses being handled. This event also counts the TLB accesses made by hardware prefetches." + }, + { + "ArchStdEvent": "STALL_FRONTEND_CPUBOUND", + "PublicDescription": "Counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the CPU resources excluding memory resources." + }, + { + "ArchStdEvent": "STALL_FRONTEND_FLUSH", + "PublicDescription": "Counts cycles when the frontend could not send any micro-operations to the rename stage as the frontend is recovering from a machine flush or resteer. Example scenarios that cause a flush include branch mispredictions, taken exceptions, micro-architectural flush etc."
+ }, + { + "ArchStdEvent": "STALL_BACKEND_MEMBOUND", + "PublicDescription": "Counts cycles when the backend could not accept any micro-operations due to resource constraints in the memory resources." + }, + { + "ArchStdEvent": "STALL_BACKEND_L1D", + "PublicDescription": "Counts cycles when the backend is stalled because there is a pending demand load request in progress in the level 1 data cache." + }, + { + "ArchStdEvent": "STALL_BACKEND_TLB", + "PublicDescription": "Counts cycles when the backend is stalled on any demand TLB misses being handled." + }, + { + "ArchStdEvent": "STALL_BACKEND_ST", + "PublicDescription": "Counts cycles when the backend is stalled and there is a store that has not reached the pre-commit stage." + }, + { + "ArchStdEvent": "STALL_BACKEND_CPUBOUND", + "PublicDescription": "Counts cycles when the backend could not accept any micro-operations due to any resource constraints in the CPU excluding memory resources." + }, + { + "ArchStdEvent": "STALL_BACKEND_BUSY", + "PublicDescription": "Counts cycles when the backend could not accept any micro-operations because the issue queues are full to take any operations for execution." + }, + { + "ArchStdEvent": "STALL_BACKEND_RENAME", + "PublicDescription": "Counts cycles when backend is stalled even when operations are available from the frontend but at least one is not ready to be sent to the backend because no rename register is available." + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/sve.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/sve.json new file mode 100644 index 000000000000..51dab48cb2ba --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/sve.json @@ -0,0 +1,50 @@ +[ + { + "ArchStdEvent": "SVE_INST_SPEC", + "PublicDescription": "Counts speculatively executed operations that are SVE operations."
+ }, + { + "ArchStdEvent": "SVE_PRED_SPEC", + "PublicDescription": "Counts speculatively executed predicated SVE operations." + }, + { + "ArchStdEvent": "SVE_PRED_EMPTY_SPEC", + "PublicDescription": "Counts speculatively executed predicated SVE operations with no active predicate elements." + }, + { + "ArchStdEvent": "SVE_PRED_FULL_SPEC", + "PublicDescription": "Counts speculatively executed predicated SVE operations with all predicate elements active." + }, + { + "ArchStdEvent": "SVE_PRED_PARTIAL_SPEC", + "PublicDescription": "Counts speculatively executed predicated SVE operations with at least one but not all active predicate elements." + }, + { + "ArchStdEvent": "SVE_PRED_NOT_FULL_SPEC", + "PublicDescription": "Counts speculatively executed predicated SVE operations with at least one non active predicate elements." + }, + { + "ArchStdEvent": "SVE_LDFF_SPEC", + "PublicDescription": "Counts speculatively executed SVE first fault or non-fault load operations." + }, + { + "ArchStdEvent": "SVE_LDFF_FAULT_SPEC", + "PublicDescription": "Counts speculatively executed SVE first fault or non-fault load operations that clear at least one bit in the FFR." + }, + { + "ArchStdEvent": "ASE_SVE_INT8_SPEC", + "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type an 8-bit integer." + }, + { + "ArchStdEvent": "ASE_SVE_INT16_SPEC", + "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 16-bit integer." + }, + { + "ArchStdEvent": "ASE_SVE_INT32_SPEC", + "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 32-bit integer." + }, + { + "ArchStdEvent": "ASE_SVE_INT64_SPEC", + "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 64-bit integer."
+ } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/tlb.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/tlb.json new file mode 100644 index 000000000000..c7aa89c2f19f --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/tlb.json @@ -0,0 +1,74 @@ +[ + { + "ArchStdEvent": "L1I_TLB_REFILL", + "PublicDescription": "Counts level 1 instruction TLB refills from any Instruction fetch. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB." + }, + { + "ArchStdEvent": "L1D_TLB_REFILL", + "PublicDescription": "Counts level 1 data TLB accesses that resulted in TLB refills. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event counts for refills caused by preload instructions or hardware prefetch accesses. This event counts regardless of whether the miss hits in L2 or results in a translation table walk. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB. This event will not count on an access from an AT(address translation) instruction." + }, + { + "ArchStdEvent": "L1D_TLB", + "PublicDescription": "Counts level 1 data TLB accesses caused by any memory load or store operation. Note that load or store instructions can be broken up into multiple memory operations. This event does not count TLB maintenance operations." + }, + { + "ArchStdEvent": "L1I_TLB", + "PublicDescription": "Counts level 1 instruction TLB accesses, whether the access hits or misses in the TLB. This event counts both demand accesses and prefetch or preload generated accesses."
+ }, + { + "ArchStdEvent": "L2D_TLB_REFILL", + "PublicDescription": "Counts level 2 TLB refills caused by memory operations from both data and instruction fetch, except for those caused by TLB maintenance operations and hardware prefetches." + }, + { + "ArchStdEvent": "L2D_TLB", + "PublicDescription": "Counts level 2 TLB accesses except those caused by TLB maintenance operations." + }, + { + "ArchStdEvent": "DTLB_WALK", + "PublicDescription": "Counts number of demand data translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations." + }, + { + "ArchStdEvent": "ITLB_WALK", + "PublicDescription": "Counts number of instruction translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations." + }, + { + "ArchStdEvent": "DTLB_WALK_PERCYC", + "PublicDescription": "Counts the number of data translation table walks in progress per cycle." + }, + { + "ArchStdEvent": "ITLB_WALK_PERCYC", + "PublicDescription": "Counts the number of instruction translation table walks in progress per cycle."
+ }, + { + "ArchStdEvent": "DTLB_HWUPD", + "PublicDescription": "Counts number of memory accesses triggered by a data translation table walk and performing an update of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that this event counts accesses triggered by software preloads, but not accesses triggered by hardware prefetchers." + }, + { + "ArchStdEvent": "ITLB_HWUPD", + "PublicDescription": "Counts number of memory accesses triggered by an instruction translation table walk and performing an update of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD." + }, + { + "ArchStdEvent": "DTLB_STEP", + "PublicDescription": "Counts number of memory accesses triggered by a demand data translation table walk and performing a read of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that this event counts accesses triggered by software preloads, but not accesses triggered by hardware prefetchers." + }, + { + "ArchStdEvent": "ITLB_STEP", + "PublicDescription": "Counts number of memory accesses triggered by an instruction translation table walk and performing a read of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD." + }, + { + "ArchStdEvent": "DTLB_WALK_LARGE", + "PublicDescription": "Counts number of demand data translation table walks caused by a miss in the L2 TLB and yielding a large page. The set of large pages is defined as all pages with a final size higher than or equal to 2MB.
Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. If DTLB_WALK_BLOCK is implemented, then it is an alias for this event in this family. Note that partial translations that cause a translation table walk are also counted. Also note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations." + }, + { + "ArchStdEvent": "ITLB_WALK_LARGE", + "PublicDescription": "Counts number of instruction translation table walks caused by a miss in the L2 TLB and yielding a large page. The set of large pages is defined as all pages with a final size higher than or equal to 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. In this family, this is equal to ITLB_WALK_BLOCK event. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations." + }, + { + "ArchStdEvent": "DTLB_WALK_SMALL", + "PublicDescription": "Counts number of data translation table walks caused by a miss in the L2 TLB and yielding a small page. The set of small pages is defined as all pages with a final size lower than 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. If DTLB_WALK_PAGE event is implemented, then it is an alias for this event in this family. Note that partial translations that cause a translation table walk are also counted. Also note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations."
+ }, + { + "ArchStdEvent": "ITLB_WALK_SMALL", + "PublicDescription": "Counts number of instruction translation table walks caused by a miss in the L2 TLB and yielding a small page. The set of small pages is defined as all pages with a final size lower than 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. In this family, this is equal to ITLB_WALK_PAGE event. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations." + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/trace.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/trace.json new file mode 100644 index 000000000000..33672a8711d4 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/trace.json @@ -0,0 +1,32 @@ +[ + { + "ArchStdEvent": "TRB_WRAP" + }, + { + "ArchStdEvent": "TRB_TRIG" + }, + { + "ArchStdEvent": "TRCEXTOUT0" + }, + { + "ArchStdEvent": "TRCEXTOUT1" + }, + { + "ArchStdEvent": "TRCEXTOUT2" + }, + { + "ArchStdEvent": "TRCEXTOUT3" + }, + { + "ArchStdEvent": "CTI_TRIGOUT4" + }, + { + "ArchStdEvent": "CTI_TRIGOUT5" + }, + { + "ArchStdEvent": "CTI_TRIGOUT6" + }, + { + "ArchStdEvent": "CTI_TRIGOUT7" + } +] diff --git a/tools/perf/pmu-events/arch/arm64/mapfile.csv b/tools/perf/pmu-events/arch/arm64/mapfile.csv index bb3fa8a33496..ccfcae375750 100644 --- a/tools/perf/pmu-events/arch/arm64/mapfile.csv +++ b/tools/perf/pmu-events/arch/arm64/mapfile.csv @@ -33,6 +33,7 @@ 0x00000000410fd4c0,v1,arm/cortex-x1,core 0x00000000410fd460,v1,arm/cortex-a510,core 0x00000000410fd470,v1,arm/cortex-a710,core +0x00000000410fd810,v1,arm/cortex-a720,core 0x00000000410fd480,v1,arm/cortex-x2,core 0x00000000410fd490,v1,arm/neoverse-n2-v2,core 0x00000000410fd4f0,v1,arm/neoverse-n2-v2,core -- 2.47.2 From nobody Thu Dec 18 20:18:33 2025 Received: from
out162-62-58-216.mail.qq.com (out162-62-58-216.mail.qq.com [162.62.58.216]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 74C0770805 for ; Thu, 13 Feb 2025 15:31:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=162.62.58.216 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739460687; cv=none; b=CroIZedLQ+rBOk6ASlqrNG9bwaxYJ9vrdAzq+OMQA7bcRQT1TZPi4EJQwizubs0SVADqjjtQcWQGIogDpyJHLT/aRUKYznR9BDXN3ZN/Xaz+XOTKE4E8m70Aqj4gSzrJGK+dQXEU+MWEu1y7WEXRvB/ZiKNGlZ3oZpKR9jy5gLo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739460687; c=relaxed/simple; bh=75WtKV97demkDyvVvsBSVI/sHIJRofn2uIxOHJA6Cfk=; h=Message-ID:From:To:Cc:Subject:Date:In-Reply-To:References: MIME-Version; b=igpcyienQXskxz7NBjhhn2WLQI0RKN6Xklp4C0u94IX/B58WclO/CylpizB8vYOqKHAuOEWPhJ66/MQ1ReuufHuOdlhaq30/EB+AE6YBWZmdGXuO99nZVNKZ6rqN/QChiCRINaiqDG3ainHaMOfs1ifcDH0EVTWQUrQvaRTQsWI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=cyyself.name; spf=pass smtp.mailfrom=cyyself.name; dkim=pass (1024-bit key) header.d=qq.com header.i=@qq.com header.b=TZPiOWwp; arc=none smtp.client-ip=162.62.58.216 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=cyyself.name Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=cyyself.name Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=qq.com header.i=@qq.com header.b="TZPiOWwp" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=qq.com; s=s201512; t=1739460375; bh=Nzt2vQH7DSa3tUtAyFdTa0J3/Ncy+l1J5I0wQF068Hc=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=TZPiOWwpGow/Bwpxw3TypvX22B4kZsojpNvHb6PdivTitrJ7luoM/9YVFP7sBApZr 0Y6JhkftpZRCqt4X2Bt2DXQHs9TqhjnW5fVFneQ+PhleDnVRcyJqjjMzQ2OJQ+b+0T 
ZJ9WJShutVkztaRsgFII58N+45HZAw8MABKRso1Y= Received: from cyy-pc.lan ([240e:379:2251:3600:f57b:26f9:9718:486c]) by newxmesmtplogicsvrsza36-0.qq.com (NewEsmtp) with SMTP id 338A58E8; Thu, 13 Feb 2025 23:12:56 +0800 X-QQ-mid: xmsmtpt1739459576tgw6byg21 Message-ID: X-QQ-XMAILINFO: OIJV+wUmQOUAhsKvSrCboBKxoZ5TS2wuovy/4FSdvWKxhHFyehT7Mwr6nb1BVM UGsdKQjSd1PANX6UJ+LvUiaI5Ve38sAMRV1nELAOdZKCXB6HAc2aJLpa3qdYAUZ+3/huaJiMNQY8 CIFAF/EDJGU+39weutf9O5zhpL9MGQPCzhDuPlBc9OSSicHrtM+0A67bIzNIHgCnJmYQM5Xu6h3O c4lv7B5uUt5uoeT6EWnncKgZt8gSyh5W2Vqqw36qFp3hfGwNKn9PgsptjFsYkzUD7d3ffqwC5bu/ YKsYanM1qbZSVtfFgfCSGaSgDqpm5/jImpQk0IWI+ekCSpqyrTUJ+jEF38KzpFsUjwvYlVk5Xih4 VQ276N6xp80lUbBuEorLanWDG3nPmtlHPWBimaoUe/x5p5TGPKIVzgwaZQ2MHWXrCu75B7tJmGb3 uOJGJaHtgkRSnVcNWIKo/+bbLP2/0B2CrtcbMcWmRc3UfzNo4UXwJ6+FQ/RzwJ9pzrwdc00+G+eZ 3UBg/ONATwEN0AHv92nZPp54K0N7bHljJOVVEy3cRng5lUbkSn4VYnncL/OugO72pd7Ki1zUSBOI sU0+4+Y0b7sWTnNScFfBt/iaNXuRJiY52Hw5HY7SzSixEppM2gTsKfbBTDkVzHoTcAmInwxu2bdJ R1EnHIKts4WWRvngj1ljy+YRkL+7bN5mLZSkDvsMRvM+yipn0RTQQvlIYlOJfChPMK83AiksDLXo cwKdPqct8AA6B8iWtmBUBO+s/YukeH8e+3Pu9qvkkJwHj6wCFoYTqql7WdqXZrcimzlc+92aIoEa 3QqVBkMQHANBhzN1STF4uSzOslw8eVfnKnhUoq/lMRx9nP1JGm0q5UBfq5OQACzrDprS1lmDkVYv bQMLfTjgdvjoXOMJphY6fhE4fOWRYAYZvIHoe86+niC1JQkE2Odp4mCzftSFwKYotQR7GLp2brry tNr8QZtLugy3M1MoACDtsNOeweUhRzdHQrT5n6nLf4VBJOk8aR9yGjB6II1U/gv09rqjOXlXF2b2 1XAwWUFg/TcwzR/mrNJjU6dtlPrzNvjPlkKqhMS8yEjF3sSVWQ X-QQ-XMRINFO: NS+P29fieYNw95Bth2bWPxk= From: Yangyu Chen To: linux-perf-users@vger.kernel.org Cc: John Garry , Will Deacon , James Clark , Mike Leach , Leo Yan , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Liang Kan , Yoshihiro Furudera , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, Yangyu Chen Subject: [PATCH 2/2] perf vendor events arm64: Add Cortex-A520 events/metrics Date: Thu, 13 Feb 2025 23:12:52 +0800 X-OQ-MSGID: <20250213151252.187475-1-cyy@cyyself.name> X-Mailer: 
git-send-email 2.47.2 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add JSON files for Cortex-A520 events and metrics. Using the existing Neoverse N3 JSON files as a template, I manually checked the missing and extra events/metrics using my script [1] and modified them according to the Arm Cortex-A520 Core Technical Reference Manual [2]. [1] https://github.com/cyyself/arm-pmu-check/tree/1075bebeb3f1441067448251a387df35af15bf16 [2] https://developer.arm.com/documentation/102517/0004/Performance-Monitors-Extension-support-/Performance-monitors-events/Common-event-PMU-events Signed-off-by: Yangyu Chen --- .../arch/arm64/arm/cortex-a520/bus.json | 26 ++ .../arch/arm64/arm/cortex-a520/exception.json | 18 + .../arm64/arm/cortex-a520/fp_operation.json | 14 + .../arch/arm64/arm/cortex-a520/general.json | 6 + .../arch/arm64/arm/cortex-a520/l1d_cache.json | 50 +++ .../arch/arm64/arm/cortex-a520/l1i_cache.json | 14 + .../arch/arm64/arm/cortex-a520/l2_cache.json | 46 +++ .../arch/arm64/arm/cortex-a520/l3_cache.json | 21 + .../arch/arm64/arm/cortex-a520/ll_cache.json | 10 + .../arch/arm64/arm/cortex-a520/memory.json | 58 +++ .../arch/arm64/arm/cortex-a520/metrics.json | 373 ++++++++++++++++++ .../arch/arm64/arm/cortex-a520/pmu.json | 8 + .../arch/arm64/arm/cortex-a520/retired.json | 90 +++++ .../arm64/arm/cortex-a520/spec_operation.json | 70 ++++ .../arch/arm64/arm/cortex-a520/stall.json | 82 ++++ .../arch/arm64/arm/cortex-a520/sve.json | 22 ++ .../arch/arm64/arm/cortex-a520/tlb.json | 78 ++++ .../arch/arm64/arm/cortex-a520/trace.json | 32 ++ .../arch/arm64/common-and-microarch.json | 15 + tools/perf/pmu-events/arch/arm64/mapfile.csv | 1 + 20 files changed, 1034 insertions(+) create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/bus.json create mode 100644
tools/perf/pmu-events/arch/arm64/arm/cortex-a520/exception.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/fp_operation.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/general.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1d_cache.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1i_cache.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l2_cache.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l3_cache.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/ll_cache.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/memory.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/metrics.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/pmu.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/retired.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/spec_operation.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/stall.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/sve.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/tlb.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/trace.json diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/bus.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/bus.json new file mode 100644 index 000000000000..884e42ab6a49 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/bus.json @@ -0,0 +1,26 @@ +[ + { + "ArchStdEvent": "BUS_ACCESS", + "PublicDescription": "Counts memory transactions issued by the CPU to the external bus, including snoop requests and snoop responses. Each beat of data is counted individually." + }, + { + "ArchStdEvent": "BUS_CYCLES", + "PublicDescription": "Counts bus cycles in the CPU.
Bus cycles represent a clock cycle in which a transaction could be sent or received on the interface from the CPU to the external bus. Since that interface is driven at the same clock speed as the CPU, this event is a duplicate of CPU_CYCLES." + }, + { + "ArchStdEvent": "BUS_ACCESS_RD", + "PublicDescription": "Counts memory read transactions seen on the external bus. Each beat of data is counted individually." + }, + { + "ArchStdEvent": "BUS_ACCESS_WR", + "PublicDescription": "Counts memory write transactions seen on the external bus. Each beat of data is counted individually." + }, + { + "ArchStdEvent": "BUS_REQ_RD_PERCYC", + "PublicDescription": "Bus read transactions in progress." + }, + { + "ArchStdEvent": "BUS_REQ_RD", + "BriefDescription": "Bus request, read" + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/exception.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/exception.json new file mode 100644 index 000000000000..fbe580e15c2e --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/exception.json @@ -0,0 +1,18 @@ +[ + { + "ArchStdEvent": "EXC_TAKEN", + "PublicDescription": "Counts any taken architecturally visible exceptions such as IRQ, FIQ, SError, and other synchronous exceptions. Exceptions are counted whether or not they are taken locally." + }, + { + "ArchStdEvent": "EXC_RETURN", + "PublicDescription": "Counts any architecturally executed exception return instructions. For example: AArch64: ERET" + }, + { + "ArchStdEvent": "EXC_IRQ", + "PublicDescription": "Counts IRQ exceptions including the virtual IRQs that are taken locally." + }, + { + "ArchStdEvent": "EXC_FIQ", + "PublicDescription": "Counts FIQ exceptions including the virtual FIQs that are taken locally."
+ } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/fp_operation.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/fp_operation.json new file mode 100644 index 000000000000..da0c4b05ad5b --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/fp_operation.json @@ -0,0 +1,14 @@ +[ + { + "ArchStdEvent": "FP_HP_SPEC", + "PublicDescription": "Counts speculatively executed half precision floating point operations." + }, + { + "ArchStdEvent": "FP_SP_SPEC", + "PublicDescription": "Counts speculatively executed single precision floating point operations." + }, + { + "ArchStdEvent": "FP_DP_SPEC", + "PublicDescription": "Counts speculatively executed double precision floating point operations." + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/general.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/general.json new file mode 100644 index 000000000000..20fada95ef97 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/general.json @@ -0,0 +1,6 @@ +[ + { + "ArchStdEvent": "CPU_CYCLES", + "PublicDescription": "Counts CPU clock cycles (not timer cycles). The clock measured by this event is defined as the physical clock driving the CPU logic." + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1d_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1d_cache.json new file mode 100644 index 000000000000..90e871c8986a --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1d_cache.json @@ -0,0 +1,50 @@ +[ + { + "ArchStdEvent": "L1D_CACHE_REFILL", + "PublicDescription": "Counts level 1 data cache refills caused by speculatively executed load or store operations that missed in the level 1 data cache. This event only counts one event per cache line." + }, + { + "ArchStdEvent": "L1D_CACHE", + "PublicDescription": "Counts level 1 data cache accesses from any load/store operations.
Atomic operations that resolve in the CPUs caches (near atomic operations) counts as both a write access and read access. Each access to a cache line is counted including the multiple accesses caused by single instructions such as LDM or STM. Each access to other level 1 data or unified memory structures, for example refill buffers, write buffers, and write-back buffers, are also counted." + }, + { + "ArchStdEvent": "L1D_CACHE_WB", + "PublicDescription": "Counts write-backs of dirty data from the L1 data cache to the L2 cache. This occurs when either a dirty cache line is evicted from L1 data cache and allocated in the L2 cache or dirty data is written to the L2 and possibly to the next level of cache. This event counts both victim cache line evictions and cache write-backs from snoops or cache maintenance operations. The following cache operations are not counted:\n\n1. Invalidations which do not result in data being transferred out of the L1 (such as evictions of clean data),\n2. Full line writes which write to L2 without writing L1, such as write streaming mode." + }, + { + "ArchStdEvent": "L1D_CACHE_LMISS_RD", + "PublicDescription": "Counts cache line refills into the level 1 data cache from any memory read operations, that incurred additional latency." + }, + { + "ArchStdEvent": "L1D_CACHE_RD", + "PublicDescription": "Counts level 1 data cache accesses from any load operation. Atomic load operations that resolve in the CPUs caches counts as both a write access and read access." + }, + { + "ArchStdEvent": "L1D_CACHE_WR", + "PublicDescription": "Counts level 1 data cache accesses generated by store operations. This event also counts accesses caused by a DC ZVA (data cache zero, specified by virtual address) instruction. Near atomic operations that resolve in the CPUs caches count as a write access and read access."
+ }, + { + "ArchStdEvent": "L1D_CACHE_REFILL_RD", + "PublicDescription": "Counts level 1 data cache refills caused by speculatively executed load instructions where the memory read operation misses in the level 1 data cache. This event only counts one event per cache line." + }, + { + "ArchStdEvent": "L1D_CACHE_REFILL_WR", + "PublicDescription": "Counts level 1 data cache refills caused by speculatively executed store instructions where the memory write operation misses in the level 1 data cache. This event only counts one event per cache line." + }, + { + "ArchStdEvent": "L1D_CACHE_REFILL_INNER", + "PublicDescription": "Counts level 1 data cache refills where the cache line data came from caches inside the immediate cluster of the core." + }, + { + "ArchStdEvent": "L1D_CACHE_REFILL_OUTER", + "PublicDescription": "Counts level 1 data cache refills for which the cache line data came from outside the immediate cluster of the core, like an SLC in the system interconnect or DRAM." + }, + { + "ArchStdEvent": "L1D_CACHE_HWPRF", + "PublicDescription": "Counts level 1 data cache accesses from any load/store operations generated by the hardware prefetcher." + }, + { + "ArchStdEvent": "L1D_CACHE_REFILL_HWPRF", + "PublicDescription": "Counts level 1 data cache refills where the cache line is requested by a hardware prefetcher." + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1i_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1i_cache.json new file mode 100644 index 000000000000..633f1030359d --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1i_cache.json @@ -0,0 +1,14 @@ +[ + { + "ArchStdEvent": "L1I_CACHE_REFILL", + "PublicDescription": "Counts cache line refills in the level 1 instruction cache caused by a missed instruction fetch. Instruction fetches may include accessing multiple instructions, but the single cache line allocation is counted once."
+ }, + { + "ArchStdEvent": "L1I_CACHE", + "PublicDescription": "Counts instruction fetches which access the level 1 instruction cache. Instruction cache accesses caused by cache maintenance operations are not counted." + }, + { + "ArchStdEvent": "L1I_CACHE_LMISS", + "PublicDescription": "Counts cache line refills into the level 1 instruction cache, that incurred additional latency." + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l2_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l2_cache.json new file mode 100644 index 000000000000..9874b1a7c94b --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l2_cache.json @@ -0,0 +1,46 @@ +[ + { + "ArchStdEvent": "L2D_CACHE", + "PublicDescription": "Counts accesses to the level 2 cache due to data accesses. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the first level data cache or translation resolutions due to accesses. This event also counts write back of dirty data from level 1 data cache to the L2 cache." + }, + { + "ArchStdEvent": "L2D_CACHE_REFILL", + "PublicDescription": "Counts cache line refills into the level 2 cache. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 data cache or translation resolutions due to accesses." + }, + { + "ArchStdEvent": "L2D_CACHE_WB", + "PublicDescription": "Counts write-backs of data from the L2 cache to outside the CPU. This includes snoops to the L2 (from other CPUs) which return data even if the snoops cause an invalidation. L2 cache line invalidations which do not write data outside the CPU and snoops which return data from an L1 cache are not counted. Data would not be written outside the cache when invalidating a clean cache line."
+ }, + { + "ArchStdEvent": "L2D_CACHE_ALLOCATE", + "PublicDescription": "Counts level 2 cache line allocates that do not fetch data from outside the level 2 data or unified cache." + }, + { + "ArchStdEvent": "L2D_CACHE_RD", + "PublicDescription": "Counts level 2 data cache accesses due to memory read operations. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 data cache or translation resolutions due to accesses." + }, + { + "ArchStdEvent": "L2D_CACHE_WR", + "PublicDescription": "Counts level 2 cache accesses due to memory write operations. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 data cache or translation resolutions due to accesses." + }, + { + "ArchStdEvent": "L2D_CACHE_REFILL_RD", + "PublicDescription": "Counts refills for memory accesses due to memory read operations counted by L2D_CACHE_RD. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 data cache or translation resolutions due to accesses." + }, + { + "ArchStdEvent": "L2D_CACHE_REFILL_WR", + "PublicDescription": "Counts refills for memory accesses due to memory write operations counted by L2D_CACHE_WR. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 data cache or translation resolutions due to accesses." + }, + { + "ArchStdEvent": "L2D_CACHE_LMISS_RD", + "PublicDescription": "Counts cache line refills into the level 2 unified cache from any memory read operations that incurred additional latency." + }, + { + "ArchStdEvent": "L2D_CACHE_HWPRF", + "PublicDescription": "Counts level 2 data cache accesses generated by L2D hardware prefetchers."
+ }, + { + "ArchStdEvent": "L2D_CACHE_REFILL_HWPRF", + "BriefDescription": "This event counts hardware prefetch counted by L2D_CACHE_HWPRF that causes a refill of the Level 2 cache, or any Level 1 data and instruction cache of this PE, from outside of those caches." + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l3_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l3_cache.json new file mode 100644 index 000000000000..d5485d71babb --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l3_cache.json @@ -0,0 +1,21 @@ +[ + { + "ArchStdEvent": "L3D_CACHE", + "PublicDescription": "Counts level 3 cache accesses. Level 3 cache is a unified cache for data and instruction accesses. Accesses are for misses in the lower level caches or translation resolutions due to accesses." + }, + { + "ArchStdEvent": "L3D_CACHE_RD", + "PublicDescription": "Counts level 3 cache accesses caused by any memory read operation. Level 3 cache is a unified cache for data and instruction accesses. Accesses are for misses in the lower level caches or translation resolutions due to accesses." + }, + { + "ArchStdEvent": "L3D_CACHE_REFILL_RD" + }, + { + "ArchStdEvent": "L3D_CACHE_LMISS_RD", + "PublicDescription": "Counts any cache line refill into the level 3 cache from memory read operations that incurred additional latency." + }, + { + "ArchStdEvent": "L3D_CACHE_HWPRF", + "PublicDescription": "Level 3 data cache hardware prefetch." + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/ll_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/ll_cache.json new file mode 100644 index 000000000000..fd5a2e0099b8 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/ll_cache.json @@ -0,0 +1,10 @@ +[ + { + "ArchStdEvent": "LL_CACHE_RD", + "PublicDescription": "Counts read transactions that were returned from outside the core cluster.
This event counts for external last level cache when the system register CPUECTLR.EXTLLC bit is set, otherwise it counts for the L3 cache. This event counts read transactions returned from outside the core if those transactions are either hit in the system level cache or missed in the SLC and are returned from any other external sources." + }, + { + "ArchStdEvent": "LL_CACHE_MISS_RD", + "PublicDescription": "Counts read transactions that were returned from outside the core cluster but missed in the system level cache. This event counts for external last level cache when the system register CPUECTLR.EXTLLC bit is set, otherwise it counts for L3 cache. This event counts read transactions returned from outside the core if those transactions are missed in the system level cache. The data source of the transaction is indicated by a field in the CHI transaction returning to the CPU. This event does not count reads caused by cache maintenance operations." + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/memory.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/memory.json new file mode 100644 index 000000000000..e7f7914ecd2b --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/memory.json @@ -0,0 +1,58 @@ +[ + { + "ArchStdEvent": "MEM_ACCESS", + "PublicDescription": "Counts memory accesses issued by the CPU load store unit, where those accesses are issued due to load or store operations. This event counts memory accesses no matter whether the data is received from any level of cache hierarchy or external memory. If memory accesses are broken up into smaller transactions than what were specified in the load or store instructions, then the event counts those smaller memory transactions." + }, + { + "ArchStdEvent": "MEMORY_ERROR", + "PublicDescription": "Counts any detected correctable or uncorrectable physical memory errors (ECC or parity) in protected CPU RAMs.
On the core, this event counts errors in the caches (including data and tag rams). Any detected memory error (from either a speculative and abandoned access, or an architecturally executed access) is counted. Note that errors are only detected when the actual protected memory is accessed by an operation." + }, + { + "ArchStdEvent": "REMOTE_ACCESS_RD", + "PublicDescription": "Counts memory access to another socket in a multi-socket system, read." + }, + { + "ArchStdEvent": "MEM_ACCESS_RD", + "PublicDescription": "Counts memory accesses issued by the CPU due to load operations. The event counts any memory load access, no matter whether the data is received from any level of cache hierarchy or external memory. The event also counts atomic load operations. If memory accesses are broken up by the load/store unit into smaller transactions that are issued by the bus interface, then the event counts those smaller transactions." + }, + { + "ArchStdEvent": "MEM_ACCESS_WR", + "PublicDescription": "Counts memory accesses issued by the CPU due to store operations. The event counts any memory store access, no matter whether the data is located in any level of cache or external memory. The event also counts atomic load and store operations. If memory accesses are broken up by the load/store unit into smaller transactions that are issued by the bus interface, then the event counts those smaller transactions." + }, + { + "ArchStdEvent": "LDST_ALIGN_LAT", + "PublicDescription": "Counts the number of memory read and write accesses in a cycle that incurred additional latency, due to the alignment of the address and the size of data being accessed, which results in store crossing a single cache line."
+ }, + { + "ArchStdEvent": "LD_ALIGN_LAT", + "PublicDescription": "Counts the number of memory read accesses in a cycle that incurred additional latency, due to the alignment of the address and size of data being accessed, which results in load crossing a single cache line." + }, + { + "ArchStdEvent": "ST_ALIGN_LAT", + "PublicDescription": "Counts the number of memory write accesses in a cycle that incurred additional latency, due to the alignment of the address and size of data being accessed." + }, + { + "ArchStdEvent": "MEM_ACCESS_CHECKED", + "PublicDescription": "Counts the number of memory read and write accesses counted by MEM_ACCESS that are tag checked by the Memory Tagging Extension (MTE). This event is implemented as the sum of MEM_ACCESS_CHECKED_RD and MEM_ACCESS_CHECKED_WR." + }, + { + "ArchStdEvent": "MEM_ACCESS_CHECKED_RD", + "PublicDescription": "Counts the number of memory read accesses in a cycle that are tag checked by the Memory Tagging Extension (MTE)." + }, + { + "ArchStdEvent": "MEM_ACCESS_CHECKED_WR", + "PublicDescription": "Counts the number of memory write accesses in a cycle that are tag checked by the Memory Tagging Extension (MTE)." + }, + { + "ArchStdEvent": "INST_FETCH_PERCYC", + "PublicDescription": "Counts number of instruction fetches outstanding per cycle, which will provide an average latency of instruction fetch." + }, + { + "ArchStdEvent": "MEM_ACCESS_RD_PERCYC", + "PublicDescription": "Counts the number of outstanding loads or memory read accesses per cycle." + }, + { + "ArchStdEvent": "INST_FETCH", + "PublicDescription": "Counts Instruction memory accesses that the PE makes."
+ } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/metrics.json new file mode 100644 index 000000000000..62cb910c8945 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/metrics.json @@ -0,0 +1,373 @@ +[ + { + "ArchStdEvent": "backend_bound" + }, + { + "MetricName": "backend_busy_bound", + "MetricExpr": "STALL_BACKEND_BUSY / STALL_BACKEND * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to issue queues being full to accept operations for execution.", + "MetricGroup": "Topdown_Backend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "backend_cache_l1d_bound", + "MetricExpr": "STALL_BACKEND_L1D / (STALL_BACKEND_L1D + STALL_BACKEND_MEM) * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by level 1 data cache misses.", + "MetricGroup": "Topdown_Backend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "backend_cache_l2d_bound", + "MetricExpr": "STALL_BACKEND_MEM / (STALL_BACKEND_L1D + STALL_BACKEND_MEM) * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by level 2 data cache misses.", + "MetricGroup": "Topdown_Backend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "backend_mem_bound", + "MetricExpr": "STALL_BACKEND_MEMBOUND / STALL_BACKEND * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to backend core resource constraints related to memory access latency issues caused by memory access components.", + "MetricGroup": "Topdown_Backend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "backend_mem_cache_bound", + "MetricExpr": "(STALL_BACKEND_L1D + STALL_BACKEND_MEM) / STALL_BACKEND_MEMBOUND * 100", +
"BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory latency issues caused by data cache misses.", + "MetricGroup": "Topdown_Backend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "backend_mem_store_bound", + "MetricExpr": "STALL_BACKEND_ST / STALL_BACKEND_MEMBOUND * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory write pending caused by stores stalled in the pre-commit stage.", + "MetricGroup": "Topdown_Backend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "backend_mem_tlb_bound", + "MetricExpr": "STALL_BACKEND_TLB / STALL_BACKEND_MEMBOUND * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by data TLB misses.", + "MetricGroup": "Topdown_Backend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "backend_stalled_cycles", + "MetricExpr": "STALL_BACKEND / CPU_CYCLES * 100", + "BriefDescription": "This metric is the percentage of cycles that were stalled due to resource constraints in the backend unit of the processor.", + "MetricGroup": "Cycle_Accounting", + "ScaleUnit": "1percent of cycles" + }, + { + "ArchStdEvent": "bad_speculation", + "MetricExpr": "(1 - STALL_SLOT / (10 * CPU_CYCLES)) * (1 - OP_RETIRED / OP_SPEC) * 100 + STALL_FRONTEND_FLUSH / CPU_CYCLES * 100" + }, + { + "MetricName": "branch_direct_ratio", + "MetricExpr": "BR_IMMED_RETIRED / BR_RETIRED", + "BriefDescription": "This metric measures the ratio of direct branches retired to the total number of branches architecturally executed.", + "MetricGroup": "Branch_Effectiveness", + "ScaleUnit": "1per branch" + }, + { + "MetricName": "branch_indirect_ratio", + "MetricExpr": "BR_IND_RETIRED / BR_RETIRED", + "BriefDescription": "This metric measures the ratio of indirect branches retired, including function returns, to the total number of
branches architecturally executed.", + "MetricGroup": "Branch_Effectiveness", + "ScaleUnit": "1per branch" + }, + { + "MetricName": "branch_misprediction_ratio", + "MetricExpr": "BR_MIS_PRED_RETIRED / BR_RETIRED", + "BriefDescription": "This metric measures the ratio of branches mispredicted to the total number of branches architecturally executed. This gives an indication of the effectiveness of the branch prediction unit.", + "MetricGroup": "Miss_Ratio;Branch_Effectiveness", + "ScaleUnit": "100percent of branches" + }, + { + "MetricName": "branch_mpki", + "MetricExpr": "BR_MIS_PRED_RETIRED / INST_RETIRED * 1000", + "BriefDescription": "This metric measures the number of branch mispredictions per thousand instructions executed.", + "MetricGroup": "MPKI;Branch_Effectiveness", + "ScaleUnit": "1MPKI" + }, + { + "MetricName": "branch_percentage", + "MetricExpr": "PC_WRITE_SPEC / INST_SPEC * 100", + "BriefDescription": "This metric measures branch operations as a percentage of operations speculatively executed.", + "MetricGroup": "Operation_Mix", + "ScaleUnit": "1percent of operations" + }, + { + "MetricName": "branch_return_ratio", + "MetricExpr": "BR_RETURN_RETIRED / BR_RETIRED", + "BriefDescription": "This metric measures the ratio of branches retired that are function returns to the total number of branches architecturally executed.", + "MetricGroup": "Branch_Effectiveness", + "ScaleUnit": "1per branch" + }, + { + "MetricName": "crypto_percentage", + "MetricExpr": "CRYPTO_SPEC / INST_SPEC * 100", + "BriefDescription": "This metric measures crypto operations as a percentage of operations speculatively executed.", + "MetricGroup": "Operation_Mix", + "ScaleUnit": "1percent of operations" + }, + { + "MetricName": "dtlb_mpki", + "MetricExpr": "DTLB_WALK / INST_RETIRED * 1000", + "BriefDescription": "This metric measures the number of data TLB Walks per thousand instructions executed.", + "MetricGroup": "MPKI;DTLB_Effectiveness", + "ScaleUnit": "1MPKI" +
}, + { + "MetricName": "dtlb_walk_ratio", + "MetricExpr": "DTLB_WALK / L1D_TLB", + "BriefDescription": "This metric measures the ratio of data TLB Walks to the total number of data TLB accesses. This gives an indication of the effectiveness of the data TLB accesses.", + "MetricGroup": "Miss_Ratio;DTLB_Effectiveness", + "ScaleUnit": "100percent of TLB accesses" + }, + { + "MetricName": "fp16_percentage", + "MetricExpr": "FP_HP_SPEC / INST_SPEC * 100", + "BriefDescription": "This metric measures half-precision floating point operations as a percentage of operations speculatively executed.", + "MetricGroup": "FP_Precision_Mix", + "ScaleUnit": "1percent of operations" + }, + { + "MetricName": "fp32_percentage", + "MetricExpr": "FP_SP_SPEC / INST_SPEC * 100", + "BriefDescription": "This metric measures single-precision floating point operations as a percentage of operations speculatively executed.", + "MetricGroup": "FP_Precision_Mix", + "ScaleUnit": "1percent of operations" + }, + { + "MetricName": "fp64_percentage", + "MetricExpr": "FP_DP_SPEC / INST_SPEC * 100", + "BriefDescription": "This metric measures double-precision floating point operations as a percentage of operations speculatively executed.", + "MetricGroup": "FP_Precision_Mix", + "ScaleUnit": "1percent of operations" + }, + { + "MetricName": "frontend_cache_l1i_bound", + "MetricExpr": "STALL_FRONTEND_L1I / (STALL_FRONTEND_L1I + STALL_FRONTEND_MEM) * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to memory access latency issues caused by level 1 instruction cache misses.", + "MetricGroup": "Topdown_Frontend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "frontend_cache_l2i_bound", + "MetricExpr": "STALL_FRONTEND_MEM / (STALL_FRONTEND_L1I + STALL_FRONTEND_MEM) * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to memory access latency issues caused by level 2
instruction cache misses.", + "MetricGroup": "Topdown_Frontend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "frontend_core_bound", + "MetricExpr": "STALL_FRONTEND_CPUBOUND / STALL_FRONTEND * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to frontend core resource constraints not related to instruction fetch latency issues caused by memory access components.", + "MetricGroup": "Topdown_Frontend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "frontend_core_flush_bound", + "MetricExpr": "STALL_FRONTEND_FLUSH / STALL_FRONTEND_CPUBOUND * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend as the processor is recovering from a pipeline flush caused by bad speculation or other machine resteers.", + "MetricGroup": "Topdown_Frontend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "frontend_mem_bound", + "MetricExpr": "STALL_FRONTEND_MEMBOUND / STALL_FRONTEND * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to frontend core resource constraints related to the instruction fetch latency issues caused by memory access components.", + "MetricGroup": "Topdown_Frontend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "frontend_mem_cache_bound", + "MetricExpr": "(STALL_FRONTEND_L1I + STALL_FRONTEND_MEM) / STALL_FRONTEND_MEMBOUND * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to instruction fetch latency issues caused by instruction cache misses.", + "MetricGroup": "Topdown_Frontend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "frontend_mem_tlb_bound", + "MetricExpr": "STALL_FRONTEND_TLB / STALL_FRONTEND_MEMBOUND * 100", + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to instruction fetch latency issues caused by
instruction TLB misses.", + "MetricGroup": "Topdown_Frontend", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "frontend_stalled_cycles", + "MetricExpr": "STALL_FRONTEND / CPU_CYCLES * 100", + "BriefDescription": "This metric is the percentage of cycles that were stalled due to resource constraints in the frontend unit of the processor.", + "MetricGroup": "Cycle_Accounting", + "ScaleUnit": "1percent of cycles" + }, + { + "MetricName": "integer_dp_percentage", + "MetricExpr": "DP_SPEC / INST_SPEC * 100", + "BriefDescription": "This metric measures scalar integer operations as a percentage of operations speculatively executed.", + "MetricGroup": "Operation_Mix", + "ScaleUnit": "1percent of operations" + }, + { + "MetricName": "ipc", + "MetricExpr": "INST_RETIRED / CPU_CYCLES", + "BriefDescription": "This metric measures the number of instructions retired per cycle.", + "MetricGroup": "General", + "ScaleUnit": "1per cycle" + }, + { + "MetricName": "itlb_mpki", + "MetricExpr": "ITLB_WALK / INST_RETIRED * 1000", + "BriefDescription": "This metric measures the number of instruction TLB Walks per thousand instructions executed.", + "MetricGroup": "MPKI;ITLB_Effectiveness", + "ScaleUnit": "1MPKI" + }, + { + "MetricName": "itlb_walk_ratio", + "MetricExpr": "ITLB_WALK / L1I_TLB", + "BriefDescription": "This metric measures the ratio of instruction TLB Walks to the total number of instruction TLB accesses.
This gives an indication of the effectiveness of the instruction TLB accesses.", + "MetricGroup": "Miss_Ratio;ITLB_Effectiveness", + "ScaleUnit": "100percent of TLB accesses" + }, + { + "MetricName": "l1d_cache_miss_ratio", + "MetricExpr": "L1D_CACHE_REFILL / L1D_CACHE", + "BriefDescription": "This metric measures the ratio of level 1 data cache accesses missed to the total number of level 1 data cache accesses. This gives an indication of the effectiveness of the level 1 data cache.", + "MetricGroup": "Miss_Ratio;L1D_Cache_Effectiveness", + "ScaleUnit": "100percent of cache accesses" + }, + { + "MetricName": "l1d_cache_mpki", + "MetricExpr": "L1D_CACHE_REFILL / INST_RETIRED * 1000", + "BriefDescription": "This metric measures the number of level 1 data cache accesses missed per thousand instructions executed.", + "MetricGroup": "MPKI;L1D_Cache_Effectiveness", + "ScaleUnit": "1MPKI" + }, + { + "MetricName": "l1d_tlb_miss_ratio", + "MetricExpr": "L1D_TLB_REFILL / L1D_TLB", + "BriefDescription": "This metric measures the ratio of level 1 data TLB accesses missed to the total number of level 1 data TLB accesses. This gives an indication of the effectiveness of the level 1 data TLB.", + "MetricGroup": "Miss_Ratio;DTLB_Effectiveness", + "ScaleUnit": "100percent of TLB accesses" + }, + { + "MetricName": "l1d_tlb_mpki", + "MetricExpr": "L1D_TLB_REFILL / INST_RETIRED * 1000", + "BriefDescription": "This metric measures the number of level 1 data TLB accesses missed per thousand instructions executed.", + "MetricGroup": "MPKI;DTLB_Effectiveness", + "ScaleUnit": "1MPKI" + }, + { + "MetricName": "l1i_cache_miss_ratio", + "MetricExpr": "L1I_CACHE_REFILL / L1I_CACHE", + "BriefDescription": "This metric measures the ratio of level 1 instruction cache accesses missed to the total number of level 1 instruction cache accesses.
This gives an indication of the effectiveness of the level 1 instruction cache.", + "MetricGroup": "Miss_Ratio;L1I_Cache_Effectiveness", + "ScaleUnit": "100percent of cache accesses" + }, + { + "MetricName": "l1i_cache_mpki", + "MetricExpr": "L1I_CACHE_REFILL / INST_RETIRED * 1000", + "BriefDescription": "This metric measures the number of level 1 instruction cache accesses missed per thousand instructions executed.", + "MetricGroup": "MPKI;L1I_Cache_Effectiveness", + "ScaleUnit": "1MPKI" + }, + { + "MetricName": "l1i_tlb_miss_ratio", + "MetricExpr": "L1I_TLB_REFILL / L1I_TLB", + "BriefDescription": "This metric measures the ratio of level 1 instruction TLB accesses missed to the total number of level 1 instruction TLB accesses. This gives an indication of the effectiveness of the level 1 instruction TLB.", + "MetricGroup": "Miss_Ratio;ITLB_Effectiveness", + "ScaleUnit": "100percent of TLB accesses" + }, + { + "MetricName": "l1i_tlb_mpki", + "MetricExpr": "L1I_TLB_REFILL / INST_RETIRED * 1000", + "BriefDescription": "This metric measures the number of level 1 instruction TLB accesses missed per thousand instructions executed.", + "MetricGroup": "MPKI;ITLB_Effectiveness", + "ScaleUnit": "1MPKI" + }, + { + "MetricName": "l2_cache_miss_ratio", + "MetricExpr": "L2D_CACHE_REFILL / L2D_CACHE", + "BriefDescription": "This metric measures the ratio of level 2 cache accesses missed to the total number of level 2 cache accesses. This gives an indication of the effectiveness of the level 2 cache, which is a unified cache that stores both data and instruction.
Note that cache accesses in this cache are either data memory access or instruction fetch as this is a unified cache.", + "MetricGroup": "Miss_Ratio;L2_Cache_Effectiveness", + "ScaleUnit": "100percent of cache accesses" + }, + { + "MetricName": "l2_cache_mpki", + "MetricExpr": "L2D_CACHE_REFILL / INST_RETIRED * 1000", + "BriefDescription": "This metric measures the number of level 2 unified cache accesses missed per thousand instructions executed. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a unified cache.", + "MetricGroup": "MPKI;L2_Cache_Effectiveness", + "ScaleUnit": "1MPKI" + }, + { + "MetricName": "l2_tlb_miss_ratio", + "MetricExpr": "L2D_TLB_REFILL / L2D_TLB", + "BriefDescription": "This metric measures the ratio of level 2 unified TLB accesses missed to the total number of level 2 unified TLB accesses. This gives an indication of the effectiveness of the level 2 TLB.", + "MetricGroup": "Miss_Ratio;ITLB_Effectiveness;DTLB_Effectiveness", + "ScaleUnit": "100percent of TLB accesses" + }, + { + "MetricName": "l2_tlb_mpki", + "MetricExpr": "L2D_TLB_REFILL / INST_RETIRED * 1000", + "BriefDescription": "This metric measures the number of level 2 unified TLB accesses missed per thousand instructions executed.", + "MetricGroup": "MPKI;ITLB_Effectiveness;DTLB_Effectiveness", + "ScaleUnit": "1MPKI" + }, + { + "MetricName": "ll_cache_read_hit_ratio", + "MetricExpr": "(LL_CACHE_RD - LL_CACHE_MISS_RD) / LL_CACHE_RD", + "BriefDescription": "This metric measures the ratio of last level cache read accesses hit in the cache to the total number of last level cache accesses. This gives an indication of the effectiveness of the last level cache for read traffic.
Note that cache accesses in this cache are either data memory access or instruction fetch as this is a system level cache.", + "MetricGroup": "LL_Cache_Effectiveness", + "ScaleUnit": "100percent of cache accesses" + }, + { + "MetricName": "ll_cache_read_miss_ratio", + "MetricExpr": "LL_CACHE_MISS_RD / LL_CACHE_RD", + "BriefDescription": "This metric measures the ratio of last level cache read accesses missed to the total number of last level cache accesses. This gives an indication of the effectiveness of the last level cache for read traffic. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a system level cache.", + "MetricGroup": "Miss_Ratio;LL_Cache_Effectiveness", + "ScaleUnit": "100percent of cache accesses" + }, + { + "MetricName": "ll_cache_read_mpki", + "MetricExpr": "LL_CACHE_MISS_RD / INST_RETIRED * 1000", + "BriefDescription": "This metric measures the number of last level cache read accesses missed per thousand instructions executed.", + "MetricGroup": "MPKI;LL_Cache_Effectiveness", + "ScaleUnit": "1MPKI" + }, + { + "MetricName": "load_percentage", + "MetricExpr": "LD_SPEC / INST_SPEC * 100", + "BriefDescription": "This metric measures load operations as a percentage of operations speculatively executed.", + "MetricGroup": "Operation_Mix", + "ScaleUnit": "1percent of operations" + }, + { + "MetricName": "scalar_fp_percentage", + "MetricExpr": "VFP_SPEC / INST_SPEC * 100", + "BriefDescription": "This metric measures scalar floating point operations as a percentage of operations speculatively executed.", + "MetricGroup": "Operation_Mix", + "ScaleUnit": "1percent of operations" + }, + { + "MetricName": "simd_percentage", + "MetricExpr": "ASE_SPEC / INST_SPEC * 100", + "BriefDescription": "This metric measures advanced SIMD operations as a percentage of total operations speculatively executed.", + "MetricGroup": "Operation_Mix", + "ScaleUnit": "1percent of operations" + }, + { +
"MetricName": "store_percentage", + "MetricExpr": "ST_SPEC / INST_SPEC * 100", + "BriefDescription": "This metric measures store operations as a percentage of operations speculatively executed.", + "MetricGroup": "Operation_Mix", + "ScaleUnit": "1percent of operations" + }, + { + "MetricName": "sve_all_percentage", + "MetricExpr": "SVE_INST_SPEC / INST_SPEC * 100", + "BriefDescription": "This metric measures scalable vector operations, including loads and stores, as a percentage of operations speculatively executed.", + "MetricGroup": "Operation_Mix", + "ScaleUnit": "1percent of operations" + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/pmu.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/pmu.json new file mode 100644 index 000000000000..d8b7b9f9e5fa --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/pmu.json @@ -0,0 +1,8 @@ +[ + { + "ArchStdEvent": "PMU_OVFS" + }, + { + "ArchStdEvent": "PMU_HOVFS" + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/retired.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/retired.json new file mode 100644 index 000000000000..152f15c1253c --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/retired.json @@ -0,0 +1,90 @@ +[ + { + "ArchStdEvent": "SW_INCR", + "PublicDescription": "Counts software writes to the PMSWINC_EL0 (software PMU increment) register.
The PMSWINC_EL0 register is a manually updated counter for use by application software.\n\nThis event could be used to measure any user program event, such as accesses to a particular data structure (by writing to the PMSWINC_EL0 register each time the data structure is accessed).\n\nTo use the PMSWINC_EL0 register and event, developers must insert instructions that write to the PMSWINC_EL0 register into the source code.\n\nSince the SW_INCR event records writes to the PMSWINC_EL0 register, there is no need to do a read/increment/write sequence to the PMSWINC_EL0 register." + }, + { + "ArchStdEvent": "LD_RETIRED", + "PublicDescription": "Counts instruction architecturally executed, Condition code check pass, load." + }, + { + "ArchStdEvent": "ST_RETIRED", + "PublicDescription": "Counts instruction architecturally executed, Condition code check pass, store." + }, + { + "ArchStdEvent": "INST_RETIRED", + "PublicDescription": "Counts instructions that have been architecturally executed." + }, + { + "ArchStdEvent": "CID_WRITE_RETIRED", + "PublicDescription": "Counts architecturally executed writes to the CONTEXTIDR_EL1 register, which usually contains the kernel PID and can be output with hardware trace." + }, + { + "ArchStdEvent": "PC_WRITE_RETIRED", + "PublicDescription": "Counts branch instructions that caused a change of Program Counter, which effectively causes a change in the control flow of the program." + }, + { + "ArchStdEvent": "BR_IMMED_RETIRED", + "PublicDescription": "Counts architecturally executed direct branches." + }, + { + "ArchStdEvent": "BR_RETURN_RETIRED", + "PublicDescription": "Counts architecturally executed procedure returns." + }, + { + "ArchStdEvent": "TTBR_WRITE_RETIRED", + "PublicDescription": "Counts architectural writes to TTBR0/1_EL1.
If virtualization host extensions are enabled (by setting the HCR_EL2.E2H bit to 1), then accesses to TTBR0/1_EL1 that are redirected to TTBR0/1_EL2, or accesses to TTBR0/1_EL12, are counted. TTBRn registers are typically updated when the kernel is swapping user-space threads or applications." + }, + { + "ArchStdEvent": "BR_RETIRED", + "PublicDescription": "Counts architecturally executed branches, whether the branch is taken or not. Instructions that explicitly write to the PC are also counted. Note that exception generating instructions, exception return instructions and context synchronization instructions are not counted." + }, + { + "ArchStdEvent": "BR_MIS_PRED_RETIRED", + "PublicDescription": "Counts branches counted by BR_RETIRED which were mispredicted and caused a pipeline flush." + }, + { + "ArchStdEvent": "OP_RETIRED", + "PublicDescription": "Counts micro-operations that are architecturally executed. This is a count of number of micro-operations retired from the commit queue in a single cycle." + }, + { + "ArchStdEvent": "SVE_INST_RETIRED", + "PublicDescription": "Counts architecturally executed SVE instructions." + }, + { + "ArchStdEvent": "BR_INDNR_TAKEN_RETIRED", + "PublicDescription": "Counts architecturally executed indirect branches excluding procedure returns that were taken." + }, + { + "ArchStdEvent": "BR_IMMED_PRED_RETIRED", + "PublicDescription": "Counts architecturally executed direct branches that were correctly predicted." + }, + { + "ArchStdEvent": "BR_IMMED_MIS_PRED_RETIRED", + "PublicDescription": "Counts architecturally executed direct branches that were mispredicted and caused a pipeline flush." + }, + { + "ArchStdEvent": "BR_RETURN_PRED_RETIRED", + "PublicDescription": "Counts architecturally executed procedure returns that were correctly predicted." 
+ }, + { + "ArchStdEvent": "BR_RETURN_MIS_PRED_RETIRED", + "PublicDescription": "Counts architecturally executed procedure returns that were mispredicted and caused a pipeline flush." + }, + { + "ArchStdEvent": "BR_INDNR_PRED_RETIRED", + "PublicDescription": "Counts architecturally executed indirect branches excluding procedure returns that were correctly predicted." + }, + { + "ArchStdEvent": "BR_INDNR_MIS_PRED_RETIRED", + "PublicDescription": "Counts architecturally executed indirect branches excluding procedure returns that were mispredicted and caused a pipeline flush." + }, + { + "ArchStdEvent": "BR_PRED_RETIRED", + "PublicDescription": "Counts branch instructions counted by BR_RETIRED which were correctly predicted." + }, + { + "ArchStdEvent": "BR_IND_RETIRED", + "PublicDescription": "Counts architecturally executed indirect branches including procedure returns." + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/spec_operation.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/spec_operation.json new file mode 100644 index 000000000000..40c29be53cc0 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/spec_operation.json @@ -0,0 +1,70 @@ +[ + { + "ArchStdEvent": "BR_MIS_PRED", + "PublicDescription": "Counts branches which are speculatively executed and mispredicted." + }, + { + "ArchStdEvent": "BR_PRED", + "PublicDescription": "Counts all speculatively executed branches." + }, + { + "ArchStdEvent": "INST_SPEC", + "PublicDescription": "Counts operations that have been speculatively executed." + }, + { + "ArchStdEvent": "OP_SPEC", + "PublicDescription": "Counts micro-operations speculatively executed. This is the count of the number of micro-operations dispatched in a cycle." + }, + { + "ArchStdEvent": "STREX_FAIL_SPEC", + "PublicDescription": "Counts store-exclusive operations that have been speculatively executed and have not successfully completed the store operation." 
+ }, + { + "ArchStdEvent": "STREX_SPEC", + "PublicDescription": "Counts store-exclusive operations that have been speculatively executed." + }, + { + "ArchStdEvent": "LD_SPEC", + "PublicDescription": "Counts speculatively executed load operations including Single Instruction Multiple Data (SIMD) load operations." + }, + { + "ArchStdEvent": "ST_SPEC", + "PublicDescription": "Counts speculatively executed store operations including Single Instruction Multiple Data (SIMD) store operations." + }, + { + "ArchStdEvent": "LDST_SPEC", + "PublicDescription": "Counts speculatively executed load and store operations." + }, + { + "ArchStdEvent": "DP_SPEC", + "PublicDescription": "Counts speculatively executed logical or arithmetic instructions such as MOV/MVN operations." + }, + { + "ArchStdEvent": "ASE_SPEC", + "PublicDescription": "Counts speculatively executed Advanced SIMD operations excluding load, store and move micro-operations that move data to or from SIMD (vector) registers." + }, + { + "ArchStdEvent": "VFP_SPEC", + "PublicDescription": "Counts speculatively executed floating point operations. This event does not count operations that move data to or from floating point (vector) registers." + }, + { + "ArchStdEvent": "PC_WRITE_SPEC", + "PublicDescription": "Counts speculatively executed operations which cause software changes of the PC. Those operations include all taken branch operations." + }, + { + "ArchStdEvent": "CRYPTO_SPEC", + "PublicDescription": "Counts speculatively executed cryptographic operations except for PMULL and VMULL operations." + }, + { + "ArchStdEvent": "BR_IMMED_SPEC", + "PublicDescription": "Counts direct branch operations which are speculatively executed." + }, + { + "ArchStdEvent": "BR_RETURN_SPEC", + "PublicDescription": "Counts procedure return operations (RET, RETAA and RETAB) which are speculatively executed." 
+ }, + { + "ArchStdEvent": "BR_INDIRECT_SPEC", + "PublicDescription": "Counts indirect branch operations including procedure returns, which are speculatively executed. This includes operations that force a software change of the PC, other than exception-generating operations and direct branch instructions. Some examples of the instructions counted by this event include BR Xn, RET, etc..." + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/stall.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/stall.json new file mode 100644 index 000000000000..d65aeb4b8808 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/stall.json @@ -0,0 +1,82 @@ +[ + { + "ArchStdEvent": "STALL_FRONTEND", + "PublicDescription": "Counts cycles when frontend could not send any micro-operations to the rename stage because of frontend resource stalls caused by fetch memory latency or branch prediction flow stalls. STALL_FRONTEND_SLOTS counts SLOTS during the cycle when this event counts." + }, + { + "ArchStdEvent": "STALL_BACKEND", + "PublicDescription": "Counts cycles whenever the rename unit is unable to send any micro-operations to the backend of the pipeline because of backend resource constraints. Backend resource constraints can include issue stage fullness, execution stage fullness, or other internal pipeline resource fullness. All the backend slots were empty during the cycle when this event counts." + }, + { + "ArchStdEvent": "STALL", + "PublicDescription": "Counts cycles when no operations are sent to the rename unit from the frontend or from the rename unit to the backend for any reason (either frontend or backend stall). This event is the sum of STALL_FRONTEND and STALL_BACKEND" + }, + { + "ArchStdEvent": "STALL_SLOT_BACKEND", + "PublicDescription": "Counts slots per cycle in which no operations are sent from the rename unit to the backend due to backend resource constraints. 
STALL_BACKEND counts during the cycle when STALL_SLOT_BACKEND counts at least 1." + }, + { + "ArchStdEvent": "STALL_SLOT_FRONTEND", + "PublicDescription": "Counts slots per cycle in which no operations are sent to the rename unit from the frontend due to frontend resource constraints." + }, + { + "ArchStdEvent": "STALL_SLOT", + "PublicDescription": "Counts slots per cycle in which no operations are sent to the rename unit from the frontend or from the rename unit to the backend for any reason (either frontend or backend stall). STALL_SLOT is the sum of STALL_SLOT_FRONTEND and STALL_SLOT_BACKEND." + }, + { + "ArchStdEvent": "STALL_BACKEND_MEM", + "PublicDescription": "Counts cycles when the backend is stalled because there is a pending demand load request in progress in the last level core cache." + }, + { + "ArchStdEvent": "STALL_FRONTEND_MEMBOUND", + "PublicDescription": "Counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the memory resources." + }, + { + "ArchStdEvent": "STALL_FRONTEND_L1I", + "PublicDescription": "Counts cycles when the frontend is stalled because there is an instruction fetch request pending in the level 1 instruction cache." + }, + { + "ArchStdEvent": "STALL_FRONTEND_MEM", + "PublicDescription": "Counts cycles when the frontend is stalled because there is an instruction fetch request pending in the last level core cache." + }, + { + "ArchStdEvent": "STALL_FRONTEND_TLB", + "PublicDescription": "Counts when the frontend is stalled on any TLB misses being handled. This event also counts the TLB accesses made by hardware prefetches." + }, + { + "ArchStdEvent": "STALL_FRONTEND_CPUBOUND", + "PublicDescription": "Counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the CPU resources excluding memory resources." 
+ }, + { + "ArchStdEvent": "STALL_FRONTEND_FLOW", + "PublicDescription": "Counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the branch prediction unit." + }, + { + "ArchStdEvent": "STALL_FRONTEND_FLUSH", + "PublicDescription": "Counts cycles when the frontend could not send any micro-operations to the rename stage as the frontend is recovering from a machine flush or resteer. Example scenarios that cause a flush include branch mispredictions, taken exceptions, micro-architectural flush etc." + }, + { + "ArchStdEvent": "STALL_BACKEND_MEMBOUND", + "PublicDescription": "Counts cycles when the backend could not accept any micro-operations due to resource constraints in the memory resources." + }, + { + "ArchStdEvent": "STALL_BACKEND_L1D", + "PublicDescription": "Counts cycles when the backend is stalled because there is a pending demand load request in progress in the level 1 data cache." + }, + { + "ArchStdEvent": "STALL_BACKEND_TLB", + "PublicDescription": "Counts cycles when the backend is stalled on any demand TLB misses being handled." + }, + { + "ArchStdEvent": "STALL_BACKEND_ST", + "PublicDescription": "Counts cycles when the backend is stalled and there is a store that has not reached the pre-commit stage." + }, + { + "ArchStdEvent": "STALL_BACKEND_BUSY", + "PublicDescription": "Counts cycles when the backend could not accept any micro-operations because the issue queues are full to take any operations for execution." + }, + { + "ArchStdEvent": "STALL_BACKEND_ILOCK", + "PublicDescription": "Counts cycles when the backend could not accept any micro-operations due to resource constraints imposed by input dependency." 
+ } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/sve.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/sve.json new file mode 100644 index 000000000000..21810ce5de8d --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/sve.json @@ -0,0 +1,22 @@ +[ + { + "ArchStdEvent": "SVE_INST_SPEC", + "PublicDescription": "Counts speculatively executed operations that are SVE operations." + }, + { + "ArchStdEvent": "ASE_SVE_INT8_SPEC", + "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type an 8-bit integer." + }, + { + "ArchStdEvent": "ASE_SVE_INT16_SPEC", + "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 16-bit integer." + }, + { + "ArchStdEvent": "ASE_SVE_INT32_SPEC", + "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 32-bit integer." + }, + { + "ArchStdEvent": "ASE_SVE_INT64_SPEC", + "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 64-bit integer." + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/tlb.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/tlb.json new file mode 100644 index 000000000000..1de56300e581 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/tlb.json @@ -0,0 +1,78 @@ +[ + { + "ArchStdEvent": "L1I_TLB_REFILL", + "PublicDescription": "Counts level 1 instruction TLB refills from any Instruction fetch. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB." 
+ }, + { + "ArchStdEvent": "L1D_TLB_REFILL", + "PublicDescription": "Counts level 1 data TLB accesses that resulted in TLB refills. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event counts for refills caused by preload instructions or hardware prefetch accesses. This event counts regardless of whether the miss hits in L2 or results in a translation table walk. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB. This event will not count on an access from an AT(address translation) instruction." + }, + { + "ArchStdEvent": "L1D_TLB", + "PublicDescription": "Counts level 1 data TLB accesses caused by any memory load or store operation. Note that load or store instructions can be broken up into multiple memory operations. This event does not count TLB maintenance operations." + }, + { + "ArchStdEvent": "L1I_TLB", + "PublicDescription": "Counts level 1 instruction TLB accesses, whether the access hits or misses in the TLB. This event counts both demand accesses and prefetch or preload generated accesses." + }, + { + "ArchStdEvent": "L2D_TLB_REFILL", + "PublicDescription": "Counts level 2 TLB refills caused by memory operations from both data and instruction fetch, except for those caused by TLB maintenance operations and hardware prefetches." + }, + { + "ArchStdEvent": "L2D_TLB", + "PublicDescription": "Counts level 2 TLB accesses except those caused by TLB maintenance operations." + }, + { + "ArchStdEvent": "DTLB_WALK", + "PublicDescription": "Counts number of demand data translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. 
Note that partial translations that cause a translation table walk are also counted. Also note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations." + }, + { + "ArchStdEvent": "ITLB_WALK", + "PublicDescription": "Counts number of instruction translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations." + }, + { + "ArchStdEvent": "DTLB_WALK_PERCYC", + "PublicDescription": "Counts the number of data translation table walks in progress per cycle." + }, + { + "ArchStdEvent": "ITLB_WALK_PERCYC", + "PublicDescription": "Counts the number of instruction translation table walks in progress per cycle." + }, + { + "ArchStdEvent": "DTLB_HWUPD", + "PublicDescription": "Counts number of memory accesses triggered by a data translation table walk and performing an update of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that this event counts accesses triggered by software preloads, but not accesses triggered by hardware prefetchers." + }, + { + "ArchStdEvent": "ITLB_HWUPD", + "PublicDescription": "Counts number of memory accesses triggered by an instruction translation table walk and performing an update of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD." 
+ }, + { + "ArchStdEvent": "DTLB_STEP", + "PublicDescription": "Counts number of memory accesses triggered by a demand data translation table walk and performing a read of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that this event counts accesses triggered by software preloads, but not accesses triggered by hardware prefetchers." + }, + { + "ArchStdEvent": "ITLB_STEP", + "PublicDescription": "Counts number of memory accesses triggered by an instruction translation table walk and performing a read of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD." + }, + { + "ArchStdEvent": "DTLB_WALK_LARGE", + "PublicDescription": "Counts number of demand data translation table walks caused by a miss in the L2 TLB and yielding a large page. The set of large pages is defined as all pages with a final size higher than or equal to 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. If DTLB_WALK_BLOCK is implemented, then it is an alias for this event in this family. Note that partial translations that cause a translation table walk are also counted. Also note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations." + }, + { + "ArchStdEvent": "ITLB_WALK_LARGE", + "PublicDescription": "Counts number of instruction translation table walks caused by a miss in the L2 TLB and yielding a large page. The set of large pages is defined as all pages with a final size higher than or equal to 2MB. 
Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. In this family, this is equal to ITLB_WALK_BLOCK event. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations." + }, + { + "ArchStdEvent": "DTLB_WALK_SMALL", + "PublicDescription": "Counts number of data translation table walks caused by a miss in the L2 TLB and yielding a small page. The set of small pages is defined as all pages with a final size lower than 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. If DTLB_WALK_PAGE event is implemented, then it is an alias for this event in this family. Note that partial translations that cause a translation table walk are also counted. Also note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations." + }, + { + "ArchStdEvent": "ITLB_WALK_SMALL", + "PublicDescription": "Counts number of instruction translation table walks caused by a miss in the L2 TLB and yielding a small page. The set of small pages is defined as all pages with a final size lower than 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. In this family, this is equal to ITLB_WALK_PAGE event. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations." + }, + { + "ArchStdEvent": "DTLB_WALK_RW", + "PublicDescription": "Counts number of demand data translation table walks caused by a miss in the L2 TLB and performing at least one memory access. 
Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations." + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/trace.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/trace.json new file mode 100644 index 000000000000..33672a8711d4 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/trace.json @@ -0,0 +1,32 @@ +[ + { + "ArchStdEvent": "TRB_WRAP" + }, + { + "ArchStdEvent": "TRB_TRIG" + }, + { + "ArchStdEvent": "TRCEXTOUT0" + }, + { + "ArchStdEvent": "TRCEXTOUT1" + }, + { + "ArchStdEvent": "TRCEXTOUT2" + }, + { + "ArchStdEvent": "TRCEXTOUT3" + }, + { + "ArchStdEvent": "CTI_TRIGOUT4" + }, + { + "ArchStdEvent": "CTI_TRIGOUT5" + }, + { + "ArchStdEvent": "CTI_TRIGOUT6" + }, + { + "ArchStdEvent": "CTI_TRIGOUT7" + } +] diff --git a/tools/perf/pmu-events/arch/arm64/common-and-microarch.json b/tools/perf/pmu-events/arch/arm64/common-and-microarch.json index e40be37addf8..3e774c1e1413 100644 --- a/tools/perf/pmu-events/arch/arm64/common-and-microarch.json +++ b/tools/perf/pmu-events/arch/arm64/common-and-microarch.json @@ -1339,6 +1339,11 @@ "EventName": "INST_FETCH", "BriefDescription": "Instruction memory access" }, + { + "EventCode": "0x8125", + "EventName": "BUS_REQ_RD_PERCYC", + "BriefDescription": "Bus read transactions in progress" + }, { "EventCode": "0x8128", "EventName": "DTLB_WALK_PERCYC", @@ -1539,6 +1544,11 @@ "EventName": "L2D_CACHE_HWPRF", "BriefDescription": "Level 2 data cache hardware prefetch." }, + { + "EventCode": "0x8156", + "EventName": "L3D_CACHE_HWPRF", + "BriefDescription": "Level 3 data cache hardware prefetch." 
+ }, { "EventCode": "0x8158", "EventName": "STALL_FRONTEND_MEMBOUND", @@ -1674,6 +1684,11 @@ "EventName": "DTLB_WALK_PAGE", "BriefDescription": "Data TLB page translation table walk." }, + { + "EventCode": "0x818D", + "EventName": "BUS_REQ_RD", + "BriefDescription": "Bus request, read" + }, { "EventCode": "0x818B", "EventName": "ITLB_WALK_PAGE", diff --git a/tools/perf/pmu-events/arch/arm64/mapfile.csv b/tools/perf/pmu-events/arch/arm64/mapfile.csv index ccfcae375750..6b98632636e1 100644 --- a/tools/perf/pmu-events/arch/arm64/mapfile.csv +++ b/tools/perf/pmu-events/arch/arm64/mapfile.csv @@ -32,6 +32,7 @@ 0x00000000410fd440,v1,arm/cortex-x1,core 0x00000000410fd4c0,v1,arm/cortex-x1,core 0x00000000410fd460,v1,arm/cortex-a510,core +0x00000000410fd800,v1,arm/cortex-a520,core 0x00000000410fd470,v1,arm/cortex-a710,core 0x00000000410fd810,v1,arm/cortex-a720,core 0x00000000410fd480,v1,arm/cortex-x2,core -- 2.47.2
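Note on the mapfile.csv hunk: the sketch below is not part of the patch. It only illustrates how a row such as 0x00000000410fd800,v1,arm/cortex-a520,core could be matched against a CPU's MIDR_EL1 value. It assumes that the lookup ignores the MIDR variant (bits 23:20) and revision (bits 3:0) fields; that masking rule and all function names here are illustrative assumptions, not the perf implementation.

```python
# Illustrative sketch only: helper names are hypothetical, and the rule of
# ignoring variant/revision is an assumption about how rows are matched.
MIDR_VARIANT_MASK = 0xF << 20   # MIDR_EL1 bits [23:20]
MIDR_REVISION_MASK = 0xF        # MIDR_EL1 bits [3:0]

# Rows copied from the mapfile.csv hunk in this patch.
ROWS = [
    "0x00000000410fd460,v1,arm/cortex-a510,core",
    "0x00000000410fd800,v1,arm/cortex-a520,core",
    "0x00000000410fd810,v1,arm/cortex-a720,core",
]


def midr_base(midr):
    """Clear the variant and revision fields of a MIDR value."""
    return midr & ~(MIDR_VARIANT_MASK | MIDR_REVISION_MASK)


def parse_row(row):
    """Split one mapfile row into (cpuid, version, directory, type)."""
    cpuid, version, directory, kind = row.strip().split(",")
    return int(cpuid, 16), version, directory, kind


def lookup(midr, rows):
    """Return the event directory for a MIDR, or None if no row matches."""
    for row in rows:
        cpuid, _version, directory, _kind = parse_row(row)
        if midr_base(cpuid) == midr_base(midr):
            return directory
    return None


if __name__ == "__main__":
    # r1p1 of a part with implementer 0x41 and part number 0xd80.
    print(lookup(0x411FD801, ROWS))
```

Under that assumption, any variant or revision of part number 0xd80 would resolve to the arm/cortex-a520 directory added by this patch.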