From: James Clark
Date: Tue, 11 Nov 2025 11:37:59 +0000
Subject: [PATCH v10 5/5] perf docs: arm-spe: Document new SPE filtering features
Message-Id: <20251111-james-perf-feat_spe_eft-v10-5-1e1b5bf2cd05@linaro.org>
References: <20251111-james-perf-feat_spe_eft-v10-0-1e1b5bf2cd05@linaro.org>
In-Reply-To: <20251111-james-perf-feat_spe_eft-v10-0-1e1b5bf2cd05@linaro.org>
To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
    Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
    Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
    Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter, Leo Yan,
    Anshuman Khandual
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    linux-perf-users@vger.kernel.org, linux-doc@vger.kernel.org,
    kvmarm@lists.linux.dev, James Clark

FEAT_SPE_EFT, FEAT_SPE_FDS, etc. have new user-facing format attributes, so
document them. Also document the existing 'event_filter' bits that were
missing from the doc, and the fact that latency values are stored in the
weight field.

Reviewed-by: Leo Yan
Tested-by: Leo Yan
Reviewed-by: Ian Rogers
Signed-off-by: James Clark
---
 tools/perf/Documentation/perf-arm-spe.txt | 104 +++++++++++++++++++++++++++---
 1 file changed, 95 insertions(+), 9 deletions(-)

diff --git a/tools/perf/Documentation/perf-arm-spe.txt b/tools/perf/Documentation/perf-arm-spe.txt
index cda8dd47fc4d..8b02e5b983fa 100644
--- a/tools/perf/Documentation/perf-arm-spe.txt
+++ b/tools/perf/Documentation/perf-arm-spe.txt
@@ -141,27 +141,65 @@ Config parameters
 These are placed between the // in the event and comma separated. For example '-e arm_spe/load_filter=1,min_latency=10/'
 
-  branch_filter=1     - collect branches only (PMSFCR.B)
-  event_filter=<mask> - filter on specific events (PMSEVFR) - see bitfield description below
+  event_filter=<mask> - logical AND filter on specific events (PMSEVFR) - see bitfield description below
+  inv_event_filter=<mask> - logical OR filter to exclude specific events (PMSNEVFR, FEAT_SPEv1p2) - see bitfield description below
   jitter=1            - use jitter to avoid resonance when sampling (PMSIRR.RND)
-  load_filter=1       - collect loads only (PMSFCR.LD)
   min_latency=<n>     - collect only samples with this latency or higher* (PMSLATFR)
   pa_enable=1         - collect physical address (as well as VA) of loads/stores (PMSCR.PA) - requires privilege
   pct_enable=1        - collect physical timestamp instead of virtual timestamp (PMSCR.PCT) - requires privilege
-  store_filter=1      - collect stores only (PMSFCR.ST)
   ts_enable=1         - enable timestamping with value of generic timer (PMSCR.TS)
   discard=1           - enable SPE PMU events but don't collect sample data - see 'Discard mode' (PMBLIMITR.FM = DISCARD)
+  inv_data_src_filter=<mask> - mask of data sources (0-63) to exclude (PMSDSFR, FEAT_SPE_FDS) - see 'Data source filtering'
 
 +++*+++ Latency is the total latency from the point at which sampling started on that instruction, rather than only the execution latency.
 
-Only some events can be filtered on; these include:
-  bit 1     - instruction retired (i.e. omit speculative instructions)
+Only some events can be filtered on using 'event_filter' bits. The overall
+filter is the logical AND of these bits: for example, if bits 3 and 5 are set,
+only samples that have both L1D refill AND TLB refill are recorded. When
+FEAT_SPEv1p2 is implemented, 'inv_event_filter' can also be used to exclude
+samples that have any (OR) of the filter's bits set. For example, setting bits
+3 and 5 in 'inv_event_filter' excludes any samples that have either L1D refill
+OR TLB refill. If the same bit is set in both filters, it is UNPREDICTABLE
+whether the sample is included or excluded. Filter bits for both event_filter
+and inv_event_filter are:
+
+  bit 1     - Instruction retired (i.e. omit speculative instructions)
+  bit 2     - L1D access (FEAT_SPEv1p4)
   bit 3     - L1D refill
+  bit 4     - TLB access (FEAT_SPEv1p4)
   bit 5     - TLB refill
-  bit 7     - mispredict
-  bit 11    - misaligned access
+  bit 6     - Not taken event (FEAT_SPEv1p2)
+  bit 7     - Mispredict
+  bit 8     - Last level cache access (FEAT_SPEv1p4)
+  bit 9     - Last level cache miss (FEAT_SPEv1p4)
+  bit 10    - Remote access (FEAT_SPEv1p4)
+  bit 11    - Misaligned access (FEAT_SPEv1p1)
+  bit 12-15 - IMPLEMENTATION DEFINED events (when implemented)
+  bit 16    - Transaction (FEAT_TME)
+  bit 17    - Partial or empty SME or SVE predicate (FEAT_SPEv1p1)
+  bit 18    - Empty SME or SVE predicate (FEAT_SPEv1p1)
+  bit 19    - L2D access (FEAT_SPEv1p4)
+  bit 20    - L2D miss (FEAT_SPEv1p4)
+  bit 21    - Cache data modified (FEAT_SPEv1p4)
+  bit 22    - Recently fetched (FEAT_SPEv1p4)
+  bit 23    - Data snooped (FEAT_SPEv1p4)
+  bit 24    - Streaming SVE mode event (when FEAT_SPE_SME is implemented), or
+              IMPLEMENTATION DEFINED event 24 (when implemented, only versions
+              less than FEAT_SPEv1p4)
+  bit 25    - SMCU or external coprocessor operation event (when FEAT_SPE_SME is
+              implemented), or IMPLEMENTATION DEFINED event 25 (when implemented,
+              only versions less than FEAT_SPEv1p4)
+  bit 26-31 - IMPLEMENTATION DEFINED events (only versions less than FEAT_SPEv1p4)
+  bit 48-63 - IMPLEMENTATION DEFINED events (when implemented)
+
+For IMPLEMENTATION DEFINED bits, refer to the CPU TRM if these bits are
+implemented.
+
+The driver rejects events whose filter bits require an unimplemented SPE version,
+but does not reject unimplemented IMPDEF bits, or bits whose related feature is
+not present (e.g. SME). For example, if FEAT_SPEv1p2 is not implemented,
+filtering on 'Not taken event' (bit 6) is rejected.
 
 So to sample just retired instructions:
 
@@ -171,6 +209,31 @@ or just mispredicted branches:
 
   perf record -e arm_spe/event_filter=0x80/ -- ./mybench
 
+When set, the following filters select samples that match any of the set
+operation types (OR filtering). If only one is set, only samples of that type
+are collected:
+
+  branch_filter=1     - Collect branches (PMSFCR.B)
+  load_filter=1       - Collect loads (PMSFCR.LD)
+  store_filter=1      - Collect stores (PMSFCR.ST)
+
+When extended filtering is supported (FEAT_SPE_EFT), SIMD and floating
+point operations can also be selected:
+
+  simd_filter=1       - Collect SIMD loads, stores and operations (PMSFCR.SIMD)
+  float_filter=1      - Collect floating point loads, stores and operations (PMSFCR.FP)
+
+When extended filtering is supported (FEAT_SPE_EFT), operation type filters can
+be changed to AND using the _mask fields. For example, to select only samples
+that are both a store AND SIMD, set 'store_filter=1,simd_filter=1,
+store_filter_mask=1,simd_filter_mask=1'. The new masks are as follows:
+
+  branch_filter_mask=1 - Change branch filter behavior from OR to AND (PMSFCR.Bm)
+  load_filter_mask=1   - Change load filter behavior from OR to AND (PMSFCR.LDm)
+  store_filter_mask=1  - Change store filter behavior from OR to AND (PMSFCR.STm)
+  simd_filter_mask=1   - Change SIMD filter behavior from OR to AND (PMSFCR.SIMDm)
+  float_filter_mask=1  - Change floating point filter behavior from OR to AND (PMSFCR.FPm)
+
 Viewing the data
 ~~~~~~~~~~~~~~~~~
 
@@ -210,6 +273,10 @@ Memory access details are also stored on the samples and this can be viewed with
 
   perf report --mem-mode
 
+The latency value from the SPE sample is stored in the 'weight' field of the
+Perf samples, and can be shown in 'perf script' and 'perf report' output by
+selecting the weight field or sort key on the command line.
+
 Common errors
 ~~~~~~~~~~~~~
 
@@ -253,6 +320,25 @@ to minimize output. Then run perf stat:
   perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
   perf stat -e SAMPLE_FEED_LD
 
+Data source filtering
+~~~~~~~~~~~~~~~~~~~~~
+
+When FEAT_SPE_FDS is present, 'inv_data_src_filter' can be used as a mask to
+filter on a subset (0 - 63) of possible data source IDs. The full range of data
+sources is 0 - 65535, although IDs above 63 are unlikely to be used in practice.
+Data sources are IMPDEF, so refer to the TRM for the mappings. Each bit N of the
+filter maps to data source N. The filter is an OR of all the bits, and the value
+provided to inv_data_src_filter is inverted before writing to PMSDSFR_EL1 so
+that set bits exclude that data source and cleared bits include it. Therefore
+the default value of 0 is equivalent to no filtering (all data sources
+included).
+
+For example, to include only data sources 0 and 3, clear bits 0 and 3 and set
+all other bits (inv_data_src_filter=0xFFFFFFFFFFFFFFF6).
+
+When 'inv_data_src_filter' is set to 0xFFFFFFFFFFFFFFFF, any samples with any
+data source set are excluded.
+
 SEE ALSO
 --------
 
-- 
2.34.1
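For reference, the mask arithmetic described in this patch can be sketched in C. This is a minimal illustration under stated assumptions, not perf or driver code: the helper names spe_bits_to_mask(), spe_inv_data_src_filter() and spe_pmsdsfr_from_inv() are hypothetical, and the last one only models the plain inversion that the 'Data source filtering' text describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of perf): build a 64-bit filter mask with
 * bit N set for every N in bits[]. Usable for event_filter and
 * inv_event_filter, whose bit numbers go up to 63.
 */
static uint64_t spe_bits_to_mask(const unsigned int *bits, size_t n)
{
	uint64_t mask = 0;

	for (size_t i = 0; i < n; i++) {
		assert(bits[i] < 64);	/* all filters are 64 bits wide */
		mask |= UINT64_C(1) << bits[i];
	}
	return mask;
}

/*
 * Hypothetical helper: inv_data_src_filter value that includes only the
 * listed data source IDs (0 - 63). Every other bit is set, because set
 * bits exclude a data source.
 */
static uint64_t spe_inv_data_src_filter(const unsigned int *ids, size_t n)
{
	return ~spe_bits_to_mask(ids, n);
}

/*
 * Hypothetical model of the value written to PMSDSFR_EL1, assuming the
 * user value is simply inverted as the documentation states.
 */
static uint64_t spe_pmsdsfr_from_inv(uint64_t inv_data_src_filter)
{
	return ~inv_data_src_filter;
}
```

With these helpers, bits { 3, 5 } give event_filter=0x28 (samples with both L1D refill AND TLB refill), and bit 7 gives 0x80, the value used in the mispredicted-branches example. Including only data sources { 0, 3 } gives inv_data_src_filter=0xFFFFFFFFFFFFFFF6, which would reach PMSDSFR_EL1 as 0x9.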