From nobody Sat Feb 22 00:03:52 2025
Date: Thu, 13 Feb 2025 10:08:14 +0100
Message-ID: <0f8c1b8eb12619029e31b3d5c0346f4616a5aeda.1739437531.git.dvyukov@google.com>
Subject: [PATCH v7 1/9] perf report: Add machine parallelism
From: Dmitry Vyukov
To: namhyung@kernel.org, irogers@google.com, acme@kernel.org, ak@linux.intel.com
Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, Dmitry Vyukov

Add calculation of the current parallelism level (number of threads actively
running on CPUs). The parallelism level can be shown in reports on its own
and used to calculate latency overheads.

Signed-off-by: Dmitry Vyukov
Cc: Namhyung Kim
Cc: Arnaldo Carvalho de Melo
Cc: Ian Rogers
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Andi Kleen
---
 tools/perf/builtin-report.c     | 1 +
 tools/perf/util/addr_location.c | 1 +
 tools/perf/util/addr_location.h | 2 ++
 tools/perf/util/event.c         | 3 +++
 tools/perf/util/machine.c       | 7 +++++++
 tools/perf/util/machine.h       | 6 ++++++
 6 files changed, 20 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index f5fbd670d619a..0d9bd090eda71 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1568,6 +1568,7 @@ int cmd_report(int argc, const char **argv)
 	report.tool.cgroup = perf_event__process_cgroup;
 	report.tool.exit = perf_event__process_exit;
 	report.tool.fork = perf_event__process_fork;
+	report.tool.context_switch = perf_event__process_switch;
 	report.tool.lost = perf_event__process_lost;
 	report.tool.read = process_read_event;
 	report.tool.attr = process_attr;
diff --git a/tools/perf/util/addr_location.c b/tools/perf/util/addr_location.c
index 51825ef8c0ab7..007a2f5df9a6a 100644
--- a/tools/perf/util/addr_location.c
+++ b/tools/perf/util/addr_location.c
@@ -17,6 +17,7 @@ void addr_location__init(struct addr_location *al)
 	al->cpumode = 0;
 	al->cpu = 0;
 	al->socket = 0;
+	al->parallelism = 1;
 }
 
 /*
diff --git a/tools/perf/util/addr_location.h b/tools/perf/util/addr_location.h
index d8ac0428dff23..36aaa45445f24 100644
--- a/tools/perf/util/addr_location.h
+++ b/tools/perf/util/addr_location.h
@@ -21,6 +21,8 @@ struct addr_location {
 	u8 cpumode;
 	s32 cpu;
 	s32 socket;
+	/* Same as machine.parallelism but within [1, nr_cpus]. */
+	int parallelism;
 };
 
 void addr_location__init(struct addr_location *al);
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index aac96d5d19170..2f10e31157572 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -767,6 +767,9 @@ int machine__resolve(struct machine *machine, struct addr_location *al,
 		al->socket = env->cpu[al->cpu].socket_id;
 	}
 
+	/* Account for possible out-of-order switch events. */
+	al->parallelism = max(1, min(machine->parallelism, machine__nr_cpus_avail(machine)));
+
 	if (al->map) {
 		if (symbol_conf.dso_list &&
 		    (!dso || !(strlist__has_entry(symbol_conf.dso_list,
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 321cc110698c4..d6fb739e9a3f4 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -94,6 +94,8 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
 	machine->comm_exec = false;
 	machine->kernel_start = 0;
 	machine->vmlinux_map = NULL;
+	/* There is no initial context switch in, so we start at 1. */
+	machine->parallelism = 1;
 
 	machine->root_dir = strdup(root_dir);
 	if (machine->root_dir == NULL)
@@ -677,8 +679,11 @@ int machine__process_aux_output_hw_id_event(struct machine *machine __maybe_unus
 int machine__process_switch_event(struct machine *machine __maybe_unused,
 				  union perf_event *event)
 {
+	bool out = event->header.misc & PERF_RECORD_MISC_SWITCH_OUT;
+
 	if (dump_trace)
 		perf_event__fprintf_switch(event, stdout);
+	machine->parallelism += out ? -1 : 1;
 	return 0;
 }
 
@@ -1880,6 +1885,8 @@ int machine__process_exit_event(struct machine *machine, union perf_event *event
 	if (dump_trace)
 		perf_event__fprintf_task(event, stdout);
 
+	/* There is no context switch out before exit, so we decrement here. */
+	machine->parallelism--;
 	if (thread != NULL) {
 		if (symbol_conf.keep_exited_threads)
 			thread__set_exited(thread, /*exited=*/true);
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index ae3e5542d57df..b56abec84fed1 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -50,6 +50,12 @@ struct machine {
 		u64 text_start;
 		u64 text_end;
 	} sched, lock, traceiter, trace;
+	/*
+	 * The current parallelism level (number of threads that run on CPUs).
+	 * This value can be less than 1, or larger than the total number
+	 * of CPUs, if events are poorly ordered.
+	 */
+	int parallelism;
 	pid_t *current_tid;
 	size_t current_tid_sz;
 	union { /* Tool specific area */
-- 
2.48.1.502.g6dc24dfdaf-goog
From nobody Sat Feb 22 00:03:52 2025
Date: Thu, 13 Feb 2025 10:08:15 +0100
Message-ID: <7f7bb87cbaa51bf1fb008a0d68b687423ce4bad4.1739437531.git.dvyukov@google.com>
Subject: [PATCH v7 2/9] perf report: Add parallelism sort key
From: Dmitry Vyukov
To: namhyung@kernel.org, irogers@google.com, acme@kernel.org, ak@linux.intel.com
Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, Dmitry Vyukov

Show parallelism level in profiles if requested by user.
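
For illustration only (not part of this patch), the comparison and column
formatting added here can be sketched as a standalone C fragment. The names
demo_entry, demo_parallelism_cmp, demo_parallelism_snprintf and demo_sort are
hypothetical; only the right-minus-left comparison and the "%*d" format are
taken from the patch. The final row order in perf report may also depend on
other sort keys, which this sketch does not model.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct hist_entry; only the new field matters. */
struct demo_entry {
	const char *name;
	int parallelism;
};

/* Same arithmetic as sort__parallelism_cmp() in the patch: right minus left. */
int demo_parallelism_cmp(const void *a, const void *b)
{
	const struct demo_entry *left = a;
	const struct demo_entry *right = b;

	return right->parallelism - left->parallelism;
}

/* Right-aligned decimal in a fixed-width column, like the "%*d" in the patch. */
int demo_parallelism_snprintf(const struct demo_entry *e, char *bf,
			      size_t size, int width)
{
	return snprintf(bf, size, "%*d", width, e->parallelism);
}

/* Sort an array in place; with qsort() this comparator gives descending order. */
void demo_sort(struct demo_entry *entries, size_t n)
{
	qsort(entries, n, sizeof(*entries), demo_parallelism_cmp);
}
```

Sorting entries with parallelism 1, 8 and 4 through demo_sort() places the
entry with 8 first. The column width of 11 registered for HISTC_PARALLELISM
matches the length of the "Parallelism" header.
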

Signed-off-by: Dmitry Vyukov
Cc: Namhyung Kim
Cc: Arnaldo Carvalho de Melo
Cc: Ian Rogers
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Andi Kleen
---
 tools/perf/builtin-report.c | 11 +++++++++++
 tools/perf/util/hist.c      |  2 ++
 tools/perf/util/hist.h      |  3 +++
 tools/perf/util/session.c   | 12 ++++++++++++
 tools/perf/util/session.h   |  1 +
 tools/perf/util/sort.c      | 23 +++++++++++++++++++++++
 tools/perf/util/sort.h      |  1 +
 7 files changed, 53 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 0d9bd090eda71..14d49f0625881 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1720,6 +1720,17 @@ int cmd_report(int argc, const char **argv)
 		symbol_conf.annotate_data_sample = true;
 	}
 
+	if (report.disable_order || !perf_session__has_switch_events(session)) {
+		if ((sort_order && strstr(sort_order, "parallelism")) ||
+				(field_order && strstr(field_order, "parallelism"))) {
+			if (report.disable_order)
+				ui__error("Use of parallelism is incompatible with --disable-order.\n");
+			else
+				ui__error("Use of parallelism requires --switch-events during record.\n");
+			return -1;
+		}
+	}
+
 	if (sort_order && strstr(sort_order, "ipc")) {
 		parse_options_usage(report_usage, options, "s", 1);
 		goto error;
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 0f30f843c566d..cafd693568189 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -207,6 +207,7 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 
 	hists__new_col_len(hists, HISTC_CGROUP, 6);
 	hists__new_col_len(hists, HISTC_CGROUP_ID, 20);
+	hists__new_col_len(hists, HISTC_PARALLELISM, 11);
 	hists__new_col_len(hists, HISTC_CPU, 3);
 	hists__new_col_len(hists, HISTC_SOCKET, 6);
 	hists__new_col_len(hists, HISTC_MEM_LOCKED, 6);
@@ -741,6 +742,7 @@ __hists__add_entry(struct hists *hists,
 		.ip = al->addr,
 		.level = al->level,
 		.code_page_size = sample->code_page_size,
+		.parallelism = al->parallelism,
 		.stat = {
 			.nr_events = 1,
 			.period = sample->period,
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 46c8373e31465..a6e662d77dc24 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -42,6 +42,7 @@ enum hist_column {
 	HISTC_CGROUP_ID,
 	HISTC_CGROUP,
 	HISTC_PARENT,
+	HISTC_PARALLELISM,
 	HISTC_CPU,
 	HISTC_SOCKET,
 	HISTC_SRCLINE,
@@ -228,6 +229,7 @@ struct hist_entry {
 	u64 transaction;
 	s32 socket;
 	s32 cpu;
+	int parallelism;
 	u64 code_page_size;
 	u64 weight;
 	u64 ins_lat;
@@ -580,6 +582,7 @@ bool perf_hpp__is_thread_entry(struct perf_hpp_fmt *fmt);
 bool perf_hpp__is_comm_entry(struct perf_hpp_fmt *fmt);
 bool perf_hpp__is_dso_entry(struct perf_hpp_fmt *fmt);
 bool perf_hpp__is_sym_entry(struct perf_hpp_fmt *fmt);
+bool perf_hpp__is_parallelism_entry(struct perf_hpp_fmt *fmt);
 
 struct perf_hpp_fmt *perf_hpp_fmt__dup(struct perf_hpp_fmt *fmt);
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index c06e3020a9769..00fcf8d8ac255 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -2403,6 +2403,18 @@ bool perf_session__has_traces(struct perf_session *session, const char *msg)
 	return false;
 }
 
+bool perf_session__has_switch_events(struct perf_session *session)
+{
+	struct evsel *evsel;
+
+	evlist__for_each_entry(session->evlist, evsel) {
+		if (evsel->core.attr.context_switch)
+			return true;
+	}
+
+	return false;
+}
+
 int map__set_kallsyms_ref_reloc_sym(struct map *map, const char *symbol_name, u64 addr)
 {
 	char *bracket;
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index bcf1bcf06959b..db1c120a9e67f 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -141,6 +141,7 @@ int perf_session__resolve_callchain(struct perf_session *session,
 				    struct symbol **parent);
 
 bool perf_session__has_traces(struct perf_session *session, const char *msg);
+bool perf_session__has_switch_events(struct perf_session *session);
 
 void perf_event__attr_swap(struct perf_event_attr *attr);
 
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 3dd33721823f3..7eef43f5be360 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -892,6 +892,27 @@ struct sort_entry sort_cpu = {
 	.se_width_idx = HISTC_CPU,
 };
 
+/* --sort parallelism */
+
+static int64_t
+sort__parallelism_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	return right->parallelism - left->parallelism;
+}
+
+static int hist_entry__parallelism_snprintf(struct hist_entry *he, char *bf,
+				    size_t size, unsigned int width)
+{
+	return repsep_snprintf(bf, size, "%*d", width, he->parallelism);
+}
+
+struct sort_entry sort_parallelism = {
+	.se_header = "Parallelism",
+	.se_cmp = sort__parallelism_cmp,
+	.se_snprintf = hist_entry__parallelism_snprintf,
+	.se_width_idx = HISTC_PARALLELISM,
+};
+
 /* --sort cgroup_id */
 
 static int64_t _sort__cgroup_dev_cmp(u64 left_dev, u64 right_dev)
@@ -2534,6 +2555,7 @@ static struct sort_dimension common_sort_dimensions[] = {
 	DIM(SORT_ANNOTATE_DATA_TYPE_OFFSET, "typeoff", sort_type_offset),
 	DIM(SORT_SYM_OFFSET, "symoff", sort_sym_offset),
 	DIM(SORT_ANNOTATE_DATA_TYPE_CACHELINE, "typecln", sort_type_cacheline),
+	DIM(SORT_PARALLELISM, "parallelism", sort_parallelism),
 };
 
 #undef DIM
@@ -2735,6 +2757,7 @@ MK_SORT_ENTRY_CHK(thread)
 MK_SORT_ENTRY_CHK(comm)
 MK_SORT_ENTRY_CHK(dso)
 MK_SORT_ENTRY_CHK(sym)
+MK_SORT_ENTRY_CHK(parallelism)
 
 
 static bool __sort__hpp_equal(struct perf_hpp_fmt *a, struct perf_hpp_fmt *b)
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index a8572574e1686..11fb15f914093 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -72,6 +72,7 @@ enum sort_type {
 	SORT_ANNOTATE_DATA_TYPE_OFFSET,
 	SORT_SYM_OFFSET,
 	SORT_ANNOTATE_DATA_TYPE_CACHELINE,
+	SORT_PARALLELISM,
 
 	/* branch stack specific sort keys */
 	__SORT_BRANCH_STACK,
-- 
2.48.1.502.g6dc24dfdaf-goog
From nobody Sat Feb 22 00:03:52 2025
Date: Thu, 13 Feb 2025 10:08:16 +0100
Message-ID: <32b4ce1731126c88a2d9e191dc87e39ae4651cb7.1739437531.git.dvyukov@google.com>
Subject: [PATCH v7 3/9] perf report: Switch filtered from u8 to u16
From: Dmitry Vyukov
To: namhyung@kernel.org, irogers@google.com, acme@kernel.org, ak@linux.intel.com
Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, Dmitry Vyukov

All u8 bits are already taken, so adding one more filter leads to an
unpleasant failure mode: the code compiles without warnings, but the last
filters silently don't work. Add a typedef and switch to u16.

Signed-off-by: Dmitry Vyukov
Cc: Namhyung Kim
Cc: Arnaldo Carvalho de Melo
Cc: Ian Rogers
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Andi Kleen
---
 tools/perf/util/addr_location.h | 3 ++-
 tools/perf/util/hist.c          | 2 +-
 tools/perf/util/hist.h          | 4 +++-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/addr_location.h b/tools/perf/util/addr_location.h
index 36aaa45445f24..f83d74e370b2f 100644
--- a/tools/perf/util/addr_location.h
+++ b/tools/perf/util/addr_location.h
@@ -3,6 +3,7 @@
 #define __PERF_ADDR_LOCATION 1
 
 #include
+#include "hist.h"
 
 struct thread;
 struct maps;
@@ -17,7 +18,7 @@ struct addr_location {
 	const char *srcline;
 	u64 addr;
 	char level;
-	u8 filtered;
+	filter_mask_t filtered;
 	u8 cpumode;
 	s32 cpu;
 	s32 socket;
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index cafd693568189..6b8f8da8d3b66 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -585,7 +585,7 @@ static struct hist_entry *hist_entry__new(struct hist_entry *template,
 	return he;
 }
 
-static u8 symbol__parent_filter(const struct symbol *parent)
+static filter_mask_t symbol__parent_filter(const struct symbol *parent)
 {
 	if (symbol_conf.exclude_other && parent == NULL)
 		return 1 << HIST_FILTER__PARENT;
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index a6e662d77dc24..4035106a74087 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -33,6 +33,8 @@ enum hist_filter {
 	HIST_FILTER__C2C,
 };
 
+typedef u16 filter_mask_t;
+
 enum hist_column {
 	HISTC_SYMBOL,
 	HISTC_TIME,
@@ -244,7 +246,7 @@ struct hist_entry {
 	bool leaf;
 
 	char level;
-	u8 filtered;
+	filter_mask_t filtered;
 
 	u16 callchain_size;
 	union {
-- 
2.48.1.502.g6dc24dfdaf-goog
From nobody Sat Feb 22 00:03:52 2025
Date: Thu, 13 Feb 2025 10:08:17 +0100
Subject: [PATCH v7 4/9] perf report: Add parallelism filter
From: Dmitry Vyukov
To: namhyung@kernel.org, irogers@google.com, acme@kernel.org, ak@linux.intel.com
Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, Dmitry Vyukov

Add a parallelism filter that can be used to look at specific parallelism
levels only. The format is the same as cpu lists.
For example:

Only single-threaded samples: --parallelism=1
Low parallelism only: --parallelism=1-4
High parallelism only: --parallelism=64-128

Signed-off-by: Dmitry Vyukov
Cc: Namhyung Kim
Cc: Arnaldo Carvalho de Melo
Cc: Ian Rogers
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Andi Kleen
---
 tools/perf/builtin-report.c   |  5 ++++-
 tools/perf/util/event.c       |  2 ++
 tools/perf/util/hist.c        | 31 +++++++++++++++++++++++++++++++
 tools/perf/util/hist.h        |  6 +++++-
 tools/perf/util/sort.c        | 11 +++++++++++
 tools/perf/util/symbol.c      | 34 ++++++++++++++++++++++++++++++++++
 tools/perf/util/symbol_conf.h |  4 ++++
 7 files changed, 91 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 14d49f0625881..2a19abdc869a1 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1390,6 +1390,8 @@ int cmd_report(int argc, const char **argv)
 		   symbol__config_symfs),
 	OPT_STRING('C', "cpu", &report.cpu_list, "cpu",
 		   "list of cpus to profile"),
+	OPT_STRING(0, "parallelism", &symbol_conf.parallelism_list_str, "parallelism",
+		   "only consider these parallelism levels (cpu set format)"),
 	OPT_BOOLEAN('I', "show-info", &report.show_full_info,
 		    "Display extended information about perf.data file"),
 	OPT_BOOLEAN(0, "source", &annotate_opts.annotate_src,
@@ -1721,7 +1723,8 @@ int cmd_report(int argc, const char **argv)
 	}
 
 	if (report.disable_order || !perf_session__has_switch_events(session)) {
-		if ((sort_order && strstr(sort_order, "parallelism")) ||
+		if (symbol_conf.parallelism_list_str ||
+				(sort_order && strstr(sort_order, "parallelism")) ||
 				(field_order && strstr(field_order, "parallelism"))) {
 			if (report.disable_order)
 				ui__error("Use of parallelism is incompatible with --disable-order.\n");
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 2f10e31157572..6ceed46acd5a4 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -769,6 +769,8 @@ int machine__resolve(struct machine *machine, struct addr_location *al,
 
 	/* Account for possible out-of-order switch events. */
 	al->parallelism = max(1, min(machine->parallelism, machine__nr_cpus_avail(machine)));
+	if (test_bit(al->parallelism, symbol_conf.parallelism_filter))
+		al->filtered |= (1 << HIST_FILTER__PARALLELISM);
 
 	if (al->map) {
 		if (symbol_conf.dso_list &&
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 6b8f8da8d3b66..446342246f5ee 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -43,6 +43,8 @@ static bool hists__filter_entry_by_symbol(struct hists *hists,
 					  struct hist_entry *he);
 static bool hists__filter_entry_by_socket(struct hists *hists,
 					  struct hist_entry *he);
+static bool hists__filter_entry_by_parallelism(struct hists *hists,
+					       struct hist_entry *he);
 
 u16 hists__col_len(struct hists *hists, enum hist_column col)
 {
@@ -1457,6 +1459,10 @@ static void hist_entry__check_and_remove_filter(struct hist_entry *he,
 		if (symbol_conf.sym_list == NULL)
 			return;
 		break;
+	case HIST_FILTER__PARALLELISM:
+		if (__bitmap_weight(symbol_conf.parallelism_filter, MAX_NR_CPUS + 1) == 0)
+			return;
+		break;
 	case HIST_FILTER__PARENT:
 	case HIST_FILTER__GUEST:
 	case HIST_FILTER__HOST:
@@ -1515,6 +1521,9 @@ static void hist_entry__apply_hierarchy_filters(struct hist_entry *he)
 	hist_entry__check_and_remove_filter(he, HIST_FILTER__SYMBOL,
 					    perf_hpp__is_sym_entry);
 
+	hist_entry__check_and_remove_filter(he, HIST_FILTER__PARALLELISM,
+					    perf_hpp__is_parallelism_entry);
+
 	hists__apply_filters(he->hists, he);
 }
 
@@ -1711,6 +1720,7 @@ static void hists__apply_filters(struct hists *hists, struct hist_entry *he)
 	hists__filter_entry_by_thread(hists, he);
 	hists__filter_entry_by_symbol(hists, he);
 	hists__filter_entry_by_socket(hists, he);
+	hists__filter_entry_by_parallelism(hists, he);
 }
 
 int hists__collapse_resort(struct hists *hists, struct ui_progress *prog)
@@ -2197,6 +2207,16 @@ static bool hists__filter_entry_by_socket(struct hists *hists,
 	return false;
 }
 
+static bool hists__filter_entry_by_parallelism(struct hists *hists,
+					       struct hist_entry *he)
+{
+	if (test_bit(he->parallelism, hists->parallelism_filter)) {
+		he->filtered |= (1 << HIST_FILTER__PARALLELISM);
+		return true;
+	}
+	return false;
+}
+
 typedef bool (*filter_fn_t)(struct hists *hists, struct hist_entry *he);
 
 static void hists__filter_by_type(struct hists *hists, int type, filter_fn_t filter)
@@ -2366,6 +2386,16 @@ void hists__filter_by_socket(struct hists *hists)
 				      hists__filter_entry_by_socket);
 }
 
+void hists__filter_by_parallelism(struct hists *hists)
+{
+	if (symbol_conf.report_hierarchy)
+		hists__filter_hierarchy(hists, HIST_FILTER__PARALLELISM,
+					hists->parallelism_filter);
+	else
+		hists__filter_by_type(hists, HIST_FILTER__PARALLELISM,
+				      hists__filter_entry_by_parallelism);
+}
+
 void events_stats__inc(struct events_stats *stats, u32 type)
 {
 	++stats->nr_events[0];
@@ -2872,6 +2902,7 @@ int __hists__init(struct hists *hists, struct perf_hpp_list *hpp_list)
 	hists->entries = RB_ROOT_CACHED;
 	mutex_init(&hists->lock);
 	hists->socket_filter = -1;
+	hists->parallelism_filter = symbol_conf.parallelism_filter;
 	hists->hpp_list = hpp_list;
 	INIT_LIST_HEAD(&hists->hpp_formats);
 	return 0;
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 4035106a74087..c2236e0d89f2a 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -31,6 +31,7 @@ enum hist_filter {
 	HIST_FILTER__HOST,
 	HIST_FILTER__SOCKET,
 	HIST_FILTER__C2C,
+	HIST_FILTER__PARALLELISM,
 };
 
 typedef u16 filter_mask_t;
@@ -112,6 +113,7 @@ struct hists {
 	const struct dso *dso_filter;
 	const char *uid_filter_str;
 	const char *symbol_filter_str;
+	unsigned long *parallelism_filter;
 	struct mutex lock;
 	struct hists_stats stats;
 	u64 event_stream;
@@ -388,11 +390,13 @@ void hists__filter_by_dso(struct hists *hists);
 void hists__filter_by_thread(struct hists *hists);
 void
hists__filter_by_symbol(struct hists *hists); void hists__filter_by_socket(struct hists *hists); +void hists__filter_by_parallelism(struct hists *hists); =20 static inline bool hists__has_filter(struct hists *hists) { return hists->thread_filter || hists->dso_filter || - hists->symbol_filter_str || (hists->socket_filter > -1); + hists->symbol_filter_str || (hists->socket_filter > -1) || + hists->parallelism_filter; } =20 u16 hists__col_len(struct hists *hists, enum hist_column col); diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c index 7eef43f5be360..3055496358ebb 100644 --- a/tools/perf/util/sort.c +++ b/tools/perf/util/sort.c @@ -900,6 +900,16 @@ sort__parallelism_cmp(struct hist_entry *left, struct = hist_entry *right) return right->parallelism - left->parallelism; } =20 +static int hist_entry__parallelism_filter(struct hist_entry *he, int type,= const void *arg) +{ + const unsigned long *parallelism_filter =3D arg; + + if (type !=3D HIST_FILTER__PARALLELISM) + return -1; + + return test_bit(he->parallelism, parallelism_filter); +} + static int hist_entry__parallelism_snprintf(struct hist_entry *he, char *b= f, size_t size, unsigned int width) { @@ -909,6 +919,7 @@ static int hist_entry__parallelism_snprintf(struct hist= _entry *he, char *bf, struct sort_entry sort_parallelism =3D { .se_header =3D "Parallelism", .se_cmp =3D sort__parallelism_cmp, + .se_filter =3D hist_entry__parallelism_filter, .se_snprintf =3D hist_entry__parallelism_snprintf, .se_width_idx =3D HISTC_PARALLELISM, }; diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c index 49b08adc6ee34..315f74b5bac06 100644 --- a/tools/perf/util/symbol.c +++ b/tools/perf/util/symbol.c @@ -18,6 +18,7 @@ #include "annotate.h" #include "build-id.h" #include "cap.h" +#include "cpumap.h" #include "dso.h" #include "util.h" // lsdir() #include "debug.h" @@ -2471,6 +2472,36 @@ int symbol__annotation_init(void) return 0; } =20 +static int setup_parallelism_bitmap(void) +{ + struct perf_cpu_map 
*map; + struct perf_cpu cpu; + int i, err =3D -1; + + if (symbol_conf.parallelism_list_str =3D=3D NULL) + return 0; + + map =3D perf_cpu_map__new(symbol_conf.parallelism_list_str); + if (map =3D=3D NULL) { + pr_err("failed to parse parallelism filter list\n"); + return -1; + } + + bitmap_fill(symbol_conf.parallelism_filter, MAX_NR_CPUS + 1); + perf_cpu_map__for_each_cpu(cpu, i, map) { + if (cpu.cpu <=3D 0 || cpu.cpu > MAX_NR_CPUS) { + pr_err("Requested parallelism level %d is invalid.\n", cpu.cpu); + goto out_delete_map; + } + __clear_bit(cpu.cpu, symbol_conf.parallelism_filter); + } + + err =3D 0; +out_delete_map: + perf_cpu_map__put(map); + return err; +} + int symbol__init(struct perf_env *env) { const char *symfs; @@ -2490,6 +2521,9 @@ int symbol__init(struct perf_env *env) return -1; } =20 + if (setup_parallelism_bitmap()) + return -1; + if (setup_list(&symbol_conf.dso_list, symbol_conf.dso_list_str, "dso") < 0) return -1; diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h index a9c51acc722fe..c5b2e56127e22 100644 --- a/tools/perf/util/symbol_conf.h +++ b/tools/perf/util/symbol_conf.h @@ -3,6 +3,8 @@ #define __PERF_SYMBOL_CONF 1 =20 #include +#include +#include "perf.h" =20 struct strlist; struct intlist; @@ -62,6 +64,7 @@ struct symbol_conf { *pid_list_str, *tid_list_str, *sym_list_str, + *parallelism_list_str, *col_width_list_str, *bt_stop_list_str; const char *addr2line_path; @@ -82,6 +85,7 @@ struct symbol_conf { int pad_output_len_dso; int group_sort_idx; int addr_range; + DECLARE_BITMAP(parallelism_filter, MAX_NR_CPUS + 1); }; =20 extern struct symbol_conf symbol_conf; --=20 2.48.1.502.g6dc24dfdaf-goog From nobody Sat Feb 22 00:03:52 2025 Received: from mail-ej1-f74.google.com (mail-ej1-f74.google.com [209.85.218.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F01E020E6F3 for ; Thu, 13 Feb 2025 09:08:44 +0000 (UTC) 
Date: Thu, 13 Feb 2025 10:08:18 +0100 In-Reply-To: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.48.1.502.g6dc24dfdaf-goog Message-ID: Subject: [PATCH v7 5/9] perf report: Add latency output field From: Dmitry Vyukov To: namhyung@kernel.org, irogers@google.com, acme@kernel.org, ak@linux.intel.com Cc:
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, Dmitry Vyukov Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Latency output field is similar to overhead, but represents overhead for latency rather than CPU consumption. It's re-scaled from overhead by dividi= ng weight by the current parallelism level at the time of the sample. It effectively models profiling with 1 sample taken per unit of wall-clock time rather than unit of CPU time. Signed-off-by: Dmitry Vyukov Cc: Namhyung Kim Cc: Arnaldo Carvalho de Melo Cc: Ian Rogers Cc: linux-perf-users@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Andi Kleen --- Changes in v5: - fixed formatting of latency field in --stdout mode --- tools/perf/ui/browsers/hists.c | 27 ++++++++----- tools/perf/ui/hist.c | 69 ++++++++++++++++++--------------- tools/perf/util/addr_location.h | 2 + tools/perf/util/event.c | 6 +++ tools/perf/util/events_stats.h | 2 + tools/perf/util/hist.c | 55 +++++++++++++++++++------- tools/perf/util/hist.h | 12 ++++++ tools/perf/util/sort.c | 2 + 8 files changed, 120 insertions(+), 55 deletions(-) diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c index 49ba82bf33918..35c10509b797f 100644 --- a/tools/perf/ui/browsers/hists.c +++ b/tools/perf/ui/browsers/hists.c @@ -1226,7 +1226,7 @@ int __hpp__slsmg_color_printf(struct perf_hpp *hpp, c= onst char *fmt, ...) 
return ret; } =20 -#define __HPP_COLOR_PERCENT_FN(_type, _field) \ +#define __HPP_COLOR_PERCENT_FN(_type, _field, _fmttype) \ static u64 __hpp_get_##_field(struct hist_entry *he) \ { \ return he->stat._field; \ @@ -1238,10 +1238,10 @@ hist_browser__hpp_color_##_type(struct perf_hpp_fmt= *fmt, \ struct hist_entry *he) \ { \ return hpp__fmt(fmt, hpp, he, __hpp_get_##_field, " %*.2f%%", \ - __hpp__slsmg_color_printf, true); \ + __hpp__slsmg_color_printf, _fmttype); \ } =20 -#define __HPP_COLOR_ACC_PERCENT_FN(_type, _field) \ +#define __HPP_COLOR_ACC_PERCENT_FN(_type, _field, _fmttype) \ static u64 __hpp_get_acc_##_field(struct hist_entry *he) \ { \ return he->stat_acc->_field; \ @@ -1262,15 +1262,18 @@ hist_browser__hpp_color_##_type(struct perf_hpp_fmt= *fmt, \ return ret; \ } \ return hpp__fmt(fmt, hpp, he, __hpp_get_acc_##_field, \ - " %*.2f%%", __hpp__slsmg_color_printf, true); \ + " %*.2f%%", __hpp__slsmg_color_printf, \ + _fmttype); \ } =20 -__HPP_COLOR_PERCENT_FN(overhead, period) -__HPP_COLOR_PERCENT_FN(overhead_sys, period_sys) -__HPP_COLOR_PERCENT_FN(overhead_us, period_us) -__HPP_COLOR_PERCENT_FN(overhead_guest_sys, period_guest_sys) -__HPP_COLOR_PERCENT_FN(overhead_guest_us, period_guest_us) -__HPP_COLOR_ACC_PERCENT_FN(overhead_acc, period) +__HPP_COLOR_PERCENT_FN(overhead, period, PERF_HPP_FMT_TYPE__PERCENT) +__HPP_COLOR_PERCENT_FN(latency, latency, PERF_HPP_FMT_TYPE__LATENCY) +__HPP_COLOR_PERCENT_FN(overhead_sys, period_sys, PERF_HPP_FMT_TYPE__PERCEN= T) +__HPP_COLOR_PERCENT_FN(overhead_us, period_us, PERF_HPP_FMT_TYPE__PERCENT) +__HPP_COLOR_PERCENT_FN(overhead_guest_sys, period_guest_sys, PERF_HPP_FMT_= TYPE__PERCENT) +__HPP_COLOR_PERCENT_FN(overhead_guest_us, period_guest_us, PERF_HPP_FMT_TY= PE__PERCENT) +__HPP_COLOR_ACC_PERCENT_FN(overhead_acc, period, PERF_HPP_FMT_TYPE__PERCEN= T) +__HPP_COLOR_ACC_PERCENT_FN(latency_acc, latency, PERF_HPP_FMT_TYPE__LATENC= Y) =20 #undef __HPP_COLOR_PERCENT_FN #undef __HPP_COLOR_ACC_PERCENT_FN @@ -1279,6 +1282,8 @@ 
void hist_browser__init_hpp(void) { perf_hpp__format[PERF_HPP__OVERHEAD].color =3D hist_browser__hpp_color_overhead; + perf_hpp__format[PERF_HPP__LATENCY].color =3D + hist_browser__hpp_color_latency; perf_hpp__format[PERF_HPP__OVERHEAD_SYS].color =3D hist_browser__hpp_color_overhead_sys; perf_hpp__format[PERF_HPP__OVERHEAD_US].color =3D @@ -1289,6 +1294,8 @@ void hist_browser__init_hpp(void) hist_browser__hpp_color_overhead_guest_us; perf_hpp__format[PERF_HPP__OVERHEAD_ACC].color =3D hist_browser__hpp_color_overhead_acc; + perf_hpp__format[PERF_HPP__LATENCY_ACC].color =3D + hist_browser__hpp_color_latency_acc; =20 res_sample_init(); } diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c index 34fda1d5eccb4..6de6309595f9e 100644 --- a/tools/perf/ui/hist.c +++ b/tools/perf/ui/hist.c @@ -27,9 +27,10 @@ static int __hpp__fmt_print(struct perf_hpp *hpp, struct= hists *hists, u64 val, int nr_samples, const char *fmt, int len, hpp_snprint_fn print_fn, enum perf_hpp_fmt_type fmtype) { - if (fmtype =3D=3D PERF_HPP_FMT_TYPE__PERCENT) { + if (fmtype =3D=3D PERF_HPP_FMT_TYPE__PERCENT || fmtype =3D=3D PERF_HPP_FM= T_TYPE__LATENCY) { double percent =3D 0.0; - u64 total =3D hists__total_period(hists); + u64 total =3D fmtype =3D=3D PERF_HPP_FMT_TYPE__PERCENT ? hists__total_pe= riod(hists) : + hists__total_latency(hists); =20 if (total) percent =3D 100.0 * val / total; @@ -128,7 +129,7 @@ int hpp__fmt(struct perf_hpp_fmt *fmt, struct perf_hpp = *hpp, print_fn, fmtype); } =20 - if (fmtype =3D=3D PERF_HPP_FMT_TYPE__PERCENT) + if (fmtype =3D=3D PERF_HPP_FMT_TYPE__PERCENT || fmtype =3D=3D PERF_HPP_FM= T_TYPE__LATENCY) len -=3D 2; /* 2 for a space and a % sign */ else len -=3D 1; @@ -356,7 +357,7 @@ static int hpp_entry_scnprintf(struct perf_hpp *hpp, co= nst char *fmt, ...) return (ret >=3D ssize) ? 
(ssize - 1) : ret; } =20 -#define __HPP_COLOR_PERCENT_FN(_type, _field) \ +#define __HPP_COLOR_PERCENT_FN(_type, _field, _fmttype) \ static u64 he_get_##_field(struct hist_entry *he) \ { \ return he->stat._field; \ @@ -366,15 +367,15 @@ static int hpp__color_##_type(struct perf_hpp_fmt *fm= t, \ struct perf_hpp *hpp, struct hist_entry *he) \ { \ return hpp__fmt(fmt, hpp, he, he_get_##_field, " %*.2f%%", \ - hpp_color_scnprintf, PERF_HPP_FMT_TYPE__PERCENT); \ + hpp_color_scnprintf, _fmttype); \ } =20 -#define __HPP_ENTRY_PERCENT_FN(_type, _field) \ +#define __HPP_ENTRY_PERCENT_FN(_type, _field, _fmttype) \ static int hpp__entry_##_type(struct perf_hpp_fmt *fmt, \ struct perf_hpp *hpp, struct hist_entry *he) \ { \ return hpp__fmt(fmt, hpp, he, he_get_##_field, " %*.2f%%", \ - hpp_entry_scnprintf, PERF_HPP_FMT_TYPE__PERCENT); \ + hpp_entry_scnprintf, _fmttype); \ } =20 #define __HPP_SORT_FN(_type, _field) \ @@ -384,7 +385,7 @@ static int64_t hpp__sort_##_type(struct perf_hpp_fmt *f= mt __maybe_unused, \ return __hpp__sort(a, b, he_get_##_field); \ } =20 -#define __HPP_COLOR_ACC_PERCENT_FN(_type, _field) \ +#define __HPP_COLOR_ACC_PERCENT_FN(_type, _field, _fmttype) \ static u64 he_get_acc_##_field(struct hist_entry *he) \ { \ return he->stat_acc->_field; \ @@ -394,15 +395,15 @@ static int hpp__color_##_type(struct perf_hpp_fmt *fm= t, \ struct perf_hpp *hpp, struct hist_entry *he) \ { \ return hpp__fmt_acc(fmt, hpp, he, he_get_acc_##_field, " %*.2f%%", \ - hpp_color_scnprintf, PERF_HPP_FMT_TYPE__PERCENT); \ + hpp_color_scnprintf, _fmttype); \ } =20 -#define __HPP_ENTRY_ACC_PERCENT_FN(_type, _field) \ +#define __HPP_ENTRY_ACC_PERCENT_FN(_type, _field, _fmttype) \ static int hpp__entry_##_type(struct perf_hpp_fmt *fmt, \ struct perf_hpp *hpp, struct hist_entry *he) \ { \ return hpp__fmt_acc(fmt, hpp, he, he_get_acc_##_field, " %*.2f%%", \ - hpp_entry_scnprintf, PERF_HPP_FMT_TYPE__PERCENT); \ + hpp_entry_scnprintf, _fmttype); \ } =20 #define __HPP_SORT_ACC_FN(_type, 
_field) \ @@ -453,14 +454,14 @@ static int64_t hpp__sort_##_type(struct perf_hpp_fmt = *fmt __maybe_unused, \ } =20 =20 -#define HPP_PERCENT_FNS(_type, _field) \ -__HPP_COLOR_PERCENT_FN(_type, _field) \ -__HPP_ENTRY_PERCENT_FN(_type, _field) \ +#define HPP_PERCENT_FNS(_type, _field, _fmttype) \ +__HPP_COLOR_PERCENT_FN(_type, _field, _fmttype) \ +__HPP_ENTRY_PERCENT_FN(_type, _field, _fmttype) \ __HPP_SORT_FN(_type, _field) =20 -#define HPP_PERCENT_ACC_FNS(_type, _field) \ -__HPP_COLOR_ACC_PERCENT_FN(_type, _field) \ -__HPP_ENTRY_ACC_PERCENT_FN(_type, _field) \ +#define HPP_PERCENT_ACC_FNS(_type, _field, _fmttype) \ +__HPP_COLOR_ACC_PERCENT_FN(_type, _field, _fmttype) \ +__HPP_ENTRY_ACC_PERCENT_FN(_type, _field, _fmttype) \ __HPP_SORT_ACC_FN(_type, _field) =20 #define HPP_RAW_FNS(_type, _field) \ @@ -471,12 +472,14 @@ __HPP_SORT_RAW_FN(_type, _field) __HPP_ENTRY_AVERAGE_FN(_type, _field) \ __HPP_SORT_AVERAGE_FN(_type, _field) =20 -HPP_PERCENT_FNS(overhead, period) -HPP_PERCENT_FNS(overhead_sys, period_sys) -HPP_PERCENT_FNS(overhead_us, period_us) -HPP_PERCENT_FNS(overhead_guest_sys, period_guest_sys) -HPP_PERCENT_FNS(overhead_guest_us, period_guest_us) -HPP_PERCENT_ACC_FNS(overhead_acc, period) +HPP_PERCENT_FNS(overhead, period, PERF_HPP_FMT_TYPE__PERCENT) +HPP_PERCENT_FNS(latency, latency, PERF_HPP_FMT_TYPE__LATENCY) +HPP_PERCENT_FNS(overhead_sys, period_sys, PERF_HPP_FMT_TYPE__PERCENT) +HPP_PERCENT_FNS(overhead_us, period_us, PERF_HPP_FMT_TYPE__PERCENT) +HPP_PERCENT_FNS(overhead_guest_sys, period_guest_sys, PERF_HPP_FMT_TYPE__P= ERCENT) +HPP_PERCENT_FNS(overhead_guest_us, period_guest_us, PERF_HPP_FMT_TYPE__PER= CENT) +HPP_PERCENT_ACC_FNS(overhead_acc, period, PERF_HPP_FMT_TYPE__PERCENT) +HPP_PERCENT_ACC_FNS(latency_acc, latency, PERF_HPP_FMT_TYPE__LATENCY) =20 HPP_RAW_FNS(samples, nr_events) HPP_RAW_FNS(period, period) @@ -548,11 +551,13 @@ static bool hpp__equal(struct perf_hpp_fmt *a, struct= perf_hpp_fmt *b) =20 struct perf_hpp_fmt perf_hpp__format[] =3D { 
HPP__COLOR_PRINT_FNS("Overhead", overhead, OVERHEAD), + HPP__COLOR_PRINT_FNS("Latency", latency, LATENCY), HPP__COLOR_PRINT_FNS("sys", overhead_sys, OVERHEAD_SYS), HPP__COLOR_PRINT_FNS("usr", overhead_us, OVERHEAD_US), HPP__COLOR_PRINT_FNS("guest sys", overhead_guest_sys, OVERHEAD_GUEST_SYS), HPP__COLOR_PRINT_FNS("guest usr", overhead_guest_us, OVERHEAD_GUEST_US), HPP__COLOR_ACC_PRINT_FNS("Children", overhead_acc, OVERHEAD_ACC), + HPP__COLOR_ACC_PRINT_FNS("Latency", latency_acc, LATENCY_ACC), HPP__PRINT_FNS("Samples", samples, SAMPLES), HPP__PRINT_FNS("Period", period, PERIOD), HPP__PRINT_FNS("Weight1", weight1, WEIGHT1), @@ -601,6 +606,11 @@ static void fmt_free(struct perf_hpp_fmt *fmt) fmt->free(fmt); } =20 +static bool fmt_equal(struct perf_hpp_fmt *a, struct perf_hpp_fmt *b) +{ + return a->equal && a->equal(a, b); +} + void perf_hpp__init(void) { int i; @@ -671,30 +681,26 @@ static void perf_hpp__column_unregister(struct perf_h= pp_fmt *format) =20 void perf_hpp__cancel_cumulate(void) { - struct perf_hpp_fmt *fmt, *acc, *ovh, *tmp; + struct perf_hpp_fmt *fmt, *acc, *ovh, *acc_lat, *tmp; =20 if (is_strict_order(field_order)) return; =20 ovh =3D &perf_hpp__format[PERF_HPP__OVERHEAD]; acc =3D &perf_hpp__format[PERF_HPP__OVERHEAD_ACC]; + acc_lat =3D &perf_hpp__format[PERF_HPP__LATENCY_ACC]; =20 perf_hpp_list__for_each_format_safe(&perf_hpp_list, fmt, tmp) { - if (acc->equal(acc, fmt)) { + if (fmt_equal(acc, fmt) || fmt_equal(acc_lat, fmt)) { perf_hpp__column_unregister(fmt); continue; } =20 - if (ovh->equal(ovh, fmt)) + if (fmt_equal(ovh, fmt)) fmt->name =3D "Overhead"; } } =20 -static bool fmt_equal(struct perf_hpp_fmt *a, struct perf_hpp_fmt *b) -{ - return a->equal && a->equal(a, b); -} - void perf_hpp__setup_output_field(struct perf_hpp_list *list) { struct perf_hpp_fmt *fmt; @@ -819,6 +825,7 @@ void perf_hpp__reset_width(struct perf_hpp_fmt *fmt, st= ruct hists *hists) =20 switch (fmt->idx) { case PERF_HPP__OVERHEAD: + case PERF_HPP__LATENCY: case 
PERF_HPP__OVERHEAD_SYS: case PERF_HPP__OVERHEAD_US: case PERF_HPP__OVERHEAD_ACC: diff --git a/tools/perf/util/addr_location.h b/tools/perf/util/addr_locatio= n.h index f83d74e370b2f..663e9a55d8ed3 100644 --- a/tools/perf/util/addr_location.h +++ b/tools/perf/util/addr_location.h @@ -24,6 +24,8 @@ struct addr_location { s32 socket; /* Same as machine.parallelism but within [1, nr_cpus]. */ int parallelism; + /* See he_stat.latency. */ + u64 latency; }; =20 void addr_location__init(struct addr_location *al); diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c index 6ceed46acd5a4..c23b77f8f854a 100644 --- a/tools/perf/util/event.c +++ b/tools/perf/util/event.c @@ -771,6 +771,12 @@ int machine__resolve(struct machine *machine, struct a= ddr_location *al, al->parallelism =3D max(1, min(machine->parallelism, machine__nr_cpus_ava= il(machine))); if (test_bit(al->parallelism, symbol_conf.parallelism_filter)) al->filtered |=3D (1 << HIST_FILTER__PARALLELISM); + /* + * Multiply it by some const to avoid precision loss or dealing + * with floats. The multiplier does not matter otherwise since + * we only print it as percents. 
+ */ + al->latency =3D sample->period * 1000 / al->parallelism; =20 if (al->map) { if (symbol_conf.dso_list && diff --git a/tools/perf/util/events_stats.h b/tools/perf/util/events_stats.h index eabd7913c3092..dcff697ed2529 100644 --- a/tools/perf/util/events_stats.h +++ b/tools/perf/util/events_stats.h @@ -57,6 +57,8 @@ struct events_stats { struct hists_stats { u64 total_period; u64 total_non_filtered_period; + u64 total_latency; + u64 total_non_filtered_latency; u32 nr_samples; u32 nr_non_filtered_samples; u32 nr_lost_samples; diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c index 446342246f5ee..a29324e33ed04 100644 --- a/tools/perf/util/hist.c +++ b/tools/perf/util/hist.c @@ -305,9 +305,10 @@ static long hist_time(unsigned long htime) return htime; } =20 -static void he_stat__add_period(struct he_stat *he_stat, u64 period) +static void he_stat__add_period(struct he_stat *he_stat, u64 period, u64 l= atency) { he_stat->period +=3D period; + he_stat->latency +=3D latency; he_stat->nr_events +=3D 1; } =20 @@ -322,6 +323,7 @@ static void he_stat__add_stat(struct he_stat *dest, str= uct he_stat *src) dest->weight2 +=3D src->weight2; dest->weight3 +=3D src->weight3; dest->nr_events +=3D src->nr_events; + dest->latency +=3D src->latency; } =20 static void he_stat__decay(struct he_stat *he_stat) @@ -331,6 +333,7 @@ static void he_stat__decay(struct he_stat *he_stat) he_stat->weight1 =3D (he_stat->weight1 * 7) / 8; he_stat->weight2 =3D (he_stat->weight2 * 7) / 8; he_stat->weight3 =3D (he_stat->weight3 * 7) / 8; + he_stat->latency =3D (he_stat->latency * 7) / 8; } =20 static void hists__delete_entry(struct hists *hists, struct hist_entry *he= ); @@ -338,7 +341,7 @@ static void hists__delete_entry(struct hists *hists, st= ruct hist_entry *he); static bool hists__decay_entry(struct hists *hists, struct hist_entry *he) { u64 prev_period =3D he->stat.period; - u64 diff; + u64 prev_latency =3D he->stat.latency; =20 if (prev_period =3D=3D 0) return true; @@ -348,12 
+351,16 @@ static bool hists__decay_entry(struct hists *hists, s= truct hist_entry *he) he_stat__decay(he->stat_acc); decay_callchain(he->callchain); =20 - diff =3D prev_period - he->stat.period; - if (!he->depth) { - hists->stats.total_period -=3D diff; - if (!he->filtered) - hists->stats.total_non_filtered_period -=3D diff; + u64 period_diff =3D prev_period - he->stat.period; + u64 latency_diff =3D prev_latency - he->stat.latency; + + hists->stats.total_period -=3D period_diff; + hists->stats.total_latency -=3D latency_diff; + if (!he->filtered) { + hists->stats.total_non_filtered_period -=3D period_diff; + hists->stats.total_non_filtered_latency -=3D latency_diff; + } } =20 if (!he->leaf) { @@ -368,7 +375,7 @@ static bool hists__decay_entry(struct hists *hists, str= uct hist_entry *he) } } =20 - return he->stat.period =3D=3D 0; + return he->stat.period =3D=3D 0 && he->stat.latency =3D=3D 0; } =20 static void hists__delete_entry(struct hists *hists, struct hist_entry *he) @@ -594,14 +601,17 @@ static filter_mask_t symbol__parent_filter(const stru= ct symbol *parent) return 0; } =20 -static void hist_entry__add_callchain_period(struct hist_entry *he, u64 pe= riod) +static void hist_entry__add_callchain_period(struct hist_entry *he, u64 pe= riod, u64 latency) { if (!hist_entry__has_callchains(he) || !symbol_conf.use_callchain) return; =20 he->hists->callchain_period +=3D period; - if (!he->filtered) + he->hists->callchain_latency +=3D latency; + if (!he->filtered) { he->hists->callchain_non_filtered_period +=3D period; + he->hists->callchain_non_filtered_latency +=3D latency; + } } =20 static struct hist_entry *hists__findnew_entry(struct hists *hists, @@ -614,6 +624,7 @@ static struct hist_entry *hists__findnew_entry(struct h= ists *hists, struct hist_entry *he; int64_t cmp; u64 period =3D entry->stat.period; + u64 latency =3D entry->stat.latency; bool leftmost =3D true; =20 p =3D &hists->entries_in->rb_root.rb_node; @@ -632,10 +643,10 @@ static struct hist_entry 
*hists__findnew_entry(struct= hists *hists, if (!cmp) { if (sample_self) { he_stat__add_stat(&he->stat, &entry->stat); - hist_entry__add_callchain_period(he, period); + hist_entry__add_callchain_period(he, period, latency); } if (symbol_conf.cumulate_callchain) - he_stat__add_period(he->stat_acc, period); + he_stat__add_period(he->stat_acc, period, latency); =20 block_info__delete(entry->block_info); =20 @@ -672,7 +683,7 @@ static struct hist_entry *hists__findnew_entry(struct h= ists *hists, return NULL; =20 if (sample_self) - hist_entry__add_callchain_period(he, period); + hist_entry__add_callchain_period(he, period, latency); hists->nr_entries++; =20 rb_link_node(&he->rb_node_in, parent, p); @@ -751,6 +762,7 @@ __hists__add_entry(struct hists *hists, .weight1 =3D sample->weight, .weight2 =3D sample->ins_lat, .weight3 =3D sample->p_stage_cyc, + .latency =3D al->latency, }, .parent =3D sym_parent, .filtered =3D symbol__parent_filter(sym_parent) | al->filtered, @@ -1768,12 +1780,14 @@ static void hists__reset_filter_stats(struct hists = *hists) { hists->nr_non_filtered_entries =3D 0; hists->stats.total_non_filtered_period =3D 0; + hists->stats.total_non_filtered_latency =3D 0; } =20 void hists__reset_stats(struct hists *hists) { hists->nr_entries =3D 0; hists->stats.total_period =3D 0; + hists->stats.total_latency =3D 0; =20 hists__reset_filter_stats(hists); } @@ -1782,6 +1796,7 @@ static void hists__inc_filter_stats(struct hists *his= ts, struct hist_entry *h) { hists->nr_non_filtered_entries++; hists->stats.total_non_filtered_period +=3D h->stat.period; + hists->stats.total_non_filtered_latency +=3D h->stat.latency; } =20 void hists__inc_stats(struct hists *hists, struct hist_entry *h) @@ -1791,6 +1806,7 @@ void hists__inc_stats(struct hists *hists, struct his= t_entry *h) =20 hists->nr_entries++; hists->stats.total_period +=3D h->stat.period; + hists->stats.total_latency +=3D h->stat.latency; } =20 static void hierarchy_recalc_total_periods(struct hists *hists) 
@@ -1802,6 +1818,8 @@ static void hierarchy_recalc_total_periods(struct his= ts *hists) =20 hists->stats.total_period =3D 0; hists->stats.total_non_filtered_period =3D 0; + hists->stats.total_latency =3D 0; + hists->stats.total_non_filtered_latency =3D 0; =20 /* * recalculate total period using top-level entries only @@ -1813,8 +1831,11 @@ static void hierarchy_recalc_total_periods(struct hi= sts *hists) node =3D rb_next(node); =20 hists->stats.total_period +=3D he->stat.period; - if (!he->filtered) + hists->stats.total_latency +=3D he->stat.latency; + if (!he->filtered) { hists->stats.total_non_filtered_period +=3D he->stat.period; + hists->stats.total_non_filtered_latency +=3D he->stat.latency; + } } } =20 @@ -2791,6 +2812,12 @@ u64 hists__total_period(struct hists *hists) hists->stats.total_period; } =20 +u64 hists__total_latency(struct hists *hists) +{ + return symbol_conf.filter_relative ? hists->stats.total_non_filtered_late= ncy : + hists->stats.total_latency; +} + int __hists__scnprintf_title(struct hists *hists, char *bf, size_t size, b= ool show_freq) { char unit; diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h index c2236e0d89f2a..91159f16c60b2 100644 --- a/tools/perf/util/hist.h +++ b/tools/perf/util/hist.h @@ -109,6 +109,8 @@ struct hists { u64 nr_non_filtered_entries; u64 callchain_period; u64 callchain_non_filtered_period; + u64 callchain_latency; + u64 callchain_non_filtered_latency; struct thread *thread_filter; const struct dso *dso_filter; const char *uid_filter_str; @@ -170,6 +172,12 @@ struct res_sample { =20 struct he_stat { u64 period; + /* + * Period re-scaled from CPU time to wall-clock time (divided by the + * parallelism at the time of the sample). This represents effect of + * the event on latency rather than CPU consumption. 
+ */ + u64 latency; u64 period_sys; u64 period_us; u64 period_guest_sys; @@ -374,6 +382,7 @@ void hists__output_recalc_col_len(struct hists *hists, = int max_rows); struct hist_entry *hists__get_entry(struct hists *hists, int idx); =20 u64 hists__total_period(struct hists *hists); +u64 hists__total_latency(struct hists *hists); void hists__reset_stats(struct hists *hists); void hists__inc_stats(struct hists *hists, struct hist_entry *h); void hists__inc_nr_events(struct hists *hists); @@ -555,11 +564,13 @@ extern struct perf_hpp_fmt perf_hpp__format[]; enum { /* Matches perf_hpp__format array. */ PERF_HPP__OVERHEAD, + PERF_HPP__LATENCY, PERF_HPP__OVERHEAD_SYS, PERF_HPP__OVERHEAD_US, PERF_HPP__OVERHEAD_GUEST_SYS, PERF_HPP__OVERHEAD_GUEST_US, PERF_HPP__OVERHEAD_ACC, + PERF_HPP__LATENCY_ACC, PERF_HPP__SAMPLES, PERF_HPP__PERIOD, PERF_HPP__WEIGHT1, @@ -615,6 +626,7 @@ void hists__reset_column_width(struct hists *hists); enum perf_hpp_fmt_type { PERF_HPP_FMT_TYPE__RAW, PERF_HPP_FMT_TYPE__PERCENT, + PERF_HPP_FMT_TYPE__LATENCY, PERF_HPP_FMT_TYPE__AVERAGE, }; =20 diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c index 3055496358ebb..bc4c3acfe7552 100644 --- a/tools/perf/util/sort.c +++ b/tools/perf/util/sort.c @@ -2628,11 +2628,13 @@ struct hpp_dimension { =20 static struct hpp_dimension hpp_sort_dimensions[] =3D { DIM(PERF_HPP__OVERHEAD, "overhead"), + DIM(PERF_HPP__LATENCY, "latency"), DIM(PERF_HPP__OVERHEAD_SYS, "overhead_sys"), DIM(PERF_HPP__OVERHEAD_US, "overhead_us"), DIM(PERF_HPP__OVERHEAD_GUEST_SYS, "overhead_guest_sys"), DIM(PERF_HPP__OVERHEAD_GUEST_US, "overhead_guest_us"), DIM(PERF_HPP__OVERHEAD_ACC, "overhead_children"), + DIM(PERF_HPP__LATENCY_ACC, "latency_children"), DIM(PERF_HPP__SAMPLES, "sample"), DIM(PERF_HPP__PERIOD, "period"), DIM(PERF_HPP__WEIGHT1, "weight1"), --=20 2.48.1.502.g6dc24dfdaf-goog From nobody Sat Feb 22 00:03:52 2025 Received: from mail-ed1-f74.google.com (mail-ed1-f74.google.com [209.85.208.74]) (using TLSv1.2 with cipher 
ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3DCCD20F097 for ; Thu, 13 Feb 2025 09:08:47 +0000 (UTC)
Date: Thu, 13 Feb 2025 10:08:19 +0100
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
X-Mailer: git-send-email 2.48.1.502.g6dc24dfdaf-goog
Subject:
[PATCH v7 6/9] perf report: Add --latency flag From: Dmitry Vyukov To: namhyung@kernel.org, irogers@google.com, acme@kernel.org, ak@linux.intel.com Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, Dmitry Vyukov Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add record/report --latency flag that allows to capture and show latency-centric profiles rather than the default CPU-consumption-centric profiles. For latency profiles record captures context switch events, and report shows Latency as the first column. Signed-off-by: Dmitry Vyukov Cc: Namhyung Kim Cc: Arnaldo Carvalho de Melo Cc: Ian Rogers Cc: linux-perf-users@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Andi Kleen --- Changes in v7: - added comment perf_hpp__init() re was_taken logic Changes in v6: - remove latency column in perf_hpp__cancel_latency if sort order is specified, but does not include latency Changes in v5: - added description of --latency flag in Documentation flags --- tools/perf/Documentation/perf-record.txt | 4 ++ tools/perf/Documentation/perf-report.txt | 5 +++ tools/perf/builtin-record.c | 20 +++++++++ tools/perf/builtin-report.c | 32 ++++++++++++-- tools/perf/ui/hist.c | 54 ++++++++++++++++++++---- tools/perf/util/hist.h | 1 + tools/perf/util/sort.c | 33 ++++++++++++--- tools/perf/util/sort.h | 2 +- tools/perf/util/symbol_conf.h | 4 +- 9 files changed, 135 insertions(+), 20 deletions(-) diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Document= ation/perf-record.txt index 80686d590de24..c7fc1ba265e27 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -227,6 +227,10 @@ OPTIONS '--filter' exists, the new filter expression will be combined with them by '&&'. =20 +--latency:: + Enable data collection for latency profiling. + Use perf report --latency for latency-centric profile. 
+ -a:: --all-cpus:: System-wide collection from all CPUs (default if no target is spec= ified). diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Document= ation/perf-report.txt index 87f8645194062..66794131aec48 100644 --- a/tools/perf/Documentation/perf-report.txt +++ b/tools/perf/Documentation/perf-report.txt @@ -68,6 +68,11 @@ OPTIONS --hide-unresolved:: Only display entries resolved to a symbol. =20 +--latency:: + Show latency-centric profile rather than the default + CPU-consumption-centric profile + (requires perf record --latency flag). + -s:: --sort=3D:: Sort histogram entries by given key(s) - multiple keys can be specified diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index 5db1aedf48df9..e219639ac401b 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -161,6 +161,7 @@ struct record { struct evlist *sb_evlist; pthread_t thread_id; int realtime_prio; + bool latency; bool switch_output_event_set; bool no_buildid; bool no_buildid_set; @@ -3371,6 +3372,9 @@ static struct option __record_options[] =3D { parse_events_option), OPT_CALLBACK(0, "filter", &record.evlist, "filter", "event filter", parse_filter), + OPT_BOOLEAN(0, "latency", &record.latency, + "Enable data collection for latency profiling.\n" + "\t\t\t Use perf report --latency for latency-centric profile."), OPT_CALLBACK_NOOPT(0, "exclude-perf", &record.evlist, NULL, "don't record events from perf itself", exclude_perf), @@ -4017,6 +4021,22 @@ int cmd_record(int argc, const char **argv) =20 } =20 + if (record.latency) { + /* + * There is no fundamental reason why latency profiling + * can't work for system-wide mode, but exact semantics + * and details are to be defined. 
+ * See the following thread for details: + * https://lore.kernel.org/all/Z4XDJyvjiie3howF@google.com/ + */ + if (record.opts.target.system_wide) { + pr_err("Failed: latency profiling is not supported with system-wide col= lection.\n"); + err =3D -EINVAL; + goto out_opts; + } + record.opts.record_switch_events =3D true; + } + if (rec->buildid_mmap) { if (!perf_can_record_build_id()) { pr_err("Failed: no support to record build id in mmap events, update yo= ur kernel.\n"); diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 2a19abdc869a1..8e064b8bd589d 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -112,6 +112,8 @@ struct report { u64 nr_entries; u64 queue_size; u64 total_cycles; + u64 total_samples; + u64 singlethreaded_samples; int socket_filter; DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS); struct branch_type_stat brtype_stat; @@ -331,6 +333,10 @@ static int process_sample_event(const struct perf_tool= *tool, &rep->total_cycles, evsel); } =20 + rep->total_samples++; + if (al.parallelism =3D=3D 1) + rep->singlethreaded_samples++; + ret =3D hist_entry_iter__add(&iter, &al, rep->max_stack, rep); if (ret < 0) pr_debug("problem adding hist entry, skipping event\n"); @@ -1079,6 +1085,11 @@ static int __cmd_report(struct report *rep) return ret; } =20 + /* Don't show Latency column for non-parallel profiles by default. 
*/ + if (!symbol_conf.prefer_latency && rep->total_samples && + rep->singlethreaded_samples * 100 / rep->total_samples >=3D 99) + perf_hpp__cancel_latency(); + evlist__check_mem_load_aux(session->evlist); =20 if (rep->stats_mode) @@ -1468,6 +1479,10 @@ int cmd_report(int argc, const char **argv) "Disable raw trace ordering"), OPT_BOOLEAN(0, "skip-empty", &report.skip_empty, "Do not display empty (or dummy) events in the output"), + OPT_BOOLEAN(0, "latency", &symbol_conf.prefer_latency, + "Show latency-centric profile rather than the default\n" + "\t\t\t CPU-consumption-centric profile\n" + "\t\t\t (requires perf record --latency flag)."), OPT_END() }; struct perf_data data =3D { @@ -1722,16 +1737,25 @@ int cmd_report(int argc, const char **argv) symbol_conf.annotate_data_sample =3D true; } =20 + symbol_conf.enable_latency =3D true; if (report.disable_order || !perf_session__has_switch_events(session)) { if (symbol_conf.parallelism_list_str || - (sort_order && strstr(sort_order, "parallelism")) || - (field_order && strstr(field_order, "parallelism"))) { + symbol_conf.prefer_latency || + (sort_order && (strstr(sort_order, "latency") || + strstr(sort_order, "parallelism"))) || + (field_order && (strstr(field_order, "latency") || + strstr(field_order, "parallelism")))) { if (report.disable_order) - ui__error("Use of parallelism is incompatible with --disable-order.\n"= ); + ui__error("Use of latency profile or parallelism is incompatible with = --disable-order.\n"); else - ui__error("Use of parallelism requires --switch-events during record.\= n"); + ui__error("Use of latency profile or parallelism requires --latency fl= ag during record.\n"); return -1; } + /* + * If user did not ask for anything related to + * latency/parallelism explicitly, just don't show it. 
+ */ + symbol_conf.enable_latency =3D false; } =20 if (sort_order && strstr(sort_order, "ipc")) { diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c index 6de6309595f9e..ae3b7fe1dadc8 100644 --- a/tools/perf/ui/hist.c +++ b/tools/perf/ui/hist.c @@ -631,28 +631,48 @@ void perf_hpp__init(void) if (is_strict_order(field_order)) return; =20 + /* + * Overhead and latency columns are added in setup_overhead(), + * so they are added implicitly here only if they were added + * by setup_overhead() before (have was_taken flag set). + * This is required because setup_overhead() has more complex + * logic, in particular it does not add "overhead" if user + * specified "latency" in sort order, and vise versa. + */ if (symbol_conf.cumulate_callchain) { - hpp_dimension__add_output(PERF_HPP__OVERHEAD_ACC); + /* + * Addition of fields is idempotent, so we add latency + * column twice to get desired order with simpler logic. + */ + if (symbol_conf.prefer_latency) + hpp_dimension__add_output(PERF_HPP__LATENCY_ACC, true); + hpp_dimension__add_output(PERF_HPP__OVERHEAD_ACC, true); + if (symbol_conf.enable_latency) + hpp_dimension__add_output(PERF_HPP__LATENCY_ACC, true); perf_hpp__format[PERF_HPP__OVERHEAD].name =3D "Self"; } =20 - hpp_dimension__add_output(PERF_HPP__OVERHEAD); + if (symbol_conf.prefer_latency) + hpp_dimension__add_output(PERF_HPP__LATENCY, true); + hpp_dimension__add_output(PERF_HPP__OVERHEAD, true); + if (symbol_conf.enable_latency) + hpp_dimension__add_output(PERF_HPP__LATENCY, true); =20 if (symbol_conf.show_cpu_utilization) { - hpp_dimension__add_output(PERF_HPP__OVERHEAD_SYS); - hpp_dimension__add_output(PERF_HPP__OVERHEAD_US); + hpp_dimension__add_output(PERF_HPP__OVERHEAD_SYS, false); + hpp_dimension__add_output(PERF_HPP__OVERHEAD_US, false); =20 if (perf_guest) { - hpp_dimension__add_output(PERF_HPP__OVERHEAD_GUEST_SYS); - hpp_dimension__add_output(PERF_HPP__OVERHEAD_GUEST_US); + hpp_dimension__add_output(PERF_HPP__OVERHEAD_GUEST_SYS, false); + 
hpp_dimension__add_output(PERF_HPP__OVERHEAD_GUEST_US, false); } } =20 if (symbol_conf.show_nr_samples) - hpp_dimension__add_output(PERF_HPP__SAMPLES); + hpp_dimension__add_output(PERF_HPP__SAMPLES, false); =20 if (symbol_conf.show_total_period) - hpp_dimension__add_output(PERF_HPP__PERIOD); + hpp_dimension__add_output(PERF_HPP__PERIOD, false); } =20 void perf_hpp_list__column_register(struct perf_hpp_list *list, @@ -701,6 +721,24 @@ void perf_hpp__cancel_cumulate(void) } } =20 +void perf_hpp__cancel_latency(void) +{ + struct perf_hpp_fmt *fmt, *lat, *acc, *tmp; + + if (is_strict_order(field_order)) + return; + if (sort_order && strstr(sort_order, "latency")) + return; + + lat =3D &perf_hpp__format[PERF_HPP__LATENCY]; + acc =3D &perf_hpp__format[PERF_HPP__LATENCY_ACC]; + + perf_hpp_list__for_each_format_safe(&perf_hpp_list, fmt, tmp) { + if (fmt_equal(lat, fmt) || fmt_equal(acc, fmt)) + perf_hpp__column_unregister(fmt); + } +} + void perf_hpp__setup_output_field(struct perf_hpp_list *list) { struct perf_hpp_fmt *fmt; diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h index 91159f16c60b2..29d4c7a3d1747 100644 --- a/tools/perf/util/hist.h +++ b/tools/perf/util/hist.h @@ -582,6 +582,7 @@ enum { =20 void perf_hpp__init(void); void perf_hpp__cancel_cumulate(void); +void perf_hpp__cancel_latency(void); void perf_hpp__setup_output_field(struct perf_hpp_list *list); void perf_hpp__reset_output_field(struct perf_hpp_list *list); void perf_hpp__append_sort_keys(struct perf_hpp_list *list); diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c index bc4c3acfe7552..2b6023de7a53a 100644 --- a/tools/perf/util/sort.c +++ b/tools/perf/util/sort.c @@ -2622,6 +2622,7 @@ struct hpp_dimension { const char *name; struct perf_hpp_fmt *fmt; int taken; + int was_taken; }; =20 #define DIM(d, n) { .name =3D n, .fmt =3D &perf_hpp__format[d], } @@ -3513,6 +3514,7 @@ static int __hpp_dimension__add(struct hpp_dimension = *hd, return -1; =20 hd->taken =3D 1; + hd->was_taken =3D 
1; perf_hpp_list__register_sort_field(list, fmt); return 0; } @@ -3547,10 +3549,15 @@ static int __hpp_dimension__add_output(struct perf_= hpp_list *list, return 0; } =20 -int hpp_dimension__add_output(unsigned col) +int hpp_dimension__add_output(unsigned col, bool implicit) { + struct hpp_dimension *hd; + BUG_ON(col >=3D PERF_HPP__MAX_INDEX); - return __hpp_dimension__add_output(&perf_hpp_list, &hpp_sort_dimensions[c= ol]); + hd =3D &hpp_sort_dimensions[col]; + if (implicit && !hd->was_taken) + return 0; + return __hpp_dimension__add_output(&perf_hpp_list, hd); } =20 int sort_dimension__add(struct perf_hpp_list *list, const char *tok, @@ -3809,10 +3816,24 @@ static char *setup_overhead(char *keys) if (sort__mode =3D=3D SORT_MODE__DIFF) return keys; =20 - keys =3D prefix_if_not_in("overhead", keys); - - if (symbol_conf.cumulate_callchain) - keys =3D prefix_if_not_in("overhead_children", keys); + if (symbol_conf.prefer_latency) { + keys =3D prefix_if_not_in("overhead", keys); + keys =3D prefix_if_not_in("latency", keys); + if (symbol_conf.cumulate_callchain) { + keys =3D prefix_if_not_in("overhead_children", keys); + keys =3D prefix_if_not_in("latency_children", keys); + } + } else if (!keys || (!strstr(keys, "overhead") && + !strstr(keys, "latency"))) { + if (symbol_conf.enable_latency) + keys =3D prefix_if_not_in("latency", keys); + keys =3D prefix_if_not_in("overhead", keys); + if (symbol_conf.cumulate_callchain) { + if (symbol_conf.enable_latency) + keys =3D prefix_if_not_in("latency_children", keys); + keys =3D prefix_if_not_in("overhead_children", keys); + } + } =20 return keys; } diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h index 11fb15f914093..180d36a2bea35 100644 --- a/tools/perf/util/sort.h +++ b/tools/perf/util/sort.h @@ -141,7 +141,7 @@ int report_parse_ignore_callees_opt(const struct option= *opt, const char *arg, i =20 bool is_strict_order(const char *order); =20 -int hpp_dimension__add_output(unsigned col); +int 
hpp_dimension__add_output(unsigned col, bool implicit); void reset_dimensions(void); int sort_dimension__add(struct perf_hpp_list *list, const char *tok, struct evlist *evlist, diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h index c5b2e56127e22..cd9aa82c7d5ad 100644 --- a/tools/perf/util/symbol_conf.h +++ b/tools/perf/util/symbol_conf.h @@ -49,7 +49,9 @@ struct symbol_conf { keep_exited_threads, annotate_data_member, annotate_data_sample, - skip_empty; + skip_empty, + enable_latency, + prefer_latency; const char *vmlinux_name, *kallsyms_name, *source_prefix, --=20 2.48.1.502.g6dc24dfdaf-goog From nobody Sat Feb 22 00:03:52 2025 Received: from mail-ej1-f73.google.com (mail-ej1-f73.google.com [209.85.218.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A19E9211A20 for ; Thu, 13 Feb 2025 09:08:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739437732; cv=none; b=XBs1bMqQP8cGOPpvh4lZe2sw+VFh3ZDDCoQCVS457BQe5RKZdB/sN24toqz5aBdTBiHI42buxCS4vIaMfZSgo6cLnWmuZcnRqCI8gM8KGAk2xXAJPoOodIbS+uAsZay/sxIINT3i+nRGhGFdz/9+88ShACSQwHQV5MxKp4TvORU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739437732; c=relaxed/simple; bh=hHsYh4Wj2aKIk+sbbDk+eoksuvTL9kE1wTEO+n06vms=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=MGPwKaJWK50ZgsuETc0TD/vkZIBtFrf5M3UvP1BdWtILfZ1e+tHqZbugpeG8A7PaFyRD+gWc5117KYo2e6r6BjHzQSmXqKmF+LG3pbeXHMfA8EuQOqZKM/97fJZMT7sjLzujHZYzG6nzsffRUgRvuDPi/n67a5pQ798XuGvdcis= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--dvyukov.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=BhXcywJ8; 
arc=none smtp.client-ip=209.85.218.73
Date: Thu, 13 Feb 2025 10:08:20 +0100
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
X-Mailer: git-send-email 2.48.1.502.g6dc24dfdaf-goog
Subject: [PATCH v7 7/9] perf report: Add latency and parallelism profiling documentation
From: Dmitry Vyukov
To: namhyung@kernel.org, irogers@google.com, acme@kernel.org, ak@linux.intel.com
Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, Dmitry Vyukov
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Describe latency and parallelism profiling, related flags, and differences
with the currently only supported CPU-consumption-centric profiling.

Signed-off-by: Dmitry Vyukov
Cc: Namhyung Kim
Cc: Arnaldo Carvalho de Melo
Cc: Ian Rogers
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Andi Kleen
---
Changes in v7:
 - add tips.txt example with time/time-quantum
---
 .../callchain-overhead-calculation.txt       |  5 +-
 .../cpu-and-latency-overheads.txt            | 85 +++++++++++++++++++
 tools/perf/Documentation/perf-report.txt     | 49 +++++++----
 tools/perf/Documentation/tips.txt            |  4 +
 4 files changed, 124 insertions(+), 19 deletions(-)

diff --git a/tools/perf/Documentation/callchain-overhead-calculation.txt b/tools/perf/Documentation/callchain-overhead-calculation.txt
index 1a757927195ed..e0202bf5bd1a0 100644
--- a/tools/perf/Documentation/callchain-overhead-calculation.txt
+++ b/tools/perf/Documentation/callchain-overhead-calculation.txt
@@ -1,7 +1,8 @@
 Overhead calculation
 --------------------
-The overhead can be shown in two columns as 'Children' and 'Self' when
-perf collects callchains. The 'self' overhead is simply calculated by
+The CPU overhead can be shown in two columns as 'Children' and 'Self'
+when perf collects callchains (and corresponding 'Wall' columns for
+wall-clock overhead). The 'self' overhead is simply calculated by
 adding all period values of the entry - usually a function (symbol).
 This is the value that perf shows traditionally and sum of all the
 'self' overhead values should be 100%.
diff --git a/tools/perf/Documentation/cpu-and-latency-overheads.txt b/tools/perf/Documentation/cpu-and-latency-overheads.txt
new file mode 100644
index 0000000000000..3b6d637054651
--- /dev/null
+++ b/tools/perf/Documentation/cpu-and-latency-overheads.txt
@@ -0,0 +1,85 @@
+CPU and latency overheads
+-------------------------
+There are two notions of time: wall-clock time and CPU time.
+For a single-threaded program, or a program running on a single-core machine,
+these notions are the same. However, for a multi-threaded/multi-process program
+running on a multi-core machine, these notions are significantly different.
+Each second of wall-clock time we have number-of-cores seconds of CPU time.
+Perf can measure overhead for both of these times (shown in 'overhead' and
+'latency' columns for CPU and wall-clock time correspondingly).
+
+Optimizing CPU overhead is useful to improve 'throughput', while optimizing
+latency overhead is useful to improve 'latency'. It's important to understand
+which one is useful in a concrete situation at hand. For example, the former
+may be useful to improve max throughput of a CI build server that runs on 100%
+CPU utilization, while the latter may be useful to improve user-perceived
+latency of a single interactive program build.
+These overheads may be significantly different in some cases. For example,
+consider a program that executes function 'foo' for 9 seconds with 1 thread,
+and then executes function 'bar' for 1 second with 128 threads (consumes
+128 seconds of CPU time). The CPU overhead is: 'foo' - 6.6%, 'bar' - 93.4%.
+While the latency overhead is: 'foo' - 90%, 'bar' - 10%. If we try to optimize
+running time of the program looking at the (wrong in this case) CPU overhead,
+we would concentrate on the function 'bar', but it can yield only 10% running
+time improvement at best.
+
+By default, perf shows only CPU overhead. To show latency overhead, use
+'perf record --latency' and 'perf report':
+
+-----------------------------------
+Overhead  Latency  Command
+  93.88%   25.79%  cc1
+   1.90%   39.87%  gzip
+   0.99%   10.16%  dpkg-deb
+   0.57%    1.00%  as
+   0.40%    0.46%  sh
+-----------------------------------
+
+To sort by latency overhead, use 'perf report --latency':
+
+-----------------------------------
+Latency  Overhead  Command
+  39.87%     1.90%  gzip
+  25.79%    93.88%  cc1
+  10.16%     0.99%  dpkg-deb
+   4.17%     0.29%  git
+   2.81%     0.11%  objtool
+-----------------------------------
+
+To get insight into the difference between the overheads, you may check
+parallelization histogram with '--sort=latency,parallelism,comm,symbol --hierarchy'
+flags. It shows fraction of (wall-clock) time the workload utilizes different
+numbers of cores ('Parallelism' column). For example, in the following case
+the workload utilizes only 1 core most of the time, but also has some
+highly-parallel phases, which explains significant difference between
+CPU and wall-clock overheads:
+
+-----------------------------------
+  Latency  Overhead     Parallelism / Command / Symbol
++  56.98%     2.29%     1
++  16.94%     1.36%     2
++   4.00%    20.13%     125
++   3.66%    18.25%     124
++   3.48%    17.66%     126
++   3.26%     0.39%     3
++   2.61%    12.93%     123
+-----------------------------------
+
+By expanding corresponding lines, you may see what commands/functions run
+at the given parallelism level:
+
+-----------------------------------
+  Latency  Overhead     Parallelism / Command / Symbol
+-  56.98%     2.29%     1
+     32.80%     1.32%       gzip
+      4.46%     0.18%       cc1
+      2.81%     0.11%       objtool
+      2.43%     0.10%       dpkg-source
+      2.22%     0.09%       ld
+      2.10%     0.08%       dpkg-genchanges
+-----------------------------------
+
+To see the normal function-level profile for particular parallelism levels
+(number of threads actively running on CPUs), you may use '--parallelism'
+filter. For example, to see the profile only for low parallelism phases
+of a workload use '--latency --parallelism=1-2' flags.
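
Editorial note, not part of the patch: the 'foo'/'bar' figures quoted in the new documentation can be checked with a small stand-alone sketch. It assumes the simplified model from the text, where a phase's CPU time is its wall-clock duration multiplied by the number of running threads. The names struct phase, phase_cpu_ms(), share_x100() and print_overheads() are hypothetical helpers made up for this illustration; none of them exist in perf.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical description of one program phase; this is an illustration
 * only and not a structure used by perf.
 */
struct phase {
	const char *name;
	uint64_t wall_ms;	/* wall-clock duration of the phase */
	unsigned int threads;	/* threads running on CPUs during the phase */
};

/* CPU time of a phase: every running thread consumes CPU for wall_ms. */
static uint64_t phase_cpu_ms(const struct phase *p)
{
	assert(p->threads > 0);
	return p->wall_ms * p->threads;
}

/* Share of part in total, in hundredths of a percent, rounded to nearest. */
static unsigned int share_x100(uint64_t part, uint64_t total)
{
	assert(total > 0);
	return (unsigned int)((part * 10000 + total / 2) / total);
}

/* Print the CPU ("overhead") and wall-clock ("latency") share of each phase. */
static void print_overheads(const struct phase *phases, size_t n)
{
	uint64_t cpu_total = 0, wall_total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		cpu_total += phase_cpu_ms(&phases[i]);
		wall_total += phases[i].wall_ms;
	}
	for (i = 0; i < n; i++) {
		unsigned int cpu = share_x100(phase_cpu_ms(&phases[i]), cpu_total);
		unsigned int lat = share_x100(phases[i].wall_ms, wall_total);

		printf("%-4s overhead %3u.%02u%%  latency %3u.%02u%%\n",
		       phases[i].name, cpu / 100, cpu % 100, lat / 100, lat % 100);
	}
}
```

Calling print_overheads() on the two phases { "foo", 9000, 1 } and { "bar", 1000, 128 } gives a CPU overhead of about 6.57% and 93.43% (9 and 128 out of 137 CPU-seconds) and a latency of 90% and 10% (9 and 1 out of 10 wall-clock seconds), which agrees with the rounded figures in the documentation text.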
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 66794131aec48..3376c47105750 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -44,7 +44,7 @@ OPTIONS
 --comms=::
 	Only consider symbols in these comms. CSV that understands
 	file://filename entries. This option will affect the percentage of
-	the overhead column. See --percentage for more info.
+	the overhead and latency columns. See --percentage for more info.
 --pid=::
 	Only show events for given process ID (comma separated list).
 
@@ -54,12 +54,12 @@ OPTIONS
 --dsos=::
 	Only consider symbols in these dsos. CSV that understands
 	file://filename entries. This option will affect the percentage of
-	the overhead column. See --percentage for more info.
+	the overhead and latency columns. See --percentage for more info.
 -S::
 --symbols=::
 	Only consider these symbols. CSV that understands
 	file://filename entries. This option will affect the percentage of
-	the overhead column. See --percentage for more info.
+	the overhead and latency columns. See --percentage for more info.
 
 --symbol-filter=::
 	Only show symbols that match (partially) with this filter.
@@ -68,6 +68,16 @@ OPTIONS
 --hide-unresolved::
 	Only display entries resolved to a symbol.
 
+--parallelism::
+	Only consider these parallelism levels. Parallelism level is the number
+	of threads that actively run on CPUs at the time of sample. The flag
+	accepts single number, comma-separated list, and ranges (for example:
+	"1", "7,8", "1,64-128"). This is useful in understanding what a program
+	is doing during sequential/low-parallelism phases as compared to
+	high-parallelism phases. This option will affect the percentage of
+	the overhead and latency columns. See --percentage for more info.
+	Also see the `CPU and latency overheads' section for more details.
+
 --latency::
 	Show latency-centric profile rather than the default
 	CPU-consumption-centric profile
@@ -92,6 +102,7 @@ OPTIONS
 	  entries are displayed as "[other]".
 	- cpu: cpu number the task ran at the time of sample
 	- socket: processor socket number the task ran at the time of sample
+	- parallelism: number of running threads at the time of sample
 	- srcline: filename and line number executed at the time of sample. The
 	DWARF debugging info must be provided.
 	- srcfile: file name of the source file of the samples. Requires dwarf
@@ -102,12 +113,14 @@ OPTIONS
 	- cgroup_id: ID derived from cgroup namespace device and inode numbers.
 	- cgroup: cgroup pathname in the cgroupfs.
 	- transaction: Transaction abort flags.
-	- overhead: Overhead percentage of sample
-	- overhead_sys: Overhead percentage of sample running in system mode
-	- overhead_us: Overhead percentage of sample running in user mode
-	- overhead_guest_sys: Overhead percentage of sample running in system mode
+	- overhead: CPU overhead percentage of sample.
+	- latency: latency (wall-clock) overhead percentage of sample.
+	  See the `CPU and latency overheads' section for more details.
+	- overhead_sys: CPU overhead percentage of sample running in system mode
+	- overhead_us: CPU overhead percentage of sample running in user mode
+	- overhead_guest_sys: CPU overhead percentage of sample running in system mode
 	on guest machine
-	- overhead_guest_us: Overhead percentage of sample running in user mode on
+	- overhead_guest_us: CPU overhead percentage of sample running in user mode on
 	guest machine
 	- sample: Number of sample
 	- period: Raw number of event count of sample
@@ -130,8 +143,8 @@ OPTIONS
 	- weight2: Average value of event specific weight (2nd field of weight_struct).
 	- weight3: Average value of event specific weight (3rd field of weight_struct).
 
-	By default, comm, dso and symbol keys are used.
-	(i.e. --sort comm,dso,symbol)
+	By default, overhead, comm, dso and symbol keys are used.
+	(i.e. --sort overhead,comm,dso,symbol).
 
 	If --branch-stack option is used, following sort keys are also
 	available:
@@ -206,9 +219,9 @@ OPTIONS
 --fields=::
 	Specify output field - multiple keys can be specified in CSV format.
 	Following fields are available:
-	overhead, overhead_sys, overhead_us, overhead_children, sample, period,
-	weight1, weight2, weight3, ins_lat, p_stage_cyc and retire_lat. The
-	last 3 names are alias for the corresponding weights. When the weight
+	overhead, latency, overhead_sys, overhead_us, overhead_children, sample,
+	period, weight1, weight2, weight3, ins_lat, p_stage_cyc and retire_lat.
+	The last 3 names are alias for the corresponding weights. When the weight
 	fields are used, they will show the average value of the weight.
 
 	Also it can contain any sort key(s).
@@ -294,7 +307,7 @@ OPTIONS
 	Accumulate callchain of children to parent entry so that then can
 	show up in the output. The output will have a new "Children" column
 	and will be sorted on the data. It requires callchains are recorded.
-	See the `overhead calculation' section for more details. Enabled by
+	See the `Overhead calculation' section for more details. Enabled by
 	default, disable with --no-children.
 
 --max-stack::
@@ -447,9 +460,9 @@ OPTIONS
 	--call-graph option for details.
 
 --percentage::
-	Determine how to display the overhead percentage of filtered entries.
-	Filters can be applied by --comms, --dsos and/or --symbols options and
-	Zoom operations on the TUI (thread, dso, etc).
+	Determine how to display the CPU and latency overhead percentage
+	of filtered entries. Filters can be applied by --comms, --dsos, --symbols
+	and/or --parallelism options and Zoom operations on the TUI (thread, dso, etc).
 
 	"relative" means it's relative to filtered entries only so that the
 	sum of shown entries will be always 100%. "absolute" means it retains
@@ -632,6 +645,8 @@ include::itrace.txt[]
 --skip-empty::
 	Do not print 0 results in the --stat output.
 
+include::cpu-and-latency-overheads.txt[]
+
 include::callchain-overhead-calculation.txt[]
 
 SEE ALSO
diff --git a/tools/perf/Documentation/tips.txt b/tools/perf/Documentation/tips.txt
index 67b326ba00407..3fee9b2a88ea9 100644
--- a/tools/perf/Documentation/tips.txt
+++ b/tools/perf/Documentation/tips.txt
@@ -62,3 +62,7 @@ To show context switches in perf report sample context add --switch-events to pe
 To show time in nanoseconds in record/report add --ns
 To compare hot regions in two workloads use perf record -b -o file ... ; perf diff --stream file1 file2
 To compare scalability of two workload samples use perf diff -c ratio file1 file2
+For latency profiling, try: perf record/report --latency
+For parallelism histogram, try: perf report --hierarchy --sort latency,parallelism,comm,symbol
+To analyze particular parallelism levels, try: perf report --latency --parallelism=32-64
+To see how parallelism changes over time, try: perf report -F time,latency,parallelism --time-quantum=1s
-- 
2.48.1.502.g6dc24dfdaf-goog

From nobody Sat Feb 22 00:03:52 2025
Date: Thu, 13 Feb 2025 10:08:21 +0100
X-Mailer: git-send-email 2.48.1.502.g6dc24dfdaf-goog
Subject: [PATCH v7 8/9] perf test: Add tests for latency and parallelism profiling
From: Dmitry Vyukov
To: namhyung@kernel.org, irogers@google.com, acme@kernel.org, ak@linux.intel.com
Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, Dmitry Vyukov
Content-Type: text/plain; charset="utf-8"

Ensure basic operation of latency/parallelism profiling and that
main latency/parallelism record/report invocations don't fail/crash.

Signed-off-by: Dmitry Vyukov
Cc: Namhyung Kim
Cc: Arnaldo Carvalho de Melo
Cc: Ian Rogers
Cc: Andi Kleen
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Andi Kleen
---
 tools/perf/tests/shell/base_report/setup.sh   | 18 ++++++-
 .../tests/shell/base_report/test_basic.sh     | 52 +++++++++++++++++++
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/tools/perf/tests/shell/base_report/setup.sh b/tools/perf/tests/shell/base_report/setup.sh
index b03501b2e8fc5..8634e7e0dda6a 100755
--- a/tools/perf/tests/shell/base_report/setup.sh
+++ b/tools/perf/tests/shell/base_report/setup.sh
@@ -15,6 +15,8 @@
 # include working environment
 . ../common/init.sh
 
+TEST_RESULT=0
+
 test -d "$HEADER_TAR_DIR" || mkdir -p "$HEADER_TAR_DIR"
 
 SW_EVENT="cpu-clock"
@@ -26,7 +28,21 @@ PERF_EXIT_CODE=$?
 CHECK_EXIT_CODE=$?
 
 print_results $PERF_EXIT_CODE $CHECK_EXIT_CODE "prepare the perf.data file"
-TEST_RESULT=$?
+(( TEST_RESULT += $? ))
+
+# Some minimal parallel workload.
+$CMD_PERF record --latency -o $CURRENT_TEST_DIR/perf.data.1 bash -c "for i in {1..100} ; do cat /proc/cpuinfo 1> /dev/null & done; sleep 1" 2> $LOGS_DIR/setup-latency.log
+PERF_EXIT_CODE=$?
+
+echo ==================
+cat $LOGS_DIR/setup-latency.log
+echo ==================
+
+../common/check_all_patterns_found.pl "$RE_LINE_RECORD1" "$RE_LINE_RECORD2" < $LOGS_DIR/setup-latency.log
+CHECK_EXIT_CODE=$?
+
+print_results $PERF_EXIT_CODE $CHECK_EXIT_CODE "prepare the perf.data.1 file"
+(( TEST_RESULT += $? ))
 
 print_overall_results $TEST_RESULT
 exit $?
diff --git a/tools/perf/tests/shell/base_report/test_basic.sh b/tools/perf/tests/shell/base_report/test_basic.sh
index 2398eba4d3fdd..adfd8713b8f87 100755
--- a/tools/perf/tests/shell/base_report/test_basic.sh
+++ b/tools/perf/tests/shell/base_report/test_basic.sh
@@ -183,6 +183,58 @@ print_results $PERF_EXIT_CODE $CHECK_EXIT_CODE "symbol filter"
 (( TEST_RESULT += $? ))
 
 
+### latency and parallelism
+
+# Record with --latency should record with context switches.
+$CMD_PERF report -i $CURRENT_TEST_DIR/perf.data.1 --stdio --header-only > $LOGS_DIR/latency_header.log
+PERF_EXIT_CODE=$?
+
+../common/check_all_patterns_found.pl ", context_switch = 1, " < $LOGS_DIR/latency_header.log
+CHECK_EXIT_CODE=$?
+
+print_results $PERF_EXIT_CODE $CHECK_EXIT_CODE "latency header"
+(( TEST_RESULT += $? ))
+
+
+# The default report for latency profile should show Overhead and Latency fields (in that order).
+$CMD_PERF report --stdio -i $CURRENT_TEST_DIR/perf.data.1 > $LOGS_DIR/latency_default.log 2> $LOGS_DIR/latency_default.err
+PERF_EXIT_CODE=$?
+
+../common/check_all_patterns_found.pl "# Overhead Latency Command" < $LOGS_DIR/latency_default.log
+CHECK_EXIT_CODE=$?
+../common/check_errors_whitelisted.pl "stderr-whitelist.txt" < $LOGS_DIR/latency_default.err
+(( CHECK_EXIT_CODE += $? ))
+
+print_results $PERF_EXIT_CODE $CHECK_EXIT_CODE "default report for latency profile"
+(( TEST_RESULT += $? ))
+
+
+# The latency report for latency profile should show Latency and Overhead fields (in that order).
+$CMD_PERF report --latency --stdio -i $CURRENT_TEST_DIR/perf.data.1 > $LOGS_DIR/latency_latency.log 2> $LOGS_DIR/latency_latency.err
+PERF_EXIT_CODE=$?
+
+../common/check_all_patterns_found.pl "# Latency Overhead Command" < $LOGS_DIR/latency_latency.log
+CHECK_EXIT_CODE=$?
+../common/check_errors_whitelisted.pl "stderr-whitelist.txt" < $LOGS_DIR/latency_latency.err
+(( CHECK_EXIT_CODE += $? ))
+
+print_results $PERF_EXIT_CODE $CHECK_EXIT_CODE "latency report for latency profile"
+(( TEST_RESULT += $? ))
+
+
+# Ensure parallelism histogram with parallelism filter does not fail/crash.
+$CMD_PERF report --hierarchy --sort latency,parallelism,comm,symbol --parallelism=1,2 --stdio -i $CURRENT_TEST_DIR/perf.data.1 > $LOGS_DIR/parallelism_hierarchy.log 2> $LOGS_DIR/parallelism_hierarchy.err
+PERF_EXIT_CODE=$?
+
+../common/check_all_patterns_found.pl "# Latency Parallelism / Command / Symbol" < $LOGS_DIR/parallelism_hierarchy.log
+CHECK_EXIT_CODE=$?
+../common/check_errors_whitelisted.pl "stderr-whitelist.txt" < $LOGS_DIR/parallelism_hierarchy.err
+(( CHECK_EXIT_CODE += $? ))
+
+print_results $PERF_EXIT_CODE $CHECK_EXIT_CODE "parallelism histogram"
+(( TEST_RESULT += $? ))
+
+
 # TODO: $CMD_PERF report -n --showcpuutilization -TUxDg 2> 01.log
 
 # print overall results
-- 
2.48.1.502.g6dc24dfdaf-goog

From nobody Sat Feb 22 00:03:52 2025
Date: Thu, 13 Feb 2025 10:08:22 +0100
X-Mailer: git-send-email 2.48.1.502.g6dc24dfdaf-goog
Message-ID: <7c1cb1c8f9901e945162701ba7269d0f9c70be89.1739437531.git.dvyukov@google.com>
Subject: [PATCH v7 9/9] perf hist: Shrink struct hist_entry size
From: Dmitry Vyukov
To: namhyung@kernel.org, irogers@google.com, acme@kernel.org, ak@linux.intel.com
Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, Dmitry Vyukov
Content-Type: text/plain; charset="utf-8"

Reorder the struct fields by size to reduce padding, and shrink
struct simd_flags from 8 bytes to 1 byte.
This reduces struct hist_entry size by 8 bytes (592->584)
and leaves a single, more usable 6-byte padding hole.

Signed-off-by: Dmitry Vyukov
Cc: Namhyung Kim
Cc: Arnaldo Carvalho de Melo
Cc: Ian Rogers
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Andi Kleen
---
Pahole output before:

struct hist_entry {
	struct rb_node rb_node_in __attribute__((__aligned__(8))); /* 0 24 */
	struct rb_node rb_node __attribute__((__aligned__(8))); /* 24 24 */
	union {
		struct list_head node; /* 48 16 */
		struct list_head head; /* 48 16 */
	} pairs; /* 48 16 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	struct he_stat stat; /* 64 80 */

	/* XXX last struct has 4 bytes of padding */

	/* --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- */
	struct he_stat * stat_acc; /* 144 8 */
	struct map_symbol ms; /* 152 24 */
	struct thread * thread; /* 176 8 */
	struct comm * comm; /* 184 8 */
	/* --- cacheline 3 boundary (192 bytes) --- */
	struct namespace_id cgroup_id; /* 192 16 */
	u64 cgroup; /* 208 8 */
	u64 ip; /* 216 8 */
	u64 transaction; /* 224 8 */
	s32 socket; /* 232 4 */
	s32 cpu; /* 236 4 */
	int parallelism; /* 240 4 */

	/* XXX 4 bytes hole, try to pack */

	u64 code_page_size; /* 248 8 */
	/* --- cacheline 4 boundary (256 bytes) --- */
	u64 weight; /* 256 8 */
	u64 ins_lat; /* 264 8 */
	u64 p_stage_cyc; /* 272 8 */
	u8 cpumode; /* 280 1 */
	u8 depth; /* 281 1 */

	/* XXX 2 bytes hole, try to pack */

	int mem_type_off; /* 284 4 */
	struct simd_flags simd_flags; /* 288 8 */
	_Bool dummy; /* 296 1 */
	_Bool leaf; /* 297 1 */
	char level; /* 298 1 */

	/* XXX 1 byte hole, try to pack */

	filter_mask_t filtered; /* 300 2 */
	u16 callchain_size; /* 302 2 */
	union {
		struct hist_entry_diff diff; /* 304 120 */
		struct {
			u16 row_offset; /* 304 2 */
			u16 nr_rows; /* 306 2 */
			_Bool init_have_children; /* 308 1 */
			_Bool unfolded; /* 309 1 */
			_Bool has_children; /* 310 1 */
			_Bool has_no_entry; /* 311 1 */
		}; /* 304 8 */
	}; /* 304 120 */
	/* --- cacheline 6 boundary (384 bytes) was 40 bytes ago --- */
	char * srcline; /* 424 8 */
	char * srcfile; /* 432 8 */
	struct symbol * parent; /* 440 8 */
	/* --- cacheline 7 boundary (448 bytes) --- */
	struct branch_info * branch_info; /* 448 8 */
	long int time; /* 456 8 */
	struct hists * hists; /* 464 8 */
	struct mem_info * mem_info; /* 472 8 */
	struct block_info * block_info; /* 480 8 */
	struct kvm_info * kvm_info; /* 488 8 */
	void * raw_data; /* 496 8 */
	u32 raw_size; /* 504 4 */
	int num_res; /* 508 4 */
	/* --- cacheline 8 boundary (512 bytes) --- */
	struct res_sample * res_samples; /* 512 8 */
	void * trace_output; /* 520 8 */
	struct perf_hpp_list * hpp_list; /* 528 8 */
	struct hist_entry * parent_he; /* 536 8 */
	struct hist_entry_ops * ops; /* 544 8 */
	struct annotated_data_type * mem_type; /* 552 8 */
	union {
		struct {
			struct rb_root_cached hroot_in; /* 560 16 */
			/* --- cacheline 9 boundary (576 bytes) --- */
			struct rb_root_cached hroot_out; /* 576 16 */
		}; /* 560 32 */
		struct rb_root sorted_chain; /* 560 8 */
	}; /* 560 32 */
	/* --- cacheline 9 boundary (576 bytes) was 16 bytes ago --- */
	struct callchain_root callchain[] __attribute__((__aligned__(8))); /* 592 0 */

	/* size: 592, cachelines: 10, members: 49 */
	/* sum members: 585, holes: 3, sum holes: 7 */
	/* paddings: 1, sum paddings: 4 */
	/* forced alignments: 3 */
	/* last cacheline: 16 bytes */
} __attribute__((__aligned__(8)));

After:

struct hist_entry {
	struct rb_node rb_node_in __attribute__((__aligned__(8))); /* 0 24 */
	struct rb_node rb_node __attribute__((__aligned__(8))); /* 24 24 */
	union {
		struct list_head node; /* 48 16 */
		struct list_head head; /* 48 16 */
	} pairs; /* 48 16 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	struct he_stat stat; /* 64 80 */

	/* XXX last struct has 4 bytes of padding */

	/* --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- */
	struct he_stat * stat_acc; /* 144 8 */
	struct map_symbol ms; /* 152 24 */
	struct thread * thread; /* 176 8 */
	struct comm * comm; /* 184 8 */
	/* --- cacheline 3 boundary (192 bytes) --- */
	struct namespace_id cgroup_id; /* 192 16 */
	u64 cgroup; /* 208 8 */
	u64 ip; /* 216 8 */
	u64 transaction; /* 224 8 */
	u64 code_page_size; /* 232 8 */
	u64 weight; /* 240 8 */
	u64 ins_lat; /* 248 8 */
	/* --- cacheline 4 boundary (256 bytes) --- */
	u64 p_stage_cyc; /* 256 8 */
	s32 socket; /* 264 4 */
	s32 cpu; /* 268 4 */
	int parallelism; /* 272 4 */
	int mem_type_off; /* 276 4 */
	u8 cpumode; /* 280 1 */
	u8 depth; /* 281 1 */
	struct simd_flags simd_flags; /* 282 1 */
	_Bool dummy; /* 283 1 */
	_Bool leaf; /* 284 1 */
	char level; /* 285 1 */
	filter_mask_t filtered; /* 286 2 */
	u16 callchain_size; /* 288 2 */

	/* XXX 6 bytes hole, try to pack */

	union {
		struct hist_entry_diff diff; /* 296 120 */
		struct {
			u16 row_offset; /* 296 2 */
			u16 nr_rows; /* 298 2 */
			_Bool init_have_children; /* 300 1 */
			_Bool unfolded; /* 301 1 */
			_Bool has_children; /* 302 1 */
			_Bool has_no_entry; /* 303 1 */
		}; /* 296 8 */
	}; /* 296 120 */
	/* --- cacheline 6 boundary (384 bytes) was 32 bytes ago --- */
	char * srcline; /* 416 8 */
	char * srcfile; /* 424 8 */
	struct symbol * parent; /* 432 8 */
	struct branch_info * branch_info; /* 440 8 */
	/* --- cacheline 7 boundary (448 bytes) --- */
	long int time; /* 448 8 */
	struct hists * hists; /* 456 8 */
	struct mem_info * mem_info; /* 464 8 */
	struct block_info * block_info; /* 472 8 */
	struct kvm_info * kvm_info; /* 480 8 */
	void * raw_data; /* 488 8 */
	u32 raw_size; /* 496 4 */
	int num_res; /* 500 4 */
	struct res_sample * res_samples; /* 504 8 */
	/* --- cacheline 8 boundary (512 bytes) --- */
	void * trace_output; /* 512 8 */
	struct perf_hpp_list * hpp_list; /* 520 8 */
	struct hist_entry * parent_he; /* 528 8 */
	struct hist_entry_ops * ops; /* 536 8 */
	struct annotated_data_type * mem_type; /* 544 8 */
	union {
		struct {
			struct rb_root_cached hroot_in; /* 552 16 */
			struct rb_root_cached hroot_out; /* 568 16 */
		}; /* 552 32 */
		struct rb_root sorted_chain; /* 552 8 */
	}; /* 552 32 */
	/* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */
	struct callchain_root callchain[] __attribute__((__aligned__(8))); /* 584 0 */

	/* size: 584, cachelines: 10, members: 49 */
	/* sum members: 578, holes: 1, sum holes: 6 */
	/* paddings: 1, sum paddings: 4 */
	/* forced alignments: 3 */
	/* last cacheline: 8 bytes */
} __attribute__((__aligned__(8)));
---
 tools/perf/util/hist.h   | 8 ++++----
 tools/perf/util/sample.h | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 29d4c7a3d1747..317d06cca8b88 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -239,16 +239,16 @@ struct hist_entry {
 	u64 cgroup;
 	u64 ip;
 	u64 transaction;
-	s32 socket;
-	s32 cpu;
-	int parallelism;
 	u64 code_page_size;
 	u64 weight;
 	u64 ins_lat;
 	u64 p_stage_cyc;
+	s32 socket;
+	s32 cpu;
+	int parallelism;
+	int mem_type_off;
 	u8 cpumode;
 	u8 depth;
-	int mem_type_off;
 	struct simd_flags simd_flags;
 
 	/* We are added by hists__add_dummy_entry. */
diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h
index 70b2c3135555e..ab756d61cbcd6 100644
--- a/tools/perf/util/sample.h
+++ b/tools/perf/util/sample.h
@@ -67,7 +67,7 @@ struct aux_sample {
 };
 
 struct simd_flags {
-	u64 arch:1, /* architecture (isa) */
+	u8 arch:1, /* architecture (isa) */
 		pred:2; /* predication */
 };
 
-- 
2.48.1.502.g6dc24dfdaf-goog