Date: Fri, 7 Feb 2025 12:40:33 +0100
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
X-Mailer: git-send-email 2.48.1.502.g6dc24dfdaf-goog
Message-ID: <2f02840baa8e2f9a32ccb4cd545fc8f8813192df.1738928210.git.dvyukov@google.com>
Subject: [PATCH v6 6/9] perf report: Add --latency flag
From: Dmitry Vyukov
To: namhyung@kernel.org, irogers@google.com, ak@linux.intel.com
Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	Dmitry Vyukov, Arnaldo Carvalho de Melo
Content-Type: text/plain; charset="utf-8"

Add a record/report --latency flag that allows capturing and showing
latency-centric profiles rather than the default CPU-consumption-centric
profiles. For latency profiles, record captures context switch events,
and report shows Latency as the first column.

Signed-off-by: Dmitry Vyukov
Cc: Namhyung Kim
Cc: Arnaldo Carvalho de Melo
Cc: Ian Rogers
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes in v6:
 - remove latency column in perf_hpp__cancel_latency if sort order is
   specified, but does not include latency

Changes in v5:
 - added description of --latency flag in Documentation flags
---
 tools/perf/Documentation/perf-record.txt |  4 +++
 tools/perf/Documentation/perf-report.txt |  5 +++
 tools/perf/builtin-record.c              | 20 +++++++++++
 tools/perf/builtin-report.c              | 32 +++++++++++++++---
 tools/perf/ui/hist.c                     | 43 +++++++++++++++++++-----
 tools/perf/util/hist.h                   |  1 +
 tools/perf/util/sort.c                   | 33 ++++++++++++++----
 tools/perf/util/sort.h                   |  2 +-
 tools/perf/util/symbol_conf.h            |  4 ++-
 9 files changed, 124 insertions(+), 20 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 80686d590de24..c7fc1ba265e27 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -227,6 +227,10 @@ OPTIONS
 	'--filter' exists, the new filter expression will be combined with them
 	by '&&'.
 
+--latency::
+	Enable data collection for latency profiling.
+	Use perf report --latency for latency-centric profile.
+
 -a::
 --all-cpus::
 	System-wide collection from all CPUs (default if no target is specified).
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 87f8645194062..66794131aec48 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -68,6 +68,11 @@ OPTIONS
 --hide-unresolved::
 	Only display entries resolved to a symbol.
 
+--latency::
+	Show latency-centric profile rather than the default
+	CPU-consumption-centric profile
+	(requires perf record --latency flag).
+
 -s::
 --sort=::
 	Sort histogram entries by given key(s) - multiple keys can be specified
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 5db1aedf48df9..e219639ac401b 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -161,6 +161,7 @@ struct record {
 	struct evlist		*sb_evlist;
 	pthread_t		thread_id;
 	int			realtime_prio;
+	bool			latency;
 	bool			switch_output_event_set;
 	bool			no_buildid;
 	bool			no_buildid_set;
@@ -3371,6 +3372,9 @@ static struct option __record_options[] = {
 		     parse_events_option),
 	OPT_CALLBACK(0, "filter", &record.evlist, "filter",
 		     "event filter", parse_filter),
+	OPT_BOOLEAN(0, "latency", &record.latency,
+		    "Enable data collection for latency profiling.\n"
+		    "\t\t\t Use perf report --latency for latency-centric profile."),
 	OPT_CALLBACK_NOOPT(0, "exclude-perf", &record.evlist,
 			   NULL, "don't record events from perf itself",
 			   exclude_perf),
@@ -4017,6 +4021,22 @@ int cmd_record(int argc, const char **argv)
 
 	}
 
+	if (record.latency) {
+		/*
+		 * There is no fundamental reason why latency profiling
+		 * can't work for system-wide mode, but exact semantics
+		 * and details are to be defined.
+		 * See the following thread for details:
+		 * https://lore.kernel.org/all/Z4XDJyvjiie3howF@google.com/
+		 */
+		if (record.opts.target.system_wide) {
+			pr_err("Failed: latency profiling is not supported with system-wide collection.\n");
+			err = -EINVAL;
+			goto out_opts;
+		}
+		record.opts.record_switch_events = true;
+	}
+
 	if (rec->buildid_mmap) {
 		if (!perf_can_record_build_id()) {
 			pr_err("Failed: no support to record build id in mmap events, update your kernel.\n");
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 2a19abdc869a1..8e064b8bd589d 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -112,6 +112,8 @@ struct report {
 	u64			nr_entries;
 	u64			queue_size;
 	u64			total_cycles;
+	u64			total_samples;
+	u64			singlethreaded_samples;
 	int			socket_filter;
 	DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
 	struct branch_type_stat	brtype_stat;
@@ -331,6 +333,10 @@ static int process_sample_event(const struct perf_tool *tool,
 						     &rep->total_cycles, evsel);
 	}
 
+	rep->total_samples++;
+	if (al.parallelism == 1)
+		rep->singlethreaded_samples++;
+
 	ret = hist_entry_iter__add(&iter, &al, rep->max_stack, rep);
 	if (ret < 0)
 		pr_debug("problem adding hist entry, skipping event\n");
@@ -1079,6 +1085,11 @@ static int __cmd_report(struct report *rep)
 		return ret;
 	}
 
+	/* Don't show Latency column for non-parallel profiles by default. */
+	if (!symbol_conf.prefer_latency && rep->total_samples &&
+	    rep->singlethreaded_samples * 100 / rep->total_samples >= 99)
+		perf_hpp__cancel_latency();
+
 	evlist__check_mem_load_aux(session->evlist);
 
 	if (rep->stats_mode)
@@ -1468,6 +1479,10 @@ int cmd_report(int argc, const char **argv)
 		    "Disable raw trace ordering"),
 	OPT_BOOLEAN(0, "skip-empty", &report.skip_empty,
 		    "Do not display empty (or dummy) events in the output"),
+	OPT_BOOLEAN(0, "latency", &symbol_conf.prefer_latency,
+		    "Show latency-centric profile rather than the default\n"
+		    "\t\t\t CPU-consumption-centric profile\n"
+		    "\t\t\t (requires perf record --latency flag)."),
 	OPT_END()
 	};
 	struct perf_data data = {
@@ -1722,16 +1737,25 @@ int cmd_report(int argc, const char **argv)
 		symbol_conf.annotate_data_sample = true;
 	}
 
+	symbol_conf.enable_latency = true;
 	if (report.disable_order || !perf_session__has_switch_events(session)) {
 		if (symbol_conf.parallelism_list_str ||
-		    (sort_order && strstr(sort_order, "parallelism")) ||
-		    (field_order && strstr(field_order, "parallelism"))) {
+		    symbol_conf.prefer_latency ||
+		    (sort_order && (strstr(sort_order, "latency") ||
+				    strstr(sort_order, "parallelism"))) ||
+		    (field_order && (strstr(field_order, "latency") ||
+				     strstr(field_order, "parallelism")))) {
 			if (report.disable_order)
-				ui__error("Use of parallelism is incompatible with --disable-order.\n");
+				ui__error("Use of latency profile or parallelism is incompatible with --disable-order.\n");
 			else
-				ui__error("Use of parallelism requires --switch-events during record.\n");
+				ui__error("Use of latency profile or parallelism requires --latency flag during record.\n");
 			return -1;
 		}
+		/*
+		 * If user did not ask for anything related to
+		 * latency/parallelism explicitly, just don't show it.
+		 */
+		symbol_conf.enable_latency = false;
 	}
 
 	if (sort_order && strstr(sort_order, "ipc")) {
diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index 6de6309595f9e..0f2ef02004404 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -632,27 +632,36 @@ void perf_hpp__init(void)
 		return;
 
 	if (symbol_conf.cumulate_callchain) {
-		hpp_dimension__add_output(PERF_HPP__OVERHEAD_ACC);
+		/* Use idempotent addition to avoid more complex logic. */
+		if (symbol_conf.prefer_latency)
+			hpp_dimension__add_output(PERF_HPP__LATENCY_ACC, true);
+		hpp_dimension__add_output(PERF_HPP__OVERHEAD_ACC, true);
+		if (symbol_conf.enable_latency)
+			hpp_dimension__add_output(PERF_HPP__LATENCY_ACC, true);
 		perf_hpp__format[PERF_HPP__OVERHEAD].name = "Self";
 	}
 
-	hpp_dimension__add_output(PERF_HPP__OVERHEAD);
+	if (symbol_conf.prefer_latency)
+		hpp_dimension__add_output(PERF_HPP__LATENCY, true);
+	hpp_dimension__add_output(PERF_HPP__OVERHEAD, true);
+	if (symbol_conf.enable_latency)
+		hpp_dimension__add_output(PERF_HPP__LATENCY, true);
 
 	if (symbol_conf.show_cpu_utilization) {
-		hpp_dimension__add_output(PERF_HPP__OVERHEAD_SYS);
-		hpp_dimension__add_output(PERF_HPP__OVERHEAD_US);
+		hpp_dimension__add_output(PERF_HPP__OVERHEAD_SYS, false);
+		hpp_dimension__add_output(PERF_HPP__OVERHEAD_US, false);
 
 		if (perf_guest) {
-			hpp_dimension__add_output(PERF_HPP__OVERHEAD_GUEST_SYS);
-			hpp_dimension__add_output(PERF_HPP__OVERHEAD_GUEST_US);
+			hpp_dimension__add_output(PERF_HPP__OVERHEAD_GUEST_SYS, false);
+			hpp_dimension__add_output(PERF_HPP__OVERHEAD_GUEST_US, false);
 		}
 	}
 
 	if (symbol_conf.show_nr_samples)
-		hpp_dimension__add_output(PERF_HPP__SAMPLES);
+		hpp_dimension__add_output(PERF_HPP__SAMPLES, false);
 
 	if (symbol_conf.show_total_period)
-		hpp_dimension__add_output(PERF_HPP__PERIOD);
+		hpp_dimension__add_output(PERF_HPP__PERIOD, false);
 }
 
 void perf_hpp_list__column_register(struct perf_hpp_list *list,
@@ -701,6 +710,24 @@ void perf_hpp__cancel_cumulate(void)
 	}
 }
 
+void perf_hpp__cancel_latency(void)
+{
+	struct perf_hpp_fmt *fmt, *lat, *acc, *tmp;
+
+	if (is_strict_order(field_order))
+		return;
+	if (sort_order && strstr(sort_order, "latency"))
+		return;
+
+	lat = &perf_hpp__format[PERF_HPP__LATENCY];
+	acc = &perf_hpp__format[PERF_HPP__LATENCY_ACC];
+
+	perf_hpp_list__for_each_format_safe(&perf_hpp_list, fmt, tmp) {
+		if (fmt_equal(lat, fmt) || fmt_equal(acc, fmt))
+			perf_hpp__column_unregister(fmt);
+	}
+}
+
 void perf_hpp__setup_output_field(struct perf_hpp_list *list)
 {
 	struct perf_hpp_fmt *fmt;
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 91159f16c60b2..29d4c7a3d1747 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -582,6 +582,7 @@ enum {
 
 void perf_hpp__init(void);
 void perf_hpp__cancel_cumulate(void);
+void perf_hpp__cancel_latency(void);
 void perf_hpp__setup_output_field(struct perf_hpp_list *list);
 void perf_hpp__reset_output_field(struct perf_hpp_list *list);
 void perf_hpp__append_sort_keys(struct perf_hpp_list *list);
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index bc4c3acfe7552..2b6023de7a53a 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -2622,6 +2622,7 @@ struct hpp_dimension {
 	const char		*name;
 	struct perf_hpp_fmt	*fmt;
 	int			taken;
+	int			was_taken;
 };
 
 #define DIM(d, n) { .name = n, .fmt = &perf_hpp__format[d], }
@@ -3513,6 +3514,7 @@ static int __hpp_dimension__add(struct hpp_dimension *hd,
 		return -1;
 
 	hd->taken = 1;
+	hd->was_taken = 1;
 	perf_hpp_list__register_sort_field(list, fmt);
 	return 0;
 }
@@ -3547,10 +3549,15 @@ static int __hpp_dimension__add_output(struct perf_hpp_list *list,
 	return 0;
 }
 
-int hpp_dimension__add_output(unsigned col)
+int hpp_dimension__add_output(unsigned col, bool implicit)
 {
+	struct hpp_dimension *hd;
+
 	BUG_ON(col >= PERF_HPP__MAX_INDEX);
-	return __hpp_dimension__add_output(&perf_hpp_list, &hpp_sort_dimensions[col]);
+	hd = &hpp_sort_dimensions[col];
+	if (implicit && !hd->was_taken)
+		return 0;
+	return __hpp_dimension__add_output(&perf_hpp_list, hd);
 }
 
 int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
@@ -3809,10 +3816,24 @@ static char *setup_overhead(char *keys)
 	if (sort__mode == SORT_MODE__DIFF)
 		return keys;
 
-	keys = prefix_if_not_in("overhead", keys);
-
-	if (symbol_conf.cumulate_callchain)
-		keys = prefix_if_not_in("overhead_children", keys);
+	if (symbol_conf.prefer_latency) {
+		keys = prefix_if_not_in("overhead", keys);
+		keys = prefix_if_not_in("latency", keys);
+		if (symbol_conf.cumulate_callchain) {
+			keys = prefix_if_not_in("overhead_children", keys);
+			keys = prefix_if_not_in("latency_children", keys);
+		}
+	} else if (!keys || (!strstr(keys, "overhead") &&
+			!strstr(keys, "latency"))) {
+		if (symbol_conf.enable_latency)
+			keys = prefix_if_not_in("latency", keys);
+		keys = prefix_if_not_in("overhead", keys);
+		if (symbol_conf.cumulate_callchain) {
+			if (symbol_conf.enable_latency)
+				keys = prefix_if_not_in("latency_children", keys);
+			keys = prefix_if_not_in("overhead_children", keys);
+		}
+	}
 
 	return keys;
 }
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 11fb15f914093..180d36a2bea35 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -141,7 +141,7 @@ int report_parse_ignore_callees_opt(const struct option *opt, const char *arg, i
 
 bool is_strict_order(const char *order);
 
-int hpp_dimension__add_output(unsigned col);
+int hpp_dimension__add_output(unsigned col, bool implicit);
 void reset_dimensions(void);
 int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
 			struct evlist *evlist,
diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
index c5b2e56127e22..cd9aa82c7d5ad 100644
--- a/tools/perf/util/symbol_conf.h
+++ b/tools/perf/util/symbol_conf.h
@@ -49,7 +49,9 @@ struct symbol_conf {
 			keep_exited_threads,
 			annotate_data_member,
 			annotate_data_sample,
-			skip_empty;
+			skip_empty,
+			enable_latency,
+			prefer_latency;
 	const char	*vmlinux_name,
 			*kallsyms_name,
 			*source_prefix,
-- 
2.48.1.502.g6dc24dfdaf-goog
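
Illustration only, not part of the patch: the default heuristic added to
__cmd_report() hides the Latency column when --latency was not given and
at least 99% of the samples were taken at parallelism 1. A minimal
standalone sketch of that check follows. The helper name
should_cancel_latency() and its parameters are hypothetical and do not
exist in perf; only the arithmetic mirrors the hunk above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in perf): returns true when the Latency
 * column would be dropped by the default heuristic.
 *
 * prefer_latency stands in for symbol_conf.prefer_latency; the two
 * counters stand in for rep->total_samples and
 * rep->singlethreaded_samples.
 */
static bool should_cancel_latency(bool prefer_latency,
				  uint64_t total_samples,
				  uint64_t singlethreaded_samples)
{
	/* An explicit --latency request always keeps the column. */
	if (prefer_latency)
		return false;
	/* No samples: nothing to decide, and avoid dividing by zero. */
	if (total_samples == 0)
		return false;
	/* Integer percentage, rounded down, compared against 99. */
	return singlethreaded_samples * 100 / total_samples >= 99;
}
```

With 1000 samples, 995 single-threaded ones give 99 after integer
division, so the column is dropped; 900 give 90, so it is kept. Note
that the actual perf_hpp__cancel_latency() also keeps the column when a
strict field order is in use or the sort order names "latency", which
this sketch does not model.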