From: weilin.wang@intel.com
To: weilin.wang@intel.com, Namhyung Kim, Ian Rogers, Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang
Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, Perry Taylor, Samantha Alt, Caleb Biggers
Subject: [RFC PATCH v7 3/6] perf stat: Fork and launch perf record when perf stat needs to get retire latency value for a metric.
Date: Tue, 2 Apr 2024 17:44:33 -0400
Message-ID: <20240402214436.1409476-4-weilin.wang@intel.com>
In-Reply-To: <20240402214436.1409476-1-weilin.wang@intel.com>
References: <20240402214436.1409476-1-weilin.wang@intel.com>

From: Weilin Wang

When a retire_latency value is used in a metric formula, perf stat forks
a perf record process with the "-e" and "-W" options. perf record collects
the required retire_latency values in parallel while perf stat collects
counting values.

When perf stat stops counting, it sends SIGTERM to the perf record process
and receives the sampling data back from perf record through a pipe. perf
stat then processes the received data to obtain the retire latency values
and calculate the metric result.

A separate thread is required to synchronize perf stat and perf record
while data is passed through the pipe.

Signed-off-by: Weilin Wang
Reviewed-by: Ian Rogers
---
 tools/perf/builtin-stat.c     | 212 +++++++++++++++++++++++++++++++++-
 tools/perf/util/metricgroup.h |   8 ++
 tools/perf/util/stat.h        |   2 +
 3 files changed, 220 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 4558b9d95441..2dcc1a12f7ef 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -94,8 +94,13 @@
 #include
 #include
 
+#include "util/sample.h"
+#include
+#include
+
 #define DEFAULT_SEPARATOR " "
 #define FREEZE_ON_SMI_PATH "devices/cpu/freeze_on_smi"
+#define PERF_DATA "-"
 
 static void print_counters(struct timespec *ts, int argc, const char **argv);
 
@@ -163,6 +168,8 @@ static struct perf_stat_config stat_config = {
 	.ctl_fd_ack = -1,
 	.iostat_run = false,
 	.tpebs_events = LIST_HEAD_INIT(stat_config.tpebs_events),
+	.tpebs_results = LIST_HEAD_INIT(stat_config.tpebs_results),
+	.tpebs_pid = -1,
 };
 
 static bool cpus_map_matched(struct evsel *a, struct evsel *b)
@@ -684,15 +691,173 @@ static enum counter_recovery stat_handle_error(struct evsel *counter)
 
 	if (child_pid != -1)
 		kill(child_pid, SIGTERM);
+	if (stat_config.tpebs_pid != -1)
+		kill(stat_config.tpebs_pid, SIGTERM);
 	return COUNTER_FATAL;
 }
 
-static int __run_perf_record(void)
+static int __run_perf_record(const char **record_argv)
 {
+	int i = 0;
+	struct tpebs_event *e;
+
 	pr_debug("Prepare perf record for retire_latency\n");
+
+	record_argv[i++] = "perf";
+	record_argv[i++] = "record";
+	record_argv[i++] = "-W";
+	record_argv[i++] = "--synth=no";
+
+	if (stat_config.user_requested_cpu_list) {
+		record_argv[i++] = "-C";
+		record_argv[i++] = stat_config.user_requested_cpu_list;
+	}
+
+	if (stat_config.system_wide)
+		record_argv[i++] = "-a";
+
+	if (!stat_config.system_wide && !stat_config.user_requested_cpu_list) {
+		pr_err("Require -a or -C option to run sampling.\n");
+		return -ECANCELED;
+	}
+
+	list_for_each_entry(e, &stat_config.tpebs_events, nd) {
+		record_argv[i++] = "-e";
+		record_argv[i++] = e->name;
+	}
+
+	record_argv[i++] = "-o";
+	record_argv[i++] = PERF_DATA;
+
+	return 0;
+}
+
+static void prepare_run_command(struct child_process *cmd,
+				const char **argv)
+{
+	memset(cmd, 0, sizeof(*cmd));
+	cmd->argv = argv;
+	cmd->out = -1;
+}
+
+static int prepare_perf_record(struct child_process *cmd)
+{
+	const char **record_argv;
+	int ret;
+
+	record_argv = calloc(10 + 2 * stat_config.tpebs_event_size, sizeof(char *));
+	if (!record_argv)
+		return -1;
+
+	ret = __run_perf_record(record_argv);
+	if (ret)
+		return ret;
+
+	prepare_run_command(cmd, record_argv);
+	return start_command(cmd);
+}
+
+struct perf_script {
+	struct perf_tool tool;
+	struct perf_session *session;
+};
+
+static void tpebs_event_name__delete(struct tpebs_event *e)
+{
+	zfree(&e->name);
+	zfree(&e->tpebs_name);
+}
+
+static void tpebs_event__delete(struct tpebs_event *e)
+{
+	tpebs_event_name__delete(e);
+	free(e);
+}
+
+static void tpebs_retire_lat__delete(struct tpebs_retire_lat *r)
+{
+	tpebs_event_name__delete(&r->event);
+	free(r);
+}
+
+static void tpebs_data__delete(void)
+{
+	struct tpebs_retire_lat *r, *rtmp;
+	struct tpebs_event *e, *etmp;
+
+	list_for_each_entry_safe(r, rtmp, &stat_config.tpebs_results, event.nd) {
+		list_del_init(&r->event.nd);
+		tpebs_retire_lat__delete(r);
+	}
+	list_for_each_entry_safe(e, etmp, &stat_config.tpebs_events, nd) {
+		list_del_init(&e->nd);
+		tpebs_event__delete(e);
+	}
+}
+
+static int process_sample_event(struct perf_tool *tool __maybe_unused,
+				union perf_event *event __maybe_unused,
+				struct perf_sample *sample,
+				struct evsel *evsel,
+				struct machine *machine __maybe_unused)
+{
+	int ret = 0;
+	const char *evname;
+	struct tpebs_retire_lat *t;
+
+	evname = evsel__name(evsel);
+
+	/*
+	 * Need to handle per core results? We are assuming average retire
+	 * latency value will be used. Save the number of samples and the sum of
+	 * retire latency value for each event.
+	 */
+	list_for_each_entry(t, &stat_config.tpebs_results, event.nd) {
+		if (!strcmp(evname, t->event.name)) {
+			t->count += 1;
+			t->sum += sample->retire_lat;
+			break;
+		}
+	}
+
+	return ret;
+}
+
+static int process_feature_event(struct perf_session *session,
+				 union perf_event *event)
+{
+	if (event->feat.feat_id < HEADER_LAST_FEATURE)
+		return perf_event__process_feature(session, event);
 	return 0;
 }
 
+static void *__sample_reader(void *arg)
+{
+	struct child_process *cmd = arg;
+	struct perf_session *session;
+	struct perf_data data = {
+		.mode = PERF_DATA_MODE_READ,
+		.path = PERF_DATA,
+		.file.fd = cmd->out,
+	};
+	struct perf_script script = {
+		.tool = {
+			.sample = process_sample_event,
+			.feature = process_feature_event,
+			.attr = perf_event__process_attr,
+		},
+	};
+
+	session = perf_session__new(&data, &script.tool);
+	if (IS_ERR(session))
+		return NULL;
+	script.session = session;
+	perf_session__process_events(session);
+	perf_session__delete(session);
+
+	return NULL;
+}
+
 static int __run_perf_stat(int argc, const char **argv, int run_idx)
 {
 	int interval = stat_config.interval;
@@ -709,6 +874,8 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	struct affinity saved_affinity, *affinity = NULL;
 	int err;
 	bool second_pass = false;
+	struct child_process cmd;
+	pthread_t reader_thread;
 
 	/*
 	 * Prepare perf record for sampling event retire_latency before fork and
@@ -716,10 +883,35 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	 */
 	if (stat_config.tpebs_event_size > 0) {
 		int ret;
+		struct tpebs_event *e;
+
+		pr_debug("perf stat pid = %d\n", getpid());
+		list_for_each_entry(e, &stat_config.tpebs_events, nd) {
+			struct tpebs_retire_lat *new = malloc(sizeof(struct tpebs_retire_lat));
 
-		ret = __run_perf_record();
+			if (!new)
+				return -1;
+			new->event.name = strdup(e->name);
+			if (!new->event.name)
+				return -ENOMEM;
+			new->event.tpebs_name = strdup(e->tpebs_name);
+			if (!new->event.tpebs_name)
+				return -ENOMEM;
+			new->count = 0;
+			new->sum = 0;
+			list_add_tail(&new->event.nd, &stat_config.tpebs_results);
+		}
+		ret = prepare_perf_record(&cmd);
 		if (ret)
 			return ret;
+		if (pthread_create(&reader_thread, NULL, __sample_reader, &cmd)) {
+			kill(cmd.pid, SIGTERM);
+			close(cmd.out);
+			pr_err("Could not create thread to process sample data.\n");
+			return -1;
+		}
+		/* Wait for perf record initialization a little bit.*/
+		sleep(2);
 	}
 
 	if (forks) {
@@ -927,6 +1119,17 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 
 	t1 = rdclock();
 
+	if (stat_config.tpebs_event_size > 0) {
+		int ret;
+
+		kill(cmd.pid, SIGTERM);
+		pthread_join(reader_thread, NULL);
+		close(cmd.out);
+		ret = finish_command(&cmd);
+		if (ret != -ERR_RUN_COMMAND_WAITPID_SIGNAL)
+			return ret;
+	}
+
 	if (stat_config.walltime_run_table)
 		stat_config.walltime_run[run_idx] = t1 - t0;
 
@@ -1034,6 +1237,9 @@ static void sig_atexit(void)
 	if (child_pid != -1)
 		kill(child_pid, SIGTERM);
 
+	if (stat_config.tpebs_pid != -1)
+		kill(stat_config.tpebs_pid, SIGTERM);
+
 	sigprocmask(SIG_SETMASK, &oset, NULL);
 
 	if (signr == -1)
@@ -2974,5 +3180,7 @@ int cmd_stat(int argc, const char **argv)
 	metricgroup__rblist_exit(&stat_config.metric_events);
 	evlist__close_control(stat_config.ctl_fd, stat_config.ctl_fd_ack, &stat_config.ctl_fd_close);
 
+	tpebs_data__delete();
+
 	return status;
 }
diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
index 7c24ed768ff3..ae788edef30f 100644
--- a/tools/perf/util/metricgroup.h
+++ b/tools/perf/util/metricgroup.h
@@ -68,10 +68,18 @@ struct metric_expr {
 
 struct tpebs_event {
 	struct list_head nd;
+	/* Event name */
 	const char *name;
+	/* Event name with the TPEBS modifier R */
 	const char *tpebs_name;
 };
 
+struct tpebs_retire_lat {
+	struct tpebs_event event;
+	size_t count;
+	int sum;
+};
+
 struct metric_event *metricgroup__lookup(struct rblist *metric_events,
 					 struct evsel *evsel,
 					 bool create);
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index b987960df3c5..0726bdc06681 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -111,6 +111,8 @@ struct perf_stat_config {
 	struct rblist metric_events;
 	struct list_head tpebs_events;
 	size_t tpebs_event_size;
+	struct list_head tpebs_results;
+	pid_t tpebs_pid;
 	int ctl_fd;
 	int ctl_fd_ack;
 	bool ctl_fd_close;
-- 
2.43.0
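
For readers who want to try the fork / pipe / SIGTERM / reader-thread flow from the commit message outside of perf, here is a minimal standalone sketch. It is not code from this series: sampler_child() is a stand-in for perf record, struct lat_stats, collect_latency() and FAKE_LATENCY are hypothetical names, and the "samples" are raw ints rather than perf.data records. It also assumes a signal mask can replace the sleep(2) startup wait, which may not carry over to a real perf record child.

```c
#define _DEFAULT_SOURCE 1	/* usleep() under strict -std=c11 */
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define FAKE_LATENCY 7	/* made-up sample value, not a real retire latency */

/* Hypothetical accumulator, loosely mirroring count/sum in tpebs_retire_lat. */
struct lat_stats {
	int fd;		/* read end of the pipe */
	size_t count;	/* number of samples seen */
	long sum;	/* sum of sample values */
};

static volatile sig_atomic_t stop_sampling;

static void on_sigterm(int sig)
{
	(void)sig;
	stop_sampling = 1;
}

/* Stand-in for perf record: emit samples until SIGTERM arrives, then exit. */
static void sampler_child(int wfd)
{
	const int lat = FAKE_LATENCY;

	do {
		if (write(wfd, &lat, sizeof(lat)) != (ssize_t)sizeof(lat))
			break;
		usleep(1000);
	} while (!stop_sampling);
	_exit(0);
}

/* Reader thread: drain the pipe until EOF, accumulating count and sum. */
static void *sample_reader(void *arg)
{
	struct lat_stats *st = arg;
	int lat;

	while (read(st->fd, &lat, sizeof(lat)) == (ssize_t)sizeof(lat)) {
		st->count++;
		st->sum += lat;
	}
	return NULL;
}

/* Returns 0 on success and fills st->count and st->sum. */
static int collect_latency(struct lat_stats *st)
{
	struct sigaction sa;
	sigset_t term, old;
	pthread_t reader;
	int fds[2], status;
	pid_t pid;

	if (pipe(fds))
		return -1;

	/* Hold SIGTERM back until the child has installed its handler. */
	sigemptyset(&term);
	sigaddset(&term, SIGTERM);
	sigprocmask(SIG_BLOCK, &term, &old);

	pid = fork();
	if (pid < 0) {
		sigprocmask(SIG_SETMASK, &old, NULL);
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		close(fds[0]);
		memset(&sa, 0, sizeof(sa));
		sa.sa_handler = on_sigterm;
		sigemptyset(&sa.sa_mask);
		sigaction(SIGTERM, &sa, NULL);
		sigprocmask(SIG_SETMASK, &old, NULL);
		sampler_child(fds[1]);	/* never returns */
	}
	sigprocmask(SIG_SETMASK, &old, NULL);
	close(fds[1]);	/* so the reader sees EOF once the child exits */

	st->fd = fds[0];
	st->count = 0;
	st->sum = 0;
	if (pthread_create(&reader, NULL, sample_reader, st)) {
		kill(pid, SIGTERM);
		waitpid(pid, NULL, 0);
		close(fds[0]);
		return -1;
	}

	usleep(20000);			/* the "counting" window */
	kill(pid, SIGTERM);		/* ask the sampler to stop */
	pthread_join(reader, NULL);	/* returns once the pipe hits EOF */
	close(fds[0]);
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A caller would declare a struct lat_stats, call collect_latency() on it, and divide sum by count to get the average. One difference from the patch: the stand-in child handles SIGTERM and exits cleanly, whereas the patch compares the finish_command() result against -ERR_RUN_COMMAND_WAITPID_SIGNAL.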