The ports metric group contains a metric for each port giving its
utilization as a ratio of cycles. The metrics are created by looking
for UOPS_DISPATCHED.PORT events.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/pmu-events/intel_metrics.py | 33 ++++++++++++++++++++++++--
1 file changed, 31 insertions(+), 2 deletions(-)
diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
index f4707e964f75..3ef4eb868580 100755
--- a/tools/perf/pmu-events/intel_metrics.py
+++ b/tools/perf/pmu-events/intel_metrics.py
@@ -1,12 +1,13 @@
#!/usr/bin/env python3
# SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
from metric import (d_ratio, has_event, max, CheckPmu, Event, JsonEncodeMetric,
- JsonEncodeMetricGroupDescriptions, LoadEvents, Metric,
- MetricGroup, MetricRef, Select)
+ JsonEncodeMetricGroupDescriptions, Literal, LoadEvents,
+ Metric, MetricGroup, MetricRef, Select)
import argparse
import json
import math
import os
+import re
from typing import Optional
# Global command line arguments.
@@ -260,6 +261,33 @@ def IntelBr():
description="breakdown of retired branch instructions")
+def IntelPorts() -> Optional[MetricGroup]:
+ pipeline_events = json.load(open(f"{_args.events_path}/x86/{_args.model}/pipeline.json"))
+
+ core_cycles = Event("CPU_CLK_UNHALTED.THREAD_P_ANY",
+ "CPU_CLK_UNHALTED.DISTRIBUTED",
+ "cycles")
+ # Number of CPU cycles scaled for SMT.
+ smt_cycles = Select(core_cycles / 2, Literal("#smt_on"), core_cycles)
+
+ metrics = []
+ for x in pipeline_events:
+ if "EventName" in x and re.search("^UOPS_DISPATCHED.PORT", x["EventName"]):
+ name = x["EventName"]
+ port = re.search(r"(PORT_[0-9].*)", name).group(0).lower()
+ if name.endswith("_CORE"):
+ cyc = core_cycles
+ else:
+ cyc = smt_cycles
+ metrics.append(Metric(port, f"{port} utilization (higher is better)",
+ d_ratio(Event(name), cyc), "100%"))
+ if len(metrics) == 0:
+ return None
+
+ return MetricGroup("ports", metrics, "functional unit (port) utilization -- "
+ "fraction of cycles each port is utilized (higher is better)")
+
+
def IntelSwpf() -> Optional[MetricGroup]:
ins = Event("instructions")
try:
@@ -352,6 +380,7 @@ def main() -> None:
Smi(),
Tsx(),
IntelBr(),
+ IntelPorts(),
IntelSwpf(),
])
--
2.46.1.824.gd892dcdcdd-goog
On 2024-09-26 1:50 p.m., Ian Rogers wrote: > The ports metric group contains a metric for each port giving its > utilization as a ratio of cycles. The metrics are created by looking > for UOPS_DISPATCHED.PORT events. > > Signed-off-by: Ian Rogers <irogers@google.com> > --- > tools/perf/pmu-events/intel_metrics.py | 33 ++++++++++++++++++++++++-- > 1 file changed, 31 insertions(+), 2 deletions(-) > > diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py > index f4707e964f75..3ef4eb868580 100755 > --- a/tools/perf/pmu-events/intel_metrics.py > +++ b/tools/perf/pmu-events/intel_metrics.py > @@ -1,12 +1,13 @@ > #!/usr/bin/env python3 > # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) > from metric import (d_ratio, has_event, max, CheckPmu, Event, JsonEncodeMetric, > - JsonEncodeMetricGroupDescriptions, LoadEvents, Metric, > - MetricGroup, MetricRef, Select) > + JsonEncodeMetricGroupDescriptions, Literal, LoadEvents, > + Metric, MetricGroup, MetricRef, Select) > import argparse > import json > import math > import os > +import re > from typing import Optional > > # Global command line arguments. > @@ -260,6 +261,33 @@ def IntelBr(): > description="breakdown of retired branch instructions") > > > +def IntelPorts() -> Optional[MetricGroup]: > + pipeline_events = json.load(open(f"{_args.events_path}/x86/{_args.model}/pipeline.json")) > + > + core_cycles = Event("CPU_CLK_UNHALTED.THREAD_P_ANY", > + "CPU_CLK_UNHALTED.DISTRIBUTED", > + "cycles") > + # Number of CPU cycles scaled for SMT. > + smt_cycles = Select(core_cycles / 2, Literal("#smt_on"), core_cycles) > + > + metrics = [] > + for x in pipeline_events: > + if "EventName" in x and re.search("^UOPS_DISPATCHED.PORT", x["EventName"]): > + name = x["EventName"] > + port = re.search(r"(PORT_[0-9].*)", name).group(0).lower() > + if name.endswith("_CORE"): > + cyc = core_cycles > + else: > + cyc = smt_cycles > + metrics.append(Metric(port, f"{port} utilization (higher is better)", > + d_ratio(Event(name), cyc), "100%")) The generated metric highly depends on the event name, which is very fragile. We will probably have the same event in a new generation, but with a different name. Long-term maintenance could be a problem. Is there an idea regarding how to sync the event names for new generations? Maybe we should improve the event generation script and do an automatic check to tell which metrics are missed. Then we may decide if updating the new event name, dropping the metric or adding a different metric. Thanks, Kan > + if len(metrics) == 0: > + return None > + > + return MetricGroup("ports", metrics, "functional unit (port) utilization -- " > + "fraction of cycles each port is utilized (higher is better)") > + > + > def IntelSwpf() -> Optional[MetricGroup]: > ins = Event("instructions") > try: > @@ -352,6 +380,7 @@ def main() -> None: > Smi(), > Tsx(), > IntelBr(), > + IntelPorts(), > IntelSwpf(), > ]) >
On Thu, Nov 7, 2024 at 7:00 AM Liang, Kan <kan.liang@linux.intel.com> wrote: > > > > On 2024-09-26 1:50 p.m., Ian Rogers wrote: > > The ports metric group contains a metric for each port giving its > > utilization as a ratio of cycles. The metrics are created by looking > > for UOPS_DISPATCHED.PORT events. > > > > Signed-off-by: Ian Rogers <irogers@google.com> > > --- > > tools/perf/pmu-events/intel_metrics.py | 33 ++++++++++++++++++++++++-- > > 1 file changed, 31 insertions(+), 2 deletions(-) > > > > diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py > > index f4707e964f75..3ef4eb868580 100755 > > --- a/tools/perf/pmu-events/intel_metrics.py > > +++ b/tools/perf/pmu-events/intel_metrics.py > > @@ -1,12 +1,13 @@ > > #!/usr/bin/env python3 > > # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) > > from metric import (d_ratio, has_event, max, CheckPmu, Event, JsonEncodeMetric, > > - JsonEncodeMetricGroupDescriptions, LoadEvents, Metric, > > - MetricGroup, MetricRef, Select) > > + JsonEncodeMetricGroupDescriptions, Literal, LoadEvents, > > + Metric, MetricGroup, MetricRef, Select) > > import argparse > > import json > > import math > > import os > > +import re > > from typing import Optional > > > > # Global command line arguments. > > @@ -260,6 +261,33 @@ def IntelBr(): > > description="breakdown of retired branch instructions") > > > > > > +def IntelPorts() -> Optional[MetricGroup]: > > + pipeline_events = json.load(open(f"{_args.events_path}/x86/{_args.model}/pipeline.json")) > > + > > + core_cycles = Event("CPU_CLK_UNHALTED.THREAD_P_ANY", > > + "CPU_CLK_UNHALTED.DISTRIBUTED", > > + "cycles") > > + # Number of CPU cycles scaled for SMT. > > + smt_cycles = Select(core_cycles / 2, Literal("#smt_on"), core_cycles) > > + > > + metrics = [] > > + for x in pipeline_events: > > + if "EventName" in x and re.search("^UOPS_DISPATCHED.PORT", x["EventName"]): > > + name = x["EventName"] > > + port = re.search(r"(PORT_[0-9].*)", name).group(0).lower() > > + if name.endswith("_CORE"): > > + cyc = core_cycles > > + else: > > + cyc = smt_cycles > > + metrics.append(Metric(port, f"{port} utilization (higher is better)", > > + d_ratio(Event(name), cyc), "100%")) > > > The generated metric highly depends on the event name, which is very > fragile. We will probably have the same event in a new generation, but > with a different name. Long-term maintenance could be a problem. > Is there an idea regarding how to sync the event names for new generations? I agree with the idea that it is fragile, it is also strangely robust as you say, new generations will gain support if they follow the same naming convention. We have tests that load bearing metrics exists on our platforms so maybe the appropriate place to test for existence is in Weilin's metrics test. > Maybe we should improve the event generation script and do an automatic > check to tell which metrics are missed. Then we may decide if updating > the new event name, dropping the metric or adding a different metric. So I'm not sure it is a bug to not have the metric, if it were we could just throw rather than return None. We're going to run the script for every model including old models like nehalem, so I've generally kept it as None. I think doing future work on testing is probably best. It would also indicate use of the metric if people notice it missing (not that the script aims for that :-) ). Thanks, Ian > Thanks, > Kan > > > + if len(metrics) == 0: > > + return None > > + > > + return MetricGroup("ports", metrics, "functional unit (port) utilization -- " > > + "fraction of cycles each port is utilized (higher is better)") > > + > > + > > def IntelSwpf() -> Optional[MetricGroup]: > > ins = Event("instructions") > > try: > > @@ -352,6 +380,7 @@ def main() -> None: > > Smi(), > > Tsx(), > > IntelBr(), > > + IntelPorts(), > > IntelSwpf(), > > ]) > > >
On 2024-11-07 12:12 p.m., Ian Rogers wrote: > On Thu, Nov 7, 2024 at 7:00 AM Liang, Kan <kan.liang@linux.intel.com> wrote: >> >> >> On 2024-09-26 1:50 p.m., Ian Rogers wrote: >>> The ports metric group contains a metric for each port giving its >>> utilization as a ratio of cycles. The metrics are created by looking >>> for UOPS_DISPATCHED.PORT events. >>> >>> Signed-off-by: Ian Rogers <irogers@google.com> >>> --- >>> tools/perf/pmu-events/intel_metrics.py | 33 ++++++++++++++++++++++++-- >>> 1 file changed, 31 insertions(+), 2 deletions(-) >>> >>> diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py >>> index f4707e964f75..3ef4eb868580 100755 >>> --- a/tools/perf/pmu-events/intel_metrics.py >>> +++ b/tools/perf/pmu-events/intel_metrics.py >>> @@ -1,12 +1,13 @@ >>> #!/usr/bin/env python3 >>> # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) >>> from metric import (d_ratio, has_event, max, CheckPmu, Event, JsonEncodeMetric, >>> - JsonEncodeMetricGroupDescriptions, LoadEvents, Metric, >>> - MetricGroup, MetricRef, Select) >>> + JsonEncodeMetricGroupDescriptions, Literal, LoadEvents, >>> + Metric, MetricGroup, MetricRef, Select) >>> import argparse >>> import json >>> import math >>> import os >>> +import re >>> from typing import Optional >>> >>> # Global command line arguments. >>> @@ -260,6 +261,33 @@ def IntelBr(): >>> description="breakdown of retired branch instructions") >>> >>> >>> +def IntelPorts() -> Optional[MetricGroup]: >>> + pipeline_events = json.load(open(f"{_args.events_path}/x86/{_args.model}/pipeline.json")) >>> + >>> + core_cycles = Event("CPU_CLK_UNHALTED.THREAD_P_ANY", >>> + "CPU_CLK_UNHALTED.DISTRIBUTED", >>> + "cycles") >>> + # Number of CPU cycles scaled for SMT. >>> + smt_cycles = Select(core_cycles / 2, Literal("#smt_on"), core_cycles) >>> + >>> + metrics = [] >>> + for x in pipeline_events: >>> + if "EventName" in x and re.search("^UOPS_DISPATCHED.PORT", x["EventName"]): >>> + name = x["EventName"] >>> + port = re.search(r"(PORT_[0-9].*)", name).group(0).lower() >>> + if name.endswith("_CORE"): >>> + cyc = core_cycles >>> + else: >>> + cyc = smt_cycles >>> + metrics.append(Metric(port, f"{port} utilization (higher is better)", >>> + d_ratio(Event(name), cyc), "100%")) >> >> The generated metric highly depends on the event name, which is very >> fragile. We will probably have the same event in a new generation, but >> with a different name. Long-term maintenance could be a problem. >> Is there an idea regarding how to sync the event names for new generations? > I agree with the idea that it is fragile, it is also strangely robust > as you say, new generations will gain support if they follow the same > naming convention. We have tests that load bearing metrics exists on > our platforms so maybe the appropriate place to test for existence is > in Weilin's metrics test. > > >> Maybe we should improve the event generation script and do an automatic >> check to tell which metrics are missed. Then we may decide if updating >> the new event name, dropping the metric or adding a different metric. > So I'm not sure it is a bug to not have the metric, if it were we > could just throw rather than return None. We're going to run the > script for every model including old models like nehalem, so I've > generally kept it as None. I think doing future work on testing is > probably best. It would also indicate use of the metric if people > notice it missing (not that the script aims for that 🙂 ). The maintenance is still a concern, even if we have a way to test it out. There is already an "official" metric published in GitHub, which is maintained by Intel. To be honest, I don't think there is more energy to maintain these "non-official" metrics. I don't think it should be a bug without these metrics. So it's very likely that the issue will not be addressed right away. If we cannot keep these metrics updated for the future platforms, I couldn't find a reason to have them. Thanks, Kan
On Thu, Nov 7, 2024 at 11:36 AM Liang, Kan <kan.liang@linux.intel.com> wrote: > > On 2024-11-07 12:12 p.m., Ian Rogers wrote: > > On Thu, Nov 7, 2024 at 7:00 AM Liang, Kan <kan.liang@linux.intel.com> wrote: > >> > >> > >> On 2024-09-26 1:50 p.m., Ian Rogers wrote: > >>> The ports metric group contains a metric for each port giving its > >>> utilization as a ratio of cycles. The metrics are created by looking > >>> for UOPS_DISPATCHED.PORT events. > >>> > >>> Signed-off-by: Ian Rogers <irogers@google.com> > >>> --- > >>> tools/perf/pmu-events/intel_metrics.py | 33 ++++++++++++++++++++++++-- > >>> 1 file changed, 31 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py > >>> index f4707e964f75..3ef4eb868580 100755 > >>> --- a/tools/perf/pmu-events/intel_metrics.py > >>> +++ b/tools/perf/pmu-events/intel_metrics.py > >>> @@ -1,12 +1,13 @@ > >>> #!/usr/bin/env python3 > >>> # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) > >>> from metric import (d_ratio, has_event, max, CheckPmu, Event, JsonEncodeMetric, > >>> - JsonEncodeMetricGroupDescriptions, LoadEvents, Metric, > >>> - MetricGroup, MetricRef, Select) > >>> + JsonEncodeMetricGroupDescriptions, Literal, LoadEvents, > >>> + Metric, MetricGroup, MetricRef, Select) > >>> import argparse > >>> import json > >>> import math > >>> import os > >>> +import re > >>> from typing import Optional > >>> > >>> # Global command line arguments. > >>> @@ -260,6 +261,33 @@ def IntelBr(): > >>> description="breakdown of retired branch instructions") > >>> > >>> > >>> +def IntelPorts() -> Optional[MetricGroup]: > >>> + pipeline_events = json.load(open(f"{_args.events_path}/x86/{_args.model}/pipeline.json")) > >>> + > >>> + core_cycles = Event("CPU_CLK_UNHALTED.THREAD_P_ANY", > >>> + "CPU_CLK_UNHALTED.DISTRIBUTED", > >>> + "cycles") > >>> + # Number of CPU cycles scaled for SMT. > >>> + smt_cycles = Select(core_cycles / 2, Literal("#smt_on"), core_cycles) > >>> + > >>> + metrics = [] > >>> + for x in pipeline_events: > >>> + if "EventName" in x and re.search("^UOPS_DISPATCHED.PORT", x["EventName"]): > >>> + name = x["EventName"] > >>> + port = re.search(r"(PORT_[0-9].*)", name).group(0).lower() > >>> + if name.endswith("_CORE"): > >>> + cyc = core_cycles > >>> + else: > >>> + cyc = smt_cycles > >>> + metrics.append(Metric(port, f"{port} utilization (higher is better)", > >>> + d_ratio(Event(name), cyc), "100%")) > >> > >> The generated metric highly depends on the event name, which is very > >> fragile. We will probably have the same event in a new generation, but > >> with a different name. Long-term maintenance could be a problem. > >> Is there an idea regarding how to sync the event names for new generations? > > I agree with the idea that it is fragile, it is also strangely robust > > as you say, new generations will gain support if they follow the same > > naming convention. We have tests that load bearing metrics exists on > > our platforms so maybe the appropriate place to test for existence is > > in Weilin's metrics test. > > > > > >> Maybe we should improve the event generation script and do an automatic > >> check to tell which metrics are missed. Then we may decide if updating > >> the new event name, dropping the metric or adding a different metric. > > So I'm not sure it is a bug to not have the metric, if it were we > > could just throw rather than return None. We're going to run the > > script for every model including old models like nehalem, so I've > > generally kept it as None. I think doing future work on testing is > > probably best. It would also indicate use of the metric if people > > notice it missing (not that the script aims for that 🙂 ). > > The maintenance is still a concern, even if we have a way to test it > out. There is already an "official" metric published in GitHub, which is > maintained by Intel. To be honest, I don't think there is more energy to > maintain these "non-official" metrics. > > I don't think it should be a bug without these metrics. So it's very > likely that the issue will not be addressed right away. If we cannot > keep these metrics updated for the future platforms, I couldn't find a > reason to have them. So I think there are a few things: 1) I'd like there to be a non-json infrastructure for events that can handle multiple models. Some failings of json are its inability to validate events, long lines, lack of comments, metric expression strings that aren't inherently sound, etc. I'd like to make it so we can have json metrics for everything, ie remove the hardcoded metrics that play badly with event sharing, etc. Doing this by updating every json would be tedious and excessively noisy. 2) There are "official" metrics from Intel and I've worked for the establishment of that. That doesn't mean every Intel metric is in the official metrics. Servers are better served than client machines. Core TMA metrics are well served but uncore less so. 3) Are perf metrics perfect with some kind of warranty? Well no, as your reviews in this thread have shown SMI cost on hybrid is likely broken. We don't intentionally try to have broken metrics and fix them asap when they come up. GPLv2 has an explicit "no warranty" section. Now Intel have experimental and non-experimental events, we update the comments of metrics using those to highlight that the underlying events are experimental: https://github.com/googleprodkernel/linux-perf/blob/google_tools_master/tools/perf/pmu-events/metric.py#L598 If there are bugs in the metrics then open source, sharing and fixing benefits everyone. 4) Am I looking for energy from Intel to maintain these metrics? No. I'm trying to stop carrying the patches just inside of Google's tree. Thanks, Ian
On 2024-11-07 4:00 p.m., Ian Rogers wrote: > On Thu, Nov 7, 2024 at 11:36 AM Liang, Kan <kan.liang@linux.intel.com> wrote: >> On 2024-11-07 12:12 p.m., Ian Rogers wrote: >>> On Thu, Nov 7, 2024 at 7:00 AM Liang, Kan <kan.liang@linux.intel.com> wrote: >>>> >>>> On 2024-09-26 1:50 p.m., Ian Rogers wrote: >>>>> The ports metric group contains a metric for each port giving its >>>>> utilization as a ratio of cycles. The metrics are created by looking >>>>> for UOPS_DISPATCHED.PORT events. >>>>> >>>>> Signed-off-by: Ian Rogers <irogers@google.com> >>>>> --- >>>>> tools/perf/pmu-events/intel_metrics.py | 33 ++++++++++++++++++++++++-- >>>>> 1 file changed, 31 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py >>>>> index f4707e964f75..3ef4eb868580 100755 >>>>> --- a/tools/perf/pmu-events/intel_metrics.py >>>>> +++ b/tools/perf/pmu-events/intel_metrics.py >>>>> @@ -1,12 +1,13 @@ >>>>> #!/usr/bin/env python3 >>>>> # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) >>>>> from metric import (d_ratio, has_event, max, CheckPmu, Event, JsonEncodeMetric, >>>>> - JsonEncodeMetricGroupDescriptions, LoadEvents, Metric, >>>>> - MetricGroup, MetricRef, Select) >>>>> + JsonEncodeMetricGroupDescriptions, Literal, LoadEvents, >>>>> + Metric, MetricGroup, MetricRef, Select) >>>>> import argparse >>>>> import json >>>>> import math >>>>> import os >>>>> +import re >>>>> from typing import Optional >>>>> >>>>> # Global command line arguments. >>>>> @@ -260,6 +261,33 @@ def IntelBr(): >>>>> description="breakdown of retired branch instructions") >>>>> >>>>> >>>>> +def IntelPorts() -> Optional[MetricGroup]: >>>>> + pipeline_events = json.load(open(f"{_args.events_path}/x86/{_args.model}/pipeline.json")) >>>>> + >>>>> + core_cycles = Event("CPU_CLK_UNHALTED.THREAD_P_ANY", >>>>> + "CPU_CLK_UNHALTED.DISTRIBUTED", >>>>> + "cycles") >>>>> + # Number of CPU cycles scaled for SMT. >>>>> + smt_cycles = Select(core_cycles / 2, Literal("#smt_on"), core_cycles) >>>>> + >>>>> + metrics = [] >>>>> + for x in pipeline_events: >>>>> + if "EventName" in x and re.search("^UOPS_DISPATCHED.PORT", x["EventName"]): >>>>> + name = x["EventName"] >>>>> + port = re.search(r"(PORT_[0-9].*)", name).group(0).lower() >>>>> + if name.endswith("_CORE"): >>>>> + cyc = core_cycles >>>>> + else: >>>>> + cyc = smt_cycles >>>>> + metrics.append(Metric(port, f"{port} utilization (higher is better)", >>>>> + d_ratio(Event(name), cyc), "100%")) >>>> The generated metric highly depends on the event name, which is very >>>> fragile. We will probably have the same event in a new generation, but >>>> with a different name. Long-term maintenance could be a problem. >>>> Is there an idea regarding how to sync the event names for new generations? >>> I agree with the idea that it is fragile, it is also strangely robust >>> as you say, new generations will gain support if they follow the same >>> naming convention. We have tests that load bearing metrics exists on >>> our platforms so maybe the appropriate place to test for existence is >>> in Weilin's metrics test. >>> >>> >>>> Maybe we should improve the event generation script and do an automatic >>>> check to tell which metrics are missed. Then we may decide if updating >>>> the new event name, dropping the metric or adding a different metric. >>> So I'm not sure it is a bug to not have the metric, if it were we >>> could just throw rather than return None. We're going to run the >>> script for every model including old models like nehalem, so I've >>> generally kept it as None. I think doing future work on testing is >>> probably best. It would also indicate use of the metric if people >>> notice it missing (not that the script aims for that 🙂 ). >> The maintenance is still a concern, even if we have a way to test it >> out. There is already an "official" metric published in GitHub, which is >> maintained by Intel. To be honest, I don't think there is more energy to >> maintain these "non-official" metrics. >> >> I don't think it should be a bug without these metrics. So it's very >> likely that the issue will not be addressed right away. If we cannot >> keep these metrics updated for the future platforms, I couldn't find a >> reason to have them. > So I think there are a few things: > 1) I'd like there to be a non-json infrastructure for events that can > handle multiple models. Some failings of json are its inability to > validate events, long lines, lack of comments, metric expression > strings that aren't inherently sound, etc. I'd like to make it so we > can have json metrics for everything, ie remove the hardcoded metrics > that play badly with event sharing, etc. Doing this by updating every > json would be tedious and excessively noisy. > 2) There are "official" metrics from Intel and I've worked for the > establishment of that. That doesn't mean every Intel metric is in the > official metrics. Servers are better served than client machines. Core > TMA metrics are well served but uncore less so. > 3) Are perf metrics perfect with some kind of warranty? Well no, as > your reviews in this thread have shown SMI cost on hybrid is likely > broken. We don't intentionally try to have broken metrics and fix them > asap when they come up. GPLv2 has an explicit "no warranty" section. > Now Intel have experimental and non-experimental events, we update the > comments of metrics using those to highlight that the underlying > events are experimental: > https://github.com/googleprodkernel/linux-perf/blob/google_tools_master/ > tools/perf/pmu-events/metric.py#L598 > If there are bugs in the metrics then open source, sharing and fixing > benefits everyone. > 4) Am I looking for energy from Intel to maintain these metrics? No. > I'm trying to stop carrying the patches just inside of Google's tree. Got it. Thanks for the clarification. IIUC, the generated metrics are based on the best knowledge of contributors. The initial source is from Google's tree. It should be the contributor's responsibility to maintain or update the metrics. If so, I agree that it should be a good supplement. I would have some comments in general. - I think we need a way to distinguish these metrics, e.g., add a dedicated prefix. I will not be surprised if some customers bring the metrics to Intel or other vendor's customer service, and ask why it doesn't work on some platforms. I don't think they will get any useful information there. The best way is to report any issues here. So we can fix and update the metric. - All events, no matter whether they are from the JSON file or exposed by the kernel, have to be checked before showing the metrics. Because we cannot guarantee the availability of an event, even for the architectural events. We may introduce a wrapper for the Metric() to check all the involved events. So we don't need to add the try: except thing in each patch? - In the perf test, it's better to ignore the errors from these metrics. So it doesn't block things. But we can show a warning from them with -vvv. The issue can still be found and fixed. Thanks, Kan
© 2016 - 2024 Red Hat, Inc.