From: Kyle Meyer <kyle.meyer@hpe.com>
Systems have surpassed 2048 CPUs. Increase MAX_NR_CPUS to 4096.
Bitmaps declared with MAX_NR_CPUS bits will increase from 256B to 512B,
cpus_runtime will increase from 81960B to 163880B, and max_entries will
increase from 8192B to 16384B.
Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
Reviewed-by: Ian Rogers <irogers@google.com>
---
tools/lib/perf/include/internal/cpumap.h | 2 +-
tools/perf/perf.h | 2 +-
tools/perf/util/bpf_skel/kwork_top.bpf.c | 4 +++-
3 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/tools/lib/perf/include/internal/cpumap.h b/tools/lib/perf/include/internal/cpumap.h
index 49649eb51ce4..3cf28522004e 100644
--- a/tools/lib/perf/include/internal/cpumap.h
+++ b/tools/lib/perf/include/internal/cpumap.h
@@ -22,7 +22,7 @@ DECLARE_RC_STRUCT(perf_cpu_map) {
};
#ifndef MAX_NR_CPUS
-#define MAX_NR_CPUS 2048
+#define MAX_NR_CPUS 4096
#endif
struct perf_cpu_map *perf_cpu_map__alloc(int nr_cpus);
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index c004dd4e65a3..3cb40965549f 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -3,7 +3,7 @@
#define _PERF_PERF_H
#ifndef MAX_NR_CPUS
-#define MAX_NR_CPUS 2048
+#define MAX_NR_CPUS 4096
#endif
enum perf_affinity {
diff --git a/tools/perf/util/bpf_skel/kwork_top.bpf.c b/tools/perf/util/bpf_skel/kwork_top.bpf.c
index 594da91965a2..73e32e063030 100644
--- a/tools/perf/util/bpf_skel/kwork_top.bpf.c
+++ b/tools/perf/util/bpf_skel/kwork_top.bpf.c
@@ -18,7 +18,9 @@ enum kwork_class_type {
};
#define MAX_ENTRIES 102400
-#define MAX_NR_CPUS 2048
+#ifndef MAX_NR_CPUS
+#define MAX_NR_CPUS 4096
+#endif
#define PF_KTHREAD 0x00200000
#define MAX_COMMAND_LEN 16
--
2.47.0.338.g60cca15819-goog
Hi Ian,
On Thu, Dec 05, 2024 at 08:40:28PM -0800, Ian Rogers wrote:
>
> From: Kyle Meyer <kyle.meyer@hpe.com>
>
> Systems have surpassed 2048 CPUs. Increase MAX_NR_CPUS to 4096.
>
> Bitmaps declared with MAX_NR_CPUS bits will increase from 256B to 512B,
> cpus_runtime will increase from 81960B to 163880B, and max_entries will
> increase from 8192B to 16384B.
>
> Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
> Reviewed-by: Ian Rogers <irogers@google.com>
> ---
> tools/lib/perf/include/internal/cpumap.h | 2 +-
> tools/perf/perf.h | 2 +-
> tools/perf/util/bpf_skel/kwork_top.bpf.c | 4 +++-
> 3 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/tools/lib/perf/include/internal/cpumap.h b/tools/lib/perf/include/internal/cpumap.h
> index 49649eb51ce4..3cf28522004e 100644
> --- a/tools/lib/perf/include/internal/cpumap.h
> +++ b/tools/lib/perf/include/internal/cpumap.h
> @@ -22,7 +22,7 @@ DECLARE_RC_STRUCT(perf_cpu_map) {
> };
>
> #ifndef MAX_NR_CPUS
> -#define MAX_NR_CPUS 2048
> +#define MAX_NR_CPUS 4096
> #endif
This series is fine for me. Just wandering if we can use a central
place to maintain the macro, e.g. lib/perf/include/perf/cpumap.h. It
is pointless to define exactly same macros in different headers. As
least, I think we can unify this except the kwork bpf program?
P.s. for dynamically allocating per CPU maps in eBPF program, we can
refer to the code samples/bpf/xdp_sample_user.c, but this is another
topic.
Thanks,
Leo
> struct perf_cpu_map *perf_cpu_map__alloc(int nr_cpus);
> diff --git a/tools/perf/perf.h b/tools/perf/perf.h
> index c004dd4e65a3..3cb40965549f 100644
> --- a/tools/perf/perf.h
> +++ b/tools/perf/perf.h
> @@ -3,7 +3,7 @@
> #define _PERF_PERF_H
>
> #ifndef MAX_NR_CPUS
> -#define MAX_NR_CPUS 2048
> +#define MAX_NR_CPUS 4096
> #endif
>
> enum perf_affinity {
> diff --git a/tools/perf/util/bpf_skel/kwork_top.bpf.c b/tools/perf/util/bpf_skel/kwork_top.bpf.c
> index 594da91965a2..73e32e063030 100644
> --- a/tools/perf/util/bpf_skel/kwork_top.bpf.c
> +++ b/tools/perf/util/bpf_skel/kwork_top.bpf.c
> @@ -18,7 +18,9 @@ enum kwork_class_type {
> };
>
> #define MAX_ENTRIES 102400
> -#define MAX_NR_CPUS 2048
> +#ifndef MAX_NR_CPUS
> +#define MAX_NR_CPUS 4096
> +#endif
> #define PF_KTHREAD 0x00200000
> #define MAX_COMMAND_LEN 16
>
> --
> 2.47.0.338.g60cca15819-goog
>
>
On Fri, Dec 6, 2024 at 2:24 AM Leo Yan <leo.yan@arm.com> wrote:
>
> Hi Ian,
>
> On Thu, Dec 05, 2024 at 08:40:28PM -0800, Ian Rogers wrote:
> >
> > From: Kyle Meyer <kyle.meyer@hpe.com>
> >
> > Systems have surpassed 2048 CPUs. Increase MAX_NR_CPUS to 4096.
> >
> > Bitmaps declared with MAX_NR_CPUS bits will increase from 256B to 512B,
> > cpus_runtime will increase from 81960B to 163880B, and max_entries will
> > increase from 8192B to 16384B.
> >
> > Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
> > Reviewed-by: Ian Rogers <irogers@google.com>
> > ---
> > tools/lib/perf/include/internal/cpumap.h | 2 +-
> > tools/perf/perf.h | 2 +-
> > tools/perf/util/bpf_skel/kwork_top.bpf.c | 4 +++-
> > 3 files changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/tools/lib/perf/include/internal/cpumap.h b/tools/lib/perf/include/internal/cpumap.h
> > index 49649eb51ce4..3cf28522004e 100644
> > --- a/tools/lib/perf/include/internal/cpumap.h
> > +++ b/tools/lib/perf/include/internal/cpumap.h
> > @@ -22,7 +22,7 @@ DECLARE_RC_STRUCT(perf_cpu_map) {
> > };
> >
> > #ifndef MAX_NR_CPUS
> > -#define MAX_NR_CPUS 2048
> > +#define MAX_NR_CPUS 4096
> > #endif
>
> This series is fine for me. Just wandering if we can use a central
> place to maintain the macro, e.g. lib/perf/include/perf/cpumap.h. It
> is pointless to define exactly same macros in different headers. As
> least, I think we can unify this except the kwork bpf program?
>
> P.s. for dynamically allocating per CPU maps in eBPF program, we can
> refer to the code samples/bpf/xdp_sample_user.c, but this is another
> topic.
Thanks Leo,
can I take this as an acked-by? Wrt a single constant I agree,
following these changes MAX_NR_CPUS is just used for a warning in
libperf's cpumap.c. I think we're agreed that getting rid of the
constant would be best. I also think the cpumap logic is duplicating
something that libc is providing in cpu_set:
https://man7.org/linux/man-pages/man3/CPU_SET.3.html
And we have more than one representation in perf for the sake of the
disk representation:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/include/perf/event.h?h=perf-tools-next#n227
Just changing the int to be a s16 would lower the memory overhead,
which is why I'd kind of like the abstraction to be minimal.
Thanks,
Ian
Hi Ian, On Fri, Dec 06, 2024 at 08:25:06AM -0800, Ian Rogers wrote: [...] > > This series is fine for me. Just wandering if we can use a central > > place to maintain the macro, e.g. lib/perf/include/perf/cpumap.h. It > > is pointless to define exactly same macros in different headers. As > > least, I think we can unify this except the kwork bpf program? > > > > P.s. for dynamically allocating per CPU maps in eBPF program, we can > > refer to the code samples/bpf/xdp_sample_user.c, but this is another > > topic. > > Thanks Leo, > > can I take this as an acked-by? Yeah. I will give my review tags in the cover letter. > Wrt a single constant I agree, > following these changes MAX_NR_CPUS is just used for a warning in > libperf's cpumap.c. I think we're agreed that getting rid of the > constant would be best. I also think the cpumap logic is duplicating > something that libc is providing in cpu_set. > > And we have more than one representation in perf for the sake of the > disk representation: Thanks for sharing the info. > Just changing the int to be a s16 would lower the memory overhead, > which is why I'd kind of like the abstraction to be minimal. Here I am not clear what for "changing the int to be a s16". Could you elaberate a bit for this? Lastly, I also found multiple files use "MAX_CPUS" rather than "MAX_NR_CPUS". Polish them in a new series? Thanks, Leo
On Fri, Dec 6, 2024 at 3:03 PM Leo Yan <leo.yan@arm.com> wrote: > > Hi Ian, > > On Fri, Dec 06, 2024 at 08:25:06AM -0800, Ian Rogers wrote: > > [...] > > > > This series is fine for me. Just wandering if we can use a central > > > place to maintain the macro, e.g. lib/perf/include/perf/cpumap.h. It > > > is pointless to define exactly same macros in different headers. As > > > least, I think we can unify this except the kwork bpf program? > > > > > > P.s. for dynamically allocating per CPU maps in eBPF program, we can > > > refer to the code samples/bpf/xdp_sample_user.c, but this is another > > > topic. > > > > Thanks Leo, > > > > can I take this as an acked-by? > > Yeah. I will give my review tags in the cover letter. > > > Wrt a single constant I agree, > > following these changes MAX_NR_CPUS is just used for a warning in > > libperf's cpumap.c. I think we're agreed that getting rid of the > > constant would be best. I also think the cpumap logic is duplicating > > something that libc is providing in cpu_set. > > > > And we have more than one representation in perf for the sake of the > > disk representation: > > Thanks for sharing the info. > > > Just changing the int to be a s16 would lower the memory overhead, > > which is why I'd kind of like the abstraction to be minimal. > > Here I am not clear what for "changing the int to be a s16". Could you > elaberate a bit for this? I meant this :-) https://lore.kernel.org/lkml/20241207052133.102829-1-irogers@google.com/ > Lastly, I also found multiple files use "MAX_CPUS" rather than > "MAX_NR_CPUS". Polish them in a new series? Makes sense. Thanks, Ian
.. > > > Just changing the int to be a s16 would lower the memory overhead, > > > which is why I'd kind of like the abstraction to be minimal. > > > > Here I am not clear what for "changing the int to be a s16". Could you > > elaberate a bit for this? > > I meant this :-) > https://lore.kernel.org/lkml/20241207052133.102829-1-irogers@google.com/ How many time is this allocated? If it is 2 bytes in a larger structure it is likely to be noise. For a local the code is likely to be worse. Any maths and you start forcing the compiler to mask the value (on pretty much anything except x86). David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Mon, Dec 9, 2024 at 1:36 PM David Laight <David.Laight@aculab.com> wrote: > > .. > > > > Just changing the int to be a s16 would lower the memory overhead, > > > > which is why I'd kind of like the abstraction to be minimal. > > > > > > Here I am not clear what for "changing the int to be a s16". Could you > > > elaberate a bit for this? > > > > I meant this :-) > > https://lore.kernel.org/lkml/20241207052133.102829-1-irogers@google.com/ > > How many time is this allocated? > If it is 2 bytes in a larger structure it is likely to be noise. > For a local the code is likely to be worse. > Any maths and you start forcing the compiler to mask the value > (on pretty much anything except x86). So the data structure is a sorted array of ints, this changes it to int16s. On the 32 socket GNR with > 2048 logical CPUs, the array would be over 8kb before and 4kb after for all online CPUs. On my more modest desktop with 72 logical cores the size goes from 288 bytes down to 144, a reduction of 2 cache lines. I'm not super excited about the memory savings, but the patch is only 8 lines in difference. Thanks, Ian
© 2016 - 2025 Red Hat, Inc.