[PATCH v2 0/6] Corrections to cpu map event encoding

Ian Rogers posted 6 patches 3 years, 10 months ago
tools/lib/perf/cpumap.c              |   2 +-
tools/lib/perf/include/perf/cpumap.h |   2 +-
tools/lib/perf/include/perf/event.h  |  61 ++++++++-
tools/perf/tests/cpumap.c            |  71 ++++++++---
tools/perf/tests/event_update.c      |  14 +--
tools/perf/util/cpumap.c             | 111 +++++++++++++---
tools/perf/util/cpumap.h             |   4 +-
tools/perf/util/event.h              |   4 -
tools/perf/util/header.c             |  24 ++--
tools/perf/util/session.c            |  35 +++---
tools/perf/util/synthetic-events.c   | 182 +++++++++++++--------------
tools/perf/util/synthetic-events.h   |   2 +-
12 files changed, 327 insertions(+), 185 deletions(-)
[PATCH v2 0/6] Corrections to cpu map event encoding
Posted by Ian Rogers 3 years, 10 months ago
A mask encoding of a cpu map is laid out as:
  u16 nr
  u16 long_size
  unsigned long mask[];
However, the mask may be 8-byte aligned, meaning there is a 4-byte pad
after long_size. This means 32-bit and 64-bit builds see the mask as
being at different offsets. On top of this the structure is in the byte
data[] encoded as:
  u16 type
  char data[]
This means the mask's struct isn't 4- or 8-byte aligned as required, but
is offset by 2. Consequently the long reads and writes cause undefined
behavior because the alignment is broken.
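
A minimal sketch of the layout problem described above. The struct name
and helpers are hypothetical stand-ins, not the perf definitions, and the
8-byte event header size is an assumption read off the raw dumps in this
mail. It only shows that the compiler-chosen offset of mask[] depends on
the size of long, and that the payload itself starts at an offset that
is not a multiple of 4:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the original mask layout; not the perf struct. */
struct sketch_mask_cpu_map {
	uint16_t nr;
	uint16_t long_size;
	unsigned long mask[];
};

/* Assumption: the payload follows an 8-byte event header and the u16 type. */
#define SKETCH_PAYLOAD_OFFSET (8 + sizeof(uint16_t))

static_assert(SKETCH_PAYLOAD_OFFSET == 10,
	      "payload begins 10 bytes into the event");

/* Where the compiler places mask[] relative to the start of the struct. */
size_t sketch_mask_offset(void)
{
	return offsetof(struct sketch_mask_cpu_map, mask);
}

/* Padding the compiler inserts between long_size and mask[]. */
size_t sketch_mask_padding(void)
{
	return sketch_mask_offset() - 2 * sizeof(uint16_t);
}

/* Whether an offset within the event satisfies a given alignment. */
int sketch_is_aligned(size_t offset, size_t align)
{
	return offset % align == 0;
}
```

On a typical 64-bit build sketch_mask_offset() is 8 (4 bytes of padding)
and on a typical 32-bit build it is 4 (no padding), so the two builds
disagree about where the mask starts. In both cases
SKETCH_PAYLOAD_OFFSET + sketch_mask_offset() is not a multiple of
sizeof(unsigned long).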

These changes start with minor cleanup: const correctness, visibility of
functions, and use of the constant-time max function. They then add 32-
and 64-bit mask encoding variants, packed to match current alignment. Taking the
address of a packed struct leads to unaligned data, so function
arguments are altered to be passed the packed struct. To compact the
mask encoding further and drop the padding, the 4-byte variant is
preferred. Finally, a new range encoding is added that reduces the
size of the common case of a range of CPUs to a single u64.
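
To illustrate the packed 32-bit variant, here is a rough sketch. All
names are hypothetical, and __attribute__((packed)) is a GCC/Clang
extension. Note that it sidesteps unaligned pointers by copying fields
with memcpy from byte offsets, which is a different technique from the
series' approach of passing the packed struct to functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packed 32-bit variant: no pad between long_size and mask. */
struct sketch_mask_cpu_map32 {
	uint16_t nr;
	uint16_t long_size;
	uint32_t mask[];
} __attribute__((packed));

static_assert(offsetof(struct sketch_mask_cpu_map32, mask) == 4,
	      "mask follows long_size directly");

#define SKETCH_MAX_WORDS 8
#define SKETCH_MASK_OFF offsetof(struct sketch_mask_cpu_map32, mask)

/* Encode CPUs 0..nr_cpus-1 into a payload buffer; returns bytes written. */
size_t sketch_encode_mask32(unsigned char *out, unsigned int nr_cpus)
{
	uint16_t nr = (uint16_t)((nr_cpus + 31) / 32);
	uint16_t long_size = sizeof(uint32_t);

	memcpy(out + offsetof(struct sketch_mask_cpu_map32, nr), &nr, sizeof(nr));
	memcpy(out + offsetof(struct sketch_mask_cpu_map32, long_size),
	       &long_size, sizeof(long_size));
	for (unsigned int i = 0; i < nr; i++) {
		unsigned int bits = nr_cpus - i * 32;
		uint32_t word = bits >= 32 ? UINT32_MAX : (UINT32_C(1) << bits) - 1;

		/* memcpy avoids a store through a misaligned uint32_t pointer. */
		memcpy(out + SKETCH_MASK_OFF + i * sizeof(word), &word, sizeof(word));
	}
	return SKETCH_MASK_OFF + nr * sizeof(uint32_t);
}

/* Count the CPUs set in a packed 32-bit mask payload. */
unsigned int sketch_count_cpus32(const unsigned char *payload)
{
	uint16_t nr;
	unsigned int count = 0;

	memcpy(&nr, payload + offsetof(struct sketch_mask_cpu_map32, nr), sizeof(nr));
	for (unsigned int i = 0; i < nr; i++) {
		uint32_t word;

		memcpy(&word, payload + SKETCH_MASK_OFF + i * sizeof(word), sizeof(word));
		for (; word; word &= word - 1)
			count++;
	}
	return count;
}

/* Encode then decode; returns the CPU count recovered from the payload. */
unsigned int sketch_roundtrip32(unsigned int nr_cpus)
{
	unsigned char payload[4 + SKETCH_MAX_WORDS * sizeof(uint32_t)];

	if (nr_cpus > SKETCH_MAX_WORDS * 32)
		return 0;
	sketch_encode_mask32(payload, nr_cpus);
	return sketch_count_cpus32(payload);
}
```

For 72 CPUs this writes nr = 3 and long_size = 4 followed by 12 bytes of
mask, 16 payload bytes in total, with no padding before the mask.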

On a 72 CPU (hyperthread) machine the original encoding of all CPUs is:
0x9a98 [0x28]: event: 74
.
. ... raw event: size 40 bytes
.  0000:  4a 00 00 00 00 00 28 00 01 00 02 00 08 00 00 00  J.....(.........
.  0010:  00 00 ff ff ff ff ff ff ff ff ff 00 00 00 00 00  ................
.  0020:  00 00 00 00 00 00 00 00                          ........        

0 0 0x9a98 [0x28]: PERF_RECORD_CPU_MAP

Using the 4-byte encoding it is:
0x9a98@pipe [0x20]: event: 74
.
. ... raw event: size 32 bytes
.  0000:  4a 00 00 00 00 00 20 00 01 00 03 00 04 00 ff ff  J..... .........
.  0010:  ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 00  ................

0 0 0x9a98 [0x20]: PERF_RECORD_CPU_MAP

Finally, with the range encoding it is:
0x9ab8@pipe [0x10]: event: 74
.
. ... raw event: size 16 bytes
.  0000:  4a 00 00 00 00 00 10 00 02 00 00 00 00 00 47 00  J.............G.

0 0 0x9ab8 [0x10]: PERF_RECORD_CPU_MAP
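
The three event sizes above can be reproduced with a little arithmetic.
This is only a sketch: the helper names are hypothetical, the field
sizes are read off the dumps, and the rounding of the event size up to
a multiple of 8 is an assumption that happens to match all three dumps:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes read off the dumps: u32 type, u16 misc, u16 size, then u16 map type. */
#define SKETCH_HDR_SIZE  8u
#define SKETCH_TYPE_SIZE 2u

static_assert(SKETCH_HDR_SIZE + SKETCH_TYPE_SIZE == 10,
	      "map payload begins 10 bytes into the event");

/* Assumption: event sizes are rounded up to a multiple of 8 bytes. */
size_t sketch_align8(size_t n)
{
	return (n + 7) & ~(size_t)7;
}

/*
 * Size of a mask-encoded event: header, map type, u16 nr, u16 long_size,
 * optional padding, then enough words of word_size bytes to cover nr_cpus.
 */
size_t sketch_mask_event_size(unsigned int nr_cpus, size_t word_size, size_t pad)
{
	size_t bits_per_word = word_size * 8;
	size_t words = (nr_cpus + bits_per_word - 1) / bits_per_word;

	return sketch_align8(SKETCH_HDR_SIZE + SKETCH_TYPE_SIZE + 2 + 2 + pad +
			     words * word_size);
}

/* Size of a range-encoded event: header, map type, 6 bytes of range data. */
size_t sketch_range_event_size(void)
{
	return sketch_align8(SKETCH_HDR_SIZE + SKETCH_TYPE_SIZE + 6);
}
```

With 72 CPUs, sketch_mask_event_size(72, 8, 4) gives 34 rounded up to 40,
sketch_mask_event_size(72, 4, 0) gives 26 rounded up to 32, and
sketch_range_event_size() gives 14 rounded up to 16, matching the 0x28,
0x20 and 0x10 sizes in the dumps.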

v2. Fixes a bug in the size computation of the update header
    introduced by the last patch (Add range data encoding) and caught
    by address sanitizer.

Ian Rogers (6):
  perf cpumap: Const map for max
  perf cpumap: Synthetic events and const/static
  perf cpumap: Compute mask size in constant time
  perf cpumap: Fix alignment for masks in event encoding
  perf events: Prefer union over variable length array
  perf cpumap: Add range data encoding

 tools/lib/perf/cpumap.c              |   2 +-
 tools/lib/perf/include/perf/cpumap.h |   2 +-
 tools/lib/perf/include/perf/event.h  |  61 ++++++++-
 tools/perf/tests/cpumap.c            |  71 ++++++++---
 tools/perf/tests/event_update.c      |  14 +--
 tools/perf/util/cpumap.c             | 111 +++++++++++++---
 tools/perf/util/cpumap.h             |   4 +-
 tools/perf/util/event.h              |   4 -
 tools/perf/util/header.c             |  24 ++--
 tools/perf/util/session.c            |  35 +++---
 tools/perf/util/synthetic-events.c   | 182 +++++++++++++--------------
 tools/perf/util/synthetic-events.h   |   2 +-
 12 files changed, 327 insertions(+), 185 deletions(-)

-- 
2.36.1.476.g0c4daa206d-goog
Re: [PATCH v2 0/6] Corrections to cpu map event encoding
Posted by Ian Rogers 3 years, 9 months ago
On Tue, Jun 14, 2022 at 7:33 AM Ian Rogers <irogers@google.com> wrote:
>
> A mask encoding of a cpu map is laid out as:
>   u16 nr
>   u16 long_size
>   unsigned long mask[];
> However, the mask may be 8-byte aligned meaning there is a 4-byte pad
> after long_size. This means 32-bit and 64-bit builds see the mask as
> being at different offsets. On top of this the structure is in the byte
> data[] encoded as:
>   u16 type
>   char data[]
> This means the mask's struct isn't the required 4 or 8 byte aligned, but
> is offset by 2. Consequently the long reads and writes are causing
> undefined behavior as the alignment is broken.
>
> These changes do minor clean up with const, visibility of functions
> and using the constant time max function. It then adds 32 and 64-bit
> mask encoding variants, packed to match current alignment. Taking the
> address of a packed struct leads to unaligned data, so function
> arguments are altered to be passed the packed struct. To compact the
> mask encoding further and drop the padding, the 4-byte variant is
> preferred. Finally a new range encoding is added, that reduces the
> size of the common case of a range of CPUs to a single u64.
>
> On a 72 CPU (hyperthread) machine the original encoding of all CPUs is:
> 0x9a98 [0x28]: event: 74
> .
> . ... raw event: size 40 bytes
> .  0000:  4a 00 00 00 00 00 28 00 01 00 02 00 08 00 00 00  J.....(.........
> .  0010:  00 00 ff ff ff ff ff ff ff ff ff 00 00 00 00 00  ................
> .  0020:  00 00 00 00 00 00 00 00                          ........
>
> 0 0 0x9a98 [0x28]: PERF_RECORD_CPU_MAP
>
> Using the 4-byte encoding it is:
> 0x9a98@pipe [0x20]: event: 74
> .
> . ... raw event: size 32 bytes
> .  0000:  4a 00 00 00 00 00 20 00 01 00 03 00 04 00 ff ff  J..... .........
> .  0010:  ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 00  ................
>
> 0 0 0x9a98 [0x20]: PERF_RECORD_CPU_MAP
>
> Finally, with the range encoding it is:
> 0x9ab8@pipe [0x10]: event: 74
> .
> . ... raw event: size 16 bytes
> .  0000:  4a 00 00 00 00 00 10 00 02 00 00 00 00 00 47 00  J.............G.
>
> 0 0 0x9ab8 [0x10]: PERF_RECORD_CPU_MAP
>
> v2. Fixes a bug in the size computation of the update header
>     introduced by the last patch (Add range data encoding) and caught
>     by address sanitizer.
>
> Ian Rogers (6):
>   perf cpumap: Const map for max
>   perf cpumap: Synthetic events and const/static
>   perf cpumap: Compute mask size in constant time
>   perf cpumap: Fix alignment for masks in event encoding
>   perf events: Prefer union over variable length array
>   perf cpumap: Add range data encoding

Ping. There was some feedback on this change but nothing that required a
v3. Feedback/acked-by/reviewed-by appreciated.

Thanks,
Ian
Re: [PATCH v2 0/6] Corrections to cpu map event encoding
Posted by Ian Rogers 3 years, 9 months ago
On Fri, Jul 15, 2022 at 10:01 AM Ian Rogers <irogers@google.com> wrote:
>
> On Tue, Jun 14, 2022 at 7:33 AM Ian Rogers <irogers@google.com> wrote:
> >
> > A mask encoding of a cpu map is laid out as:
> >   u16 nr
> >   u16 long_size
> >   unsigned long mask[];
> > However, the mask may be 8-byte aligned meaning there is a 4-byte pad
> > after long_size. This means 32-bit and 64-bit builds see the mask as
> > being at different offsets. On top of this the structure is in the byte
> > data[] encoded as:
> >   u16 type
> >   char data[]
> > This means the mask's struct isn't the required 4 or 8 byte aligned, but
> > is offset by 2. Consequently the long reads and writes are causing
> > undefined behavior as the alignment is broken.
> >
> > These changes do minor clean up with const, visibility of functions
> > and using the constant time max function. It then adds 32 and 64-bit
> > mask encoding variants, packed to match current alignment. Taking the
> > address of a packed struct leads to unaligned data, so function
> > arguments are altered to be passed the packed struct. To compact the
> > mask encoding further and drop the padding, the 4-byte variant is
> > preferred. Finally a new range encoding is added, that reduces the
> > size of the common case of a range of CPUs to a single u64.
> >
> > On a 72 CPU (hyperthread) machine the original encoding of all CPUs is:
> > 0x9a98 [0x28]: event: 74
> > .
> > . ... raw event: size 40 bytes
> > .  0000:  4a 00 00 00 00 00 28 00 01 00 02 00 08 00 00 00  J.....(.........
> > .  0010:  00 00 ff ff ff ff ff ff ff ff ff 00 00 00 00 00  ................
> > .  0020:  00 00 00 00 00 00 00 00                          ........
> >
> > 0 0 0x9a98 [0x28]: PERF_RECORD_CPU_MAP
> >
> > Using the 4-byte encoding it is:
> > 0x9a98@pipe [0x20]: event: 74
> > .
> > . ... raw event: size 32 bytes
> > .  0000:  4a 00 00 00 00 00 20 00 01 00 03 00 04 00 ff ff  J..... .........
> > .  0010:  ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 00  ................
> >
> > 0 0 0x9a98 [0x20]: PERF_RECORD_CPU_MAP
> >
> > Finally, with the range encoding it is:
> > 0x9ab8@pipe [0x10]: event: 74
> > .
> > . ... raw event: size 16 bytes
> > .  0000:  4a 00 00 00 00 00 10 00 02 00 00 00 00 00 47 00  J.............G.
> >
> > 0 0 0x9ab8 [0x10]: PERF_RECORD_CPU_MAP
> >
> > v2. Fixes a bug in the size computation of the update header
> >     introduced by the last patch (Add range data encoding) and caught
> >     by address sanitizer.
> >
> > Ian Rogers (6):
> >   perf cpumap: Const map for max
> >   perf cpumap: Synthetic events and const/static
> >   perf cpumap: Compute mask size in constant time
> >   perf cpumap: Fix alignment for masks in event encoding
> >   perf events: Prefer union over variable length array
> >   perf cpumap: Add range data encoding
>
> Ping. There was some feedback on this change but nothing to create a
> v3. Feedback/acked-by/reviewed-by appreciated.

Ping. Feedback appreciated.

Thanks,
Ian
Re: [PATCH v2 0/6] Corrections to cpu map event encoding
Posted by Jiri Olsa 3 years, 9 months ago
On Thu, Jul 28, 2022 at 07:01:09PM -0700, Ian Rogers wrote:
> On Fri, Jul 15, 2022 at 10:01 AM Ian Rogers <irogers@google.com> wrote:
> >
> > On Tue, Jun 14, 2022 at 7:33 AM Ian Rogers <irogers@google.com> wrote:
> > >
> > > A mask encoding of a cpu map is laid out as:
> > >   u16 nr
> > >   u16 long_size
> > >   unsigned long mask[];
> > > However, the mask may be 8-byte aligned meaning there is a 4-byte pad
> > > after long_size. This means 32-bit and 64-bit builds see the mask as
> > > being at different offsets. On top of this the structure is in the byte
> > > data[] encoded as:
> > >   u16 type
> > >   char data[]
> > > This means the mask's struct isn't the required 4 or 8 byte aligned, but
> > > is offset by 2. Consequently the long reads and writes are causing
> > > undefined behavior as the alignment is broken.
> > >
> > > These changes do minor clean up with const, visibility of functions
> > > and using the constant time max function. It then adds 32 and 64-bit
> > > mask encoding variants, packed to match current alignment. Taking the
> > > address of a packed struct leads to unaligned data, so function
> > > arguments are altered to be passed the packed struct. To compact the
> > > mask encoding further and drop the padding, the 4-byte variant is
> > > preferred. Finally a new range encoding is added, that reduces the
> > > size of the common case of a range of CPUs to a single u64.
> > >
> > > On a 72 CPU (hyperthread) machine the original encoding of all CPUs is:
> > > 0x9a98 [0x28]: event: 74
> > > .
> > > . ... raw event: size 40 bytes
> > > .  0000:  4a 00 00 00 00 00 28 00 01 00 02 00 08 00 00 00  J.....(.........
> > > .  0010:  00 00 ff ff ff ff ff ff ff ff ff 00 00 00 00 00  ................
> > > .  0020:  00 00 00 00 00 00 00 00                          ........
> > >
> > > 0 0 0x9a98 [0x28]: PERF_RECORD_CPU_MAP
> > >
> > > Using the 4-byte encoding it is:
> > > 0x9a98@pipe [0x20]: event: 74
> > > .
> > > . ... raw event: size 32 bytes
> > > .  0000:  4a 00 00 00 00 00 20 00 01 00 03 00 04 00 ff ff  J..... .........
> > > .  0010:  ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 00  ................
> > >
> > > 0 0 0x9a98 [0x20]: PERF_RECORD_CPU_MAP
> > >
> > > Finally, with the range encoding it is:
> > > 0x9ab8@pipe [0x10]: event: 74
> > > .
> > > . ... raw event: size 16 bytes
> > > .  0000:  4a 00 00 00 00 00 10 00 02 00 00 00 00 00 47 00  J.............G.
> > >
> > > 0 0 0x9ab8 [0x10]: PERF_RECORD_CPU_MAP
> > >
> > > v2. Fixes a bug in the size computation of the update header
> > >     introduced by the last patch (Add range data encoding) and caught
> > >     by address sanitizer.
> > >
> > > Ian Rogers (6):
> > >   perf cpumap: Const map for max
> > >   perf cpumap: Synthetic events and const/static
> > >   perf cpumap: Compute mask size in constant time
> > >   perf cpumap: Fix alignment for masks in event encoding
> > >   perf events: Prefer union over variable length array
> > >   perf cpumap: Add range data encoding
> >
> > Ping. There was some feedback on this change but nothing to create a
> > v3. Feedback/acked-by/reviewed-by appreciated.
> 
> Ping. Feedback appreciated.

hi,
there's some unanswered feedback:
https://lore.kernel.org/linux-perf-users/YrwY3SP+jsTwrRBw@krava/

jirka
Re: [PATCH v2 0/6] Corrections to cpu map event encoding
Posted by Ian Rogers 3 years, 9 months ago
On Fri, Jul 29, 2022 at 4:35 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Thu, Jul 28, 2022 at 07:01:09PM -0700, Ian Rogers wrote:
> > On Fri, Jul 15, 2022 at 10:01 AM Ian Rogers <irogers@google.com> wrote:
> > >
> > > On Tue, Jun 14, 2022 at 7:33 AM Ian Rogers <irogers@google.com> wrote:
> > > >
> > > > A mask encoding of a cpu map is laid out as:
> > > >   u16 nr
> > > >   u16 long_size
> > > >   unsigned long mask[];
> > > > However, the mask may be 8-byte aligned meaning there is a 4-byte pad
> > > > after long_size. This means 32-bit and 64-bit builds see the mask as
> > > > being at different offsets. On top of this the structure is in the byte
> > > > data[] encoded as:
> > > >   u16 type
> > > >   char data[]
> > > > This means the mask's struct isn't the required 4 or 8 byte aligned, but
> > > > is offset by 2. Consequently the long reads and writes are causing
> > > > undefined behavior as the alignment is broken.
> > > >
> > > > These changes do minor clean up with const, visibility of functions
> > > > and using the constant time max function. It then adds 32 and 64-bit
> > > > mask encoding variants, packed to match current alignment. Taking the
> > > > address of a packed struct leads to unaligned data, so function
> > > > arguments are altered to be passed the packed struct. To compact the
> > > > mask encoding further and drop the padding, the 4-byte variant is
> > > > preferred. Finally a new range encoding is added, that reduces the
> > > > size of the common case of a range of CPUs to a single u64.
> > > >
> > > > On a 72 CPU (hyperthread) machine the original encoding of all CPUs is:
> > > > 0x9a98 [0x28]: event: 74
> > > > .
> > > > . ... raw event: size 40 bytes
> > > > .  0000:  4a 00 00 00 00 00 28 00 01 00 02 00 08 00 00 00  J.....(.........
> > > > .  0010:  00 00 ff ff ff ff ff ff ff ff ff 00 00 00 00 00  ................
> > > > .  0020:  00 00 00 00 00 00 00 00                          ........
> > > >
> > > > 0 0 0x9a98 [0x28]: PERF_RECORD_CPU_MAP
> > > >
> > > > Using the 4-byte encoding it is:
> > > > 0x9a98@pipe [0x20]: event: 74
> > > > .
> > > > . ... raw event: size 32 bytes
> > > > .  0000:  4a 00 00 00 00 00 20 00 01 00 03 00 04 00 ff ff  J..... .........
> > > > .  0010:  ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 00  ................
> > > >
> > > > 0 0 0x9a98 [0x20]: PERF_RECORD_CPU_MAP
> > > >
> > > > Finally, with the range encoding it is:
> > > > 0x9ab8@pipe [0x10]: event: 74
> > > > .
> > > > . ... raw event: size 16 bytes
> > > > .  0000:  4a 00 00 00 00 00 10 00 02 00 00 00 00 00 47 00  J.............G.
> > > >
> > > > 0 0 0x9ab8 [0x10]: PERF_RECORD_CPU_MAP
> > > >
> > > > v2. Fixes a bug in the size computation of the update header
> > > >     introduced by the last patch (Add range data encoding) and caught
> > > >     by address sanitizer.
> > > >
> > > > Ian Rogers (6):
> > > >   perf cpumap: Const map for max
> > > >   perf cpumap: Synthetic events and const/static
> > > >   perf cpumap: Compute mask size in constant time
> > > >   perf cpumap: Fix alignment for masks in event encoding
> > > >   perf events: Prefer union over variable length array
> > > >   perf cpumap: Add range data encoding
> > >
> > > Ping. There was some feedback on this change but nothing to create a
> > > v3. Feedback/acked-by/reviewed-by appreciated.
> >
> > Ping. Feedback appreciated.
>
> hi,
> there's some unanswered feedback:
> https://lore.kernel.org/linux-perf-users/YrwY3SP+jsTwrRBw@krava/
>
> jirka

Thanks Jirka,

Was there a comment in particular? My reply was here:
https://lore.kernel.org/linux-perf-users/CAP-5=fU=BpP4OT2axZLSfRnKxQxmv-HXj8khBgmx3XQMS+abgA@mail.gmail.com/
I double-checked; every one of your comments was answered.

Thanks,
Ian
Re: [PATCH v2 0/6] Corrections to cpu map event encoding
Posted by Jiri Olsa 3 years, 8 months ago
On Fri, Jul 29, 2022 at 07:28:36AM -0700, Ian Rogers wrote:
> On Fri, Jul 29, 2022 at 4:35 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Thu, Jul 28, 2022 at 07:01:09PM -0700, Ian Rogers wrote:
> > > On Fri, Jul 15, 2022 at 10:01 AM Ian Rogers <irogers@google.com> wrote:
> > > >
> > > > On Tue, Jun 14, 2022 at 7:33 AM Ian Rogers <irogers@google.com> wrote:
> > > > >
> > > > > A mask encoding of a cpu map is laid out as:
> > > > >   u16 nr
> > > > >   u16 long_size
> > > > >   unsigned long mask[];
> > > > > However, the mask may be 8-byte aligned meaning there is a 4-byte pad
> > > > > after long_size. This means 32-bit and 64-bit builds see the mask as
> > > > > being at different offsets. On top of this the structure is in the byte
> > > > > data[] encoded as:
> > > > >   u16 type
> > > > >   char data[]
> > > > > This means the mask's struct isn't the required 4 or 8 byte aligned, but
> > > > > is offset by 2. Consequently the long reads and writes are causing
> > > > > undefined behavior as the alignment is broken.
> > > > >
> > > > > These changes do minor clean up with const, visibility of functions
> > > > > and using the constant time max function. It then adds 32 and 64-bit
> > > > > mask encoding variants, packed to match current alignment. Taking the
> > > > > address of a packed struct leads to unaligned data, so function
> > > > > arguments are altered to be passed the packed struct. To compact the
> > > > > mask encoding further and drop the padding, the 4-byte variant is
> > > > > preferred. Finally a new range encoding is added, that reduces the
> > > > > size of the common case of a range of CPUs to a single u64.
> > > > >
> > > > > On a 72 CPU (hyperthread) machine the original encoding of all CPUs is:
> > > > > 0x9a98 [0x28]: event: 74
> > > > > .
> > > > > . ... raw event: size 40 bytes
> > > > > .  0000:  4a 00 00 00 00 00 28 00 01 00 02 00 08 00 00 00  J.....(.........
> > > > > .  0010:  00 00 ff ff ff ff ff ff ff ff ff 00 00 00 00 00  ................
> > > > > .  0020:  00 00 00 00 00 00 00 00                          ........
> > > > >
> > > > > 0 0 0x9a98 [0x28]: PERF_RECORD_CPU_MAP
> > > > >
> > > > > Using the 4-byte encoding it is:
> > > > > 0x9a98@pipe [0x20]: event: 74
> > > > > .
> > > > > . ... raw event: size 32 bytes
> > > > > .  0000:  4a 00 00 00 00 00 20 00 01 00 03 00 04 00 ff ff  J..... .........
> > > > > .  0010:  ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 00  ................
> > > > >
> > > > > 0 0 0x9a98 [0x20]: PERF_RECORD_CPU_MAP
> > > > >
> > > > > Finally, with the range encoding it is:
> > > > > 0x9ab8@pipe [0x10]: event: 74
> > > > > .
> > > > > . ... raw event: size 16 bytes
> > > > > .  0000:  4a 00 00 00 00 00 10 00 02 00 00 00 00 00 47 00  J.............G.
> > > > >
> > > > > 0 0 0x9ab8 [0x10]: PERF_RECORD_CPU_MAP
> > > > >
> > > > > v2. Fixes a bug in the size computation of the update header
> > > > >     introduced by the last patch (Add range data encoding) and caught
> > > > >     by address sanitizer.
> > > > >
> > > > > Ian Rogers (6):
> > > > >   perf cpumap: Const map for max
> > > > >   perf cpumap: Synthetic events and const/static
> > > > >   perf cpumap: Compute mask size in constant time
> > > > >   perf cpumap: Fix alignment for masks in event encoding
> > > > >   perf events: Prefer union over variable length array
> > > > >   perf cpumap: Add range data encoding
> > > >
> > > > Ping. There was some feedback on this change but nothing to create a
> > > > v3. Feedback/acked-by/reviewed-by appreciated.
> > >
> > > Ping. Feedback appreciated.
> >
> > hi,
> > there's some unanswered feedback:
> > https://lore.kernel.org/linux-perf-users/YrwY3SP+jsTwrRBw@krava/
> >
> > jirka
> 
> Thanks Jirka,
> 
> Was there a comment in particular? My reply was here:
> https://lore.kernel.org/linux-perf-users/CAP-5=fU=BpP4OT2axZLSfRnKxQxmv-HXj8khBgmx3XQMS+abgA@mail.gmail.com/
> I double-checked; every one of your comments was answered.

ugh sorry, seems it did not get into my inbox for some reason

jirka
Re: [PATCH v2 0/6] Corrections to cpu map event encoding
Posted by Jiri Olsa 3 years, 8 months ago
On Tue, Jun 14, 2022 at 07:33:47AM -0700, Ian Rogers wrote:
> A mask encoding of a cpu map is laid out as:
>   u16 nr
>   u16 long_size
>   unsigned long mask[];
> However, the mask may be 8-byte aligned meaning there is a 4-byte pad
> after long_size. This means 32-bit and 64-bit builds see the mask as
> being at different offsets. On top of this the structure is in the byte
> data[] encoded as:
>   u16 type
>   char data[]
> This means the mask's struct isn't the required 4 or 8 byte aligned, but
> is offset by 2. Consequently the long reads and writes are causing
> undefined behavior as the alignment is broken.
> 
> These changes do minor clean up with const, visibility of functions
> and using the constant time max function. It then adds 32 and 64-bit
> mask encoding variants, packed to match current alignment. Taking the
> address of a packed struct leads to unaligned data, so function
> arguments are altered to be passed the packed struct. To compact the
> mask encoding further and drop the padding, the 4-byte variant is
> preferred. Finally a new range encoding is added, that reduces the
> size of the common case of a range of CPUs to a single u64.
> 
> On a 72 CPU (hyperthread) machine the original encoding of all CPUs is:
> 0x9a98 [0x28]: event: 74
> .
> . ... raw event: size 40 bytes
> .  0000:  4a 00 00 00 00 00 28 00 01 00 02 00 08 00 00 00  J.....(.........
> .  0010:  00 00 ff ff ff ff ff ff ff ff ff 00 00 00 00 00  ................
> .  0020:  00 00 00 00 00 00 00 00                          ........        
> 
> 0 0 0x9a98 [0x28]: PERF_RECORD_CPU_MAP
> 
> Using the 4-byte encoding it is:
> 0x9a98@pipe [0x20]: event: 74
> .
> . ... raw event: size 32 bytes
> .  0000:  4a 00 00 00 00 00 20 00 01 00 03 00 04 00 ff ff  J..... .........
> .  0010:  ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 00  ................
> 
> 0 0 0x9a98 [0x20]: PERF_RECORD_CPU_MAP
> 
> Finally, with the range encoding it is:
> 0x9ab8@pipe [0x10]: event: 74
> .
> . ... raw event: size 16 bytes
> .  0000:  4a 00 00 00 00 00 10 00 02 00 00 00 00 00 47 00  J.............G.
> 
> 0 0 0x9ab8 [0x10]: PERF_RECORD_CPU_MAP
> 
> v2. Fixes a bug in the size computation of the update header
>     introduced by the last patch (Add range data encoding) and caught
>     by address sanitizer.
> 
> Ian Rogers (6):
>   perf cpumap: Const map for max
>   perf cpumap: Synthetic events and const/static
>   perf cpumap: Compute mask size in constant time
>   perf cpumap: Fix alignment for masks in event encoding
>   perf events: Prefer union over variable length array
>   perf cpumap: Add range data encoding

Acked-by: Jiri Olsa <jolsa@kernel.org>

thanks,
jirka