[v2] Add bpf_xdp_get_xfrm_state() kfunc

[PATCH ipsec-next v2 3/6] libbpf: Add BPF_CORE_WRITE_BITFIELD() macro

Posted by Daniel Xu 2 years, 2 months ago

Similar to reading from CO-RE bitfields, we need a CO-RE aware bitfield
writing wrapper to make the verifier happy.

Two alternatives to this approach are:

1. Use the upcoming `preserve_static_offset` [0] attribute to disable
   CO-RE on specific structs.
2. Use broader byte-sized writes to write to bitfields.

(1) is a bit hard to use. It requires specific and
not-very-obvious annotations to bpftool generated vmlinux.h. It's also
not generally available in released LLVM versions yet.

(2) makes the code quite hard to read and write. And especially if
BPF_CORE_READ_BITFIELD() is already being used, it makes more sense to
have an inverse helper for writing.

[0]: https://reviews.llvm.org/D133361
From: Eduard Zingerman <eddyz87@gmail.com>

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 tools/lib/bpf/bpf_core_read.h | 36 +++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/tools/lib/bpf/bpf_core_read.h b/tools/lib/bpf/bpf_core_read.h
index 1ac57bb7ac55..7a764f65d299 100644
--- a/tools/lib/bpf/bpf_core_read.h
+++ b/tools/lib/bpf/bpf_core_read.h
@@ -111,6 +111,42 @@ enum bpf_enum_value_kind {
 	val;								      \
 })
 
+/*
+ * Write to a bitfield, identified by s->field.
+ * This is the inverse of BPF_CORE_READ_BITFIELD().
+ */
+#define BPF_CORE_WRITE_BITFIELD(s, field, new_val) ({			\
+	void *p = (void *)s + __CORE_RELO(s, field, BYTE_OFFSET);	\
+	unsigned int byte_size = __CORE_RELO(s, field, BYTE_SIZE);	\
+	unsigned int lshift = __CORE_RELO(s, field, LSHIFT_U64);	\
+	unsigned int rshift = __CORE_RELO(s, field, RSHIFT_U64);	\
+	unsigned int bit_size = (rshift - lshift);			\
+	unsigned long long nval, val, hi, lo;				\
+									\
+	asm volatile("" : "+r"(p));					\
+									\
+	switch (byte_size) {						\
+	case 1: val = *(unsigned char *)p; break;			\
+	case 2: val = *(unsigned short *)p; break;			\
+	case 4: val = *(unsigned int *)p; break;			\
+	case 8: val = *(unsigned long long *)p; break;			\
+	}								\
+	hi = val >> (bit_size + rshift);				\
+	hi <<= bit_size + rshift;					\
+	lo = val << (bit_size + lshift);				\
+	lo >>= bit_size + lshift;					\
+	nval = new_val;							\
+	nval <<= lshift;						\
+	nval >>= rshift;						\
+	val = hi | nval | lo;						\
+	switch (byte_size) {						\
+	case 1: *(unsigned char *)p      = val; break;			\
+	case 2: *(unsigned short *)p     = val; break;			\
+	case 4: *(unsigned int *)p       = val; break;			\
+	case 8: *(unsigned long long *)p = val; break;			\
+	}								\
+})
+
 #define ___bpf_field_ref1(field)	(field)
 #define ___bpf_field_ref2(type, field)	(((typeof(type) *)0)->field)
 #define ___bpf_field_ref(args...)					    \
-- 
2.42.1
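
For readers unfamiliar with the two shift relocations the macro relies on, here is a minimal user-space sketch of how BPF_CORE_READ_BITFIELD() uses them for an unsigned field. It assumes a little-endian layout in which the field starts `bit_off` bits above the least-significant bit of the loaded value; `struct shifts`, `field_shifts()` and `read_field()` are hypothetical names invented for this illustration and are not part of libbpf.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the LSHIFT_U64 / RSHIFT_U64 relocation values. */
struct shifts {
	unsigned int lshift;
	unsigned int rshift;
};

/*
 * Assumption: little-endian layout, field occupies bits
 * [bit_off, bit_off + bit_size) of the value loaded into a u64.
 */
static struct shifts field_shifts(unsigned int bit_off, unsigned int bit_size)
{
	assert(bit_size >= 1 && bit_off + bit_size <= 64);
	struct shifts s = {
		.lshift = 64 - (bit_off + bit_size), /* bits above the field */
		.rshift = 64 - bit_size,             /* everything but the field */
	};
	return s;
}

/* Unsigned read: push the field to the top, then pull it down to bit 0. */
static uint64_t read_field(uint64_t word, struct shifts s)
{
	return (word << s.lshift) >> s.rshift;
}
```

For an 8-bit field at bit offset 4, this gives lshift = 52 and rshift = 56, and reading that field from 0xABCD yields 0xBC. The write macro has to undo exactly this pair of shifts.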

Re: [PATCH ipsec-next v2 3/6] libbpf: Add BPF_CORE_WRITE_BITFIELD() macro

Posted by Eduard Zingerman 2 years, 2 months ago

On Tue, 2023-11-28 at 10:54 -0700, Daniel Xu wrote:
> Similar to reading from CO-RE bitfields, we need a CO-RE aware bitfield
> writing wrapper to make the verifier happy.
> 
> Two alternatives to this approach are:
> 
> 1. Use the upcoming `preserve_static_offset` [0] attribute to disable
>    CO-RE on specific structs.
> 2. Use broader byte-sized writes to write to bitfields.
> 
> (1) is a bit hard to use. It requires specific and
> not-very-obvious annotations to bpftool generated vmlinux.h. It's also
> not generally available in released LLVM versions yet.
> 
> (2) makes the code quite hard to read and write. And especially if
> BPF_CORE_READ_BITFIELD() is already being used, it makes more sense to
> have an inverse helper for writing.
> 
> [0]: https://reviews.llvm.org/D133361
> From: Eduard Zingerman <eddyz87@gmail.com>
> 
> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> ---

Could you please also add a selftest (or several) using __retval()
annotation for this macro?

Re: [PATCH ipsec-next v2 3/6] libbpf: Add BPF_CORE_WRITE_BITFIELD() macro

Posted by Daniel Xu 2 years, 2 months ago

On Tue, Nov 28, 2023 at 07:59:01PM +0200, Eduard Zingerman wrote:
> On Tue, 2023-11-28 at 10:54 -0700, Daniel Xu wrote:
> > Similar to reading from CO-RE bitfields, we need a CO-RE aware bitfield
> > writing wrapper to make the verifier happy.
> > 
> > Two alternatives to this approach are:
> > 
> > 1. Use the upcoming `preserve_static_offset` [0] attribute to disable
> >    CO-RE on specific structs.
> > 2. Use broader byte-sized writes to write to bitfields.
> > 
> > (1) is a bit hard to use. It requires specific and
> > not-very-obvious annotations to bpftool generated vmlinux.h. It's also
> > not generally available in released LLVM versions yet.
> > 
> > (2) makes the code quite hard to read and write. And especially if
> > BPF_CORE_READ_BITFIELD() is already being used, it makes more sense to
> > have an inverse helper for writing.
> > 
> > [0]: https://reviews.llvm.org/D133361
> > From: Eduard Zingerman <eddyz87@gmail.com>
> > 
> > Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> > ---
> 
> Could you please also add a selftest (or several) using __retval()
> annotation for this macro?

Good call about adding tests -- I found a few bugs with the code from
the other thread. But boy did they take a lot of brain cells to figure
out.

There was some 6th grade algebra involved too -- I'll do my best to
explain it in the commit msg for v3.


Here are the fixes in case you are curious:

diff --git a/tools/lib/bpf/bpf_core_read.h b/tools/lib/bpf/bpf_core_read.h
index 7a764f65d299..8f02c558c0ff 100644
--- a/tools/lib/bpf/bpf_core_read.h
+++ b/tools/lib/bpf/bpf_core_read.h
@@ -120,7 +120,9 @@ enum bpf_enum_value_kind {
        unsigned int byte_size = __CORE_RELO(s, field, BYTE_SIZE);      \
        unsigned int lshift = __CORE_RELO(s, field, LSHIFT_U64);        \
        unsigned int rshift = __CORE_RELO(s, field, RSHIFT_U64);        \
-       unsigned int bit_size = (rshift - lshift);                      \
+       unsigned int bit_size = (64 - rshift);                          \
+       unsigned int hi_size = lshift;                                  \
+       unsigned int lo_size = (rshift - lshift);                       \
        unsigned long long nval, val, hi, lo;                           \
                                                                        \
        asm volatile("" : "+r"(p));                                     \
@@ -131,13 +133,13 @@ enum bpf_enum_value_kind {
        case 4: val = *(unsigned int *)p; break;                        \
        case 8: val = *(unsigned long long *)p; break;                  \
        }                                                               \
-       hi = val >> (bit_size + rshift);                                \
-       hi <<= bit_size + rshift;                                       \
-       lo = val << (bit_size + lshift);                                \
-       lo >>= bit_size + lshift;                                       \
+       hi = val >> (64 - hi_size);                                     \
+       hi <<= 64 - hi_size;                                            \
+       lo = val << (64 - lo_size);                                     \
+       lo >>= 64 - lo_size;                                            \
        nval = new_val;                                                 \
-       nval <<= lshift;                                                \
-       nval >>= rshift;                                                \
+       nval <<= (64 - bit_size);                                       \
+       nval >>= (64 - bit_size - lo_size);                             \
        val = hi | nval | lo;                                           \
        switch (byte_size) {                                            \
        case 1: *(unsigned char *)p      = val; break;                  \


Thanks,
Daniel
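
The corrected arithmetic in the diff above can be checked outside of BPF with a small user-space model. This is only a sketch: `write_field()`, `keep_hi()` and `keep_lo()` are hypothetical names, and the `lshift`/`rshift` arguments stand in for the LSHIFT_U64/RSHIFT_U64 relocation values. The zero-size guards are an addition of this sketch, not part of the posted diff; they are there because shifting a 64-bit value by 64 is undefined behavior in C.

```c
#include <assert.h>
#include <stdint.h>

/* Keep the top n bits of v; n == 0 yields 0 without shifting by 64. */
static uint64_t keep_hi(uint64_t v, unsigned int n)
{
	return n ? (v >> (64 - n)) << (64 - n) : 0;
}

/* Keep the low n bits of v; n == 0 yields 0 without shifting by 64. */
static uint64_t keep_lo(uint64_t v, unsigned int n)
{
	return n ? (v << (64 - n)) >> (64 - n) : 0;
}

/* Hypothetical model of the write: hi | new field | lo. */
static uint64_t write_field(uint64_t val, uint64_t new_val,
			    unsigned int lshift, unsigned int rshift)
{
	assert(lshift <= rshift && rshift < 64);
	unsigned int bit_size = 64 - rshift;    /* width of the field */
	unsigned int hi_size = lshift;          /* bits above the field */
	unsigned int lo_size = rshift - lshift; /* bits below the field */
	uint64_t nval = new_val;

	nval <<= 64 - bit_size;           /* drop bits that do not fit */
	nval >>= 64 - bit_size - lo_size; /* slide down to the field position */
	return keep_hi(val, hi_size) | nval | keep_lo(val, lo_size);
}
```

With lshift = 52 and rshift = 56 (an 8-bit field at bit offset 4), writing 0x12 into 0xABCD gives 0xA12D, and an oversized value such as 0x1FF is truncated to the field width, giving 0xAFFD.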

Re: [PATCH ipsec-next v2 3/6] libbpf: Add BPF_CORE_WRITE_BITFIELD() macro

Posted by Andrii Nakryiko 2 years, 2 months ago

On Thu, Nov 30, 2023 at 5:33 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
>
> On Tue, Nov 28, 2023 at 07:59:01PM +0200, Eduard Zingerman wrote:
> > On Tue, 2023-11-28 at 10:54 -0700, Daniel Xu wrote:
> > > Similar to reading from CO-RE bitfields, we need a CO-RE aware bitfield
> > > writing wrapper to make the verifier happy.
> > >
> > > Two alternatives to this approach are:
> > >
> > > 1. Use the upcoming `preserve_static_offset` [0] attribute to disable
> > >    CO-RE on specific structs.
> > > 2. Use broader byte-sized writes to write to bitfields.
> > >
> > > (1) is a bit hard to use. It requires specific and
> > > not-very-obvious annotations to bpftool generated vmlinux.h. It's also
> > > not generally available in released LLVM versions yet.
> > >
> > > (2) makes the code quite hard to read and write. And especially if
> > > BPF_CORE_READ_BITFIELD() is already being used, it makes more sense to
> > > have an inverse helper for writing.
> > >
> > > [0]: https://reviews.llvm.org/D133361
> > > From: Eduard Zingerman <eddyz87@gmail.com>
> > >
> > > Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> > > ---
> >
> > Could you please also add a selftest (or several) using __retval()
> > annotation for this macro?
>
> Good call about adding tests -- I found a few bugs with the code from
> the other thread. But boy did they take a lot of brain cells to figure
> out.
>
> There was some 6th grade algebra involved too -- I'll do my best to
> explain it in the commit msg for v3.
>
>
> Here are the fixes in case you are curious:
>
> diff --git a/tools/lib/bpf/bpf_core_read.h b/tools/lib/bpf/bpf_core_read.h
> index 7a764f65d299..8f02c558c0ff 100644
> --- a/tools/lib/bpf/bpf_core_read.h
> +++ b/tools/lib/bpf/bpf_core_read.h
> @@ -120,7 +120,9 @@ enum bpf_enum_value_kind {
>         unsigned int byte_size = __CORE_RELO(s, field, BYTE_SIZE);      \
>         unsigned int lshift = __CORE_RELO(s, field, LSHIFT_U64);        \
>         unsigned int rshift = __CORE_RELO(s, field, RSHIFT_U64);        \
> -       unsigned int bit_size = (rshift - lshift);                      \
> +       unsigned int bit_size = (64 - rshift);                          \
> +       unsigned int hi_size = lshift;                                  \
> +       unsigned int lo_size = (rshift - lshift);                       \

nit: let's drop unnecessary ()

>         unsigned long long nval, val, hi, lo;                           \
>                                                                         \
>         asm volatile("" : "+r"(p));                                     \
> @@ -131,13 +133,13 @@ enum bpf_enum_value_kind {
>         case 4: val = *(unsigned int *)p; break;                        \
>         case 8: val = *(unsigned long long *)p; break;                  \
>         }                                                               \
> -       hi = val >> (bit_size + rshift);                                \
> -       hi <<= bit_size + rshift;                                       \
> -       lo = val << (bit_size + lshift);                                \
> -       lo >>= bit_size + lshift;                                       \
> +       hi = val >> (64 - hi_size);                                     \
> +       hi <<= 64 - hi_size;                                            \
> +       lo = val << (64 - lo_size);                                     \
> +       lo >>= 64 - lo_size;                                            \
>         nval = new_val;                                                 \
> -       nval <<= lshift;                                                \
> -       nval >>= rshift;                                                \
> +       nval <<= (64 - bit_size);                                       \
> +       nval >>= (64 - bit_size - lo_size);                             \
>         val = hi | nval | lo;                                           \

this looks.. unusual. I'd imagine we calculate a mask, mask out bits
we are replacing, and then OR with new values, roughly (assuming all
the right left/right shift values and stuff)

/* clear bits */
val &= ~(bitfield_mask << shift);
/* set bits */
val |= (nval & bitfield_mask) << shift;

?

>         switch (byte_size) {                                            \
>         case 1: *(unsigned char *)p      = val; break;                  \
>
>
> Thanks,
> Daniel
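
The mask-based formulation suggested above can be sketched in user space as follows. How `bitfield_mask` and `shift` map onto the relocation values is an assumption of this sketch (mask of `64 - rshift` one bits, shift of `rshift - lshift`), not something stated in the review; `write_field_masked()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: clear the field's bits, then OR in the new value. */
static uint64_t write_field_masked(uint64_t val, uint64_t new_val,
				   unsigned int lshift, unsigned int rshift)
{
	assert(lshift <= rshift && rshift < 64);
	unsigned int shift = rshift - lshift;     /* bits below the field */
	uint64_t bitfield_mask = ~0ULL >> rshift; /* 64 - rshift one bits */

	/* clear bits */
	val &= ~(bitfield_mask << shift);
	/* set bits */
	val |= (new_val & bitfield_mask) << shift;
	return val;
}
```

Because every shift count here stays below 64 for any valid field, this variant needs no special casing for fields that touch bit 0 or bit 63 of the loaded value. For the 8-bit field at bit offset 4 (lshift = 52, rshift = 56), writing 0x12 into 0xABCD again gives 0xA12D.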

Re: [PATCH ipsec-next v2 3/6] libbpf: Add BPF_CORE_WRITE_BITFIELD() macro

Posted by Eduard Zingerman 2 years, 2 months ago

On Thu, 2023-11-30 at 18:33 -0700, Daniel Xu wrote:
[...]
> Good call about adding tests -- I found a few bugs with the code from
> the other thread. But boy did they take a lot of brain cells to figure
> out.
> 
> There was some 6th grade algebra involved too -- I'll do my best to
> explain it in the commit msg for v3.
> 
> Here are the fixes in case you are curious:

Ouch, I knew my code from 3am can't be trusted, sorry for that.
Your math seems to make sense, thank you.

[...]

Re: [PATCH ipsec-next v2 3/6] libbpf: Add BPF_CORE_WRITE_BITFIELD() macro

Posted by Daniel Xu 2 years, 2 months ago

On Tue, Nov 28, 2023 at 07:59:01PM +0200, Eduard Zingerman wrote:
> On Tue, 2023-11-28 at 10:54 -0700, Daniel Xu wrote:
> > Similar to reading from CO-RE bitfields, we need a CO-RE aware bitfield
> > writing wrapper to make the verifier happy.
> > 
> > Two alternatives to this approach are:
> > 
> > 1. Use the upcoming `preserve_static_offset` [0] attribute to disable
> >    CO-RE on specific structs.
> > 2. Use broader byte-sized writes to write to bitfields.
> > 
> > (1) is a bit hard to use. It requires specific and
> > not-very-obvious annotations to bpftool generated vmlinux.h. It's also
> > not generally available in released LLVM versions yet.
> > 
> > (2) makes the code quite hard to read and write. And especially if
> > BPF_CORE_READ_BITFIELD() is already being used, it makes more sense to
> > have an inverse helper for writing.
> > 
> > [0]: https://reviews.llvm.org/D133361
> > From: Eduard Zingerman <eddyz87@gmail.com>
> > 
> > Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> > ---
> 
> Could you please also add a selftest (or several) using __retval()
> annotation for this macro?

Sure, I'll take a look.

Thanks,
Daniel