[RFC PATCH v2 0/5] eBPF RSS support for virtio-net

Andrew Melnychenko posted 5 patches 3 years, 4 months ago
There is a newer version of this series
[RFC PATCH v2 0/5] eBPF RSS support for virtio-net
Posted by Andrew Melnychenko 3 years, 4 months ago
This patch set introduces the use of eBPF for packet steering
and RSS hash calculation:
* RSS (Receive Side Scaling) distributes network packets across
guest virtqueues based on a hash calculated over each packet
* Additionally, it adds support for using RSS with vhost
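
For readers less familiar with RSS, the idea can be sketched in a few lines of Python: hash selected packet fields, then use the hash to index an indirection table of queue numbers. This assumes the Toeplitz hash conventionally used for RSS; the function names, the example key and the table contents are illustrative only and are not taken from this series (the actual program is ebpf/rss.bpf.c).

```python
import socket
import struct
from typing import Sequence

# A widely published 40-byte example RSS key; used here purely for illustration.
EXAMPLE_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bbeac01fa"
)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Return the 32-bit Toeplitz hash of `data` under `key`."""
    key_bits = len(key) * 8
    if len(data) * 8 + 31 > key_bits:
        raise ValueError("key too short for this input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for index, byte in enumerate(data):
        for bit in range(8):
            if byte & (0x80 >> bit):
                # XOR in the 32-bit key window that starts at this bit offset.
                offset = index * 8 + bit
                result ^= (key_int >> (key_bits - 32 - offset)) & 0xFFFFFFFF
    return result


def ipv4_tcp_input(src: str, dst: str, sport: int, dport: int) -> bytes:
    """Concatenate the IPv4/TCP 4-tuple in network byte order as hash input."""
    return (socket.inet_aton(src) + socket.inet_aton(dst)
            + struct.pack("!HH", sport, dport))


def select_queue(hash_value: int, indirection_table: Sequence[int]) -> int:
    """Pick a queue from the table (a power-of-two table size is assumed)."""
    return indirection_table[hash_value & (len(indirection_table) - 1)]


if __name__ == "__main__":
    h = toeplitz_hash(EXAMPLE_KEY, ipv4_tcp_input("192.0.2.1", "192.0.2.2", 12345, 80))
    print(hex(h), select_queue(h, [0, 1, 2, 3, 0, 1, 2, 3]))
```

Because the hash depends only on the flow's addresses and ports, all packets of one flow land on the same queue, while different flows spread across the table.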

The eBPF program works on kernels 5.8+.
On earlier kernels it fails to load; in that case the RSS feature is reported
only without vhost and is implemented in 'in-qemu' software.

Implementation notes:
The Linux TAP TUNSETSTEERINGEBPF ioctl is used to set the eBPF program.
A libbpf dependency and eBPF support were added.
The eBPF program is part of QEMU and is embedded as an array
of BPF ELF file data.
Compilation of the eBPF program is not part of the QEMU build; it can be done
using the provided Makefile.ebpf (after adjusting 'linuxhdrs').
virtio-net and vhost were changed so that eBPF RSS is the primary implementation.
'in-qemu' RSS is used when hash population is required and as a fallback option.
For vhost, the hash population feature is not reported to the guest.
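
The fallback rules described in this cover letter can be summarized as a small decision function. The following is only a rough Python sketch of those rules as stated in the text, not the logic in hw/net/virtio-net.c; the names RssBackend, RssDecision and select_rss are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class RssBackend(Enum):
    EBPF = "ebpf"
    IN_QEMU = "in-qemu"
    NONE = "none"          # RSS is not reported to the guest at all


@dataclass(frozen=True)
class RssDecision:
    backend: RssBackend
    offer_hash_report: bool  # may hash population be reported to the guest?


def select_rss(ebpf_loaded: bool, vhost: bool, hash_report_wanted: bool) -> RssDecision:
    """Choose an RSS implementation following the rules in the cover letter."""
    if vhost:
        # With vhost, RSS is only reported when the eBPF program loaded,
        # and hash population is never reported to the guest.
        backend = RssBackend.EBPF if ebpf_loaded else RssBackend.NONE
        return RssDecision(backend, offer_hash_report=False)
    if hash_report_wanted or not ebpf_loaded:
        # eBPF RSS cannot populate the hash, and may be unavailable (kernel < 5.8).
        return RssDecision(RssBackend.IN_QEMU, offer_hash_report=True)
    return RssDecision(RssBackend.EBPF, offer_hash_report=True)
```

For example, select_rss(ebpf_loaded=False, vhost=True, hash_report_wanted=False) yields RssBackend.NONE, which corresponds to "the RSS feature is reported only without vhost" on older kernels.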

Please also see the documentation in PATCH 5/5.

I am sending these patches as an RFC to initiate discussion and get
feedback on the following points:
* Fallback when eBPF is not supported by the kernel
* Live migration to a kernel that doesn't have eBPF support
* Integration with the current QEMU build
* Additional uses of eBPF for packet filtering

Known issues:
* Hash population is not supported by eBPF RSS: 'in-qemu' RSS is used
as a fallback, and the hash population feature is not reported to guests
with vhost.
* Big-endian BPF support: for now, eBPF isn't supported on
big-endian systems. It can be added in the future if required.
* Huge .h file with the eBPF binary: the .h file containing the
eBPF binary is currently ~5K lines, because the binary is built with debug information.
A binary without debug/BTF info can't be loaded by libbpf.
We're looking for ways to reduce the size of the .h file.
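
To give a feel for why the generated header is so long: such a header stores one array element per byte of the ELF object. A minimal Python sketch of a generator of this kind follows. It is not the EbpfElf_to_C.py from this series; the function name, the default array name and the 8-bytes-per-line layout are assumptions made for illustration.

```python
def elf_to_c_array(blob: bytes, name: str = "example_ebpf_prog", per_line: int = 8) -> str:
    """Render a binary blob as a C array definition, `per_line` bytes per row."""
    lines = [f"static const unsigned char {name}[] = {{"]
    for start in range(0, len(blob), per_line):
        chunk = blob[start:start + per_line]
        lines.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    lines.append("};")
    lines.append(f"static const unsigned int {name}_len = {len(blob)};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The first four bytes of any ELF file are 0x7f 'E' 'L' 'F'.
    print(elf_to_c_array(b"\x7fELF", name="prog", per_line=2))
```

With this layout a 40 KiB object already produces 5120 data lines, so the header length tracks the ELF size almost linearly, and removing sections from the object is the main way to shrink it.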

Changes since v1:
* using libbpf instead of direct 'bpf' system call.
* added libbpf dependency to the configure/meson scripts.
* changed python script for eBPF .h file generation.
* changed eBPF program - reading L3 proto from ethernet frame.
* added TUNSETSTEERINGEBPF define for TUN.
* changed the maintainer's info.
* added license headers.
* refactored code.
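
For reference, the TUNSETSTEERINGEBPF define mentioned above corresponds to _IOR('T', 224, int) in the Linux uapi header linux/if_tun.h. The Python sketch below recomputes that number using the generic ioctl encoding (as on x86 and arm; some architectures use different direction bits) and shows how a program fd would be handed to a TAP fd. The helper names ior and set_steering_ebpf are hypothetical, and this is not the code in net/tap-linux.c.

```python
import fcntl
import struct

# Generic Linux ioctl number layout: nr (8 bits), type (8), size (14), dir (2).
_IOC_NRSHIFT = 0
_IOC_TYPESHIFT = _IOC_NRSHIFT + 8     # 8
_IOC_SIZESHIFT = _IOC_TYPESHIFT + 8   # 16
_IOC_DIRSHIFT = _IOC_SIZESHIFT + 14   # 30
_IOC_READ = 2


def ior(type_char: str, nr: int, size: int) -> int:
    """Python equivalent of the kernel's _IOR(type, nr, size) macro."""
    return ((_IOC_READ << _IOC_DIRSHIFT)
            | (size << _IOC_SIZESHIFT)
            | (ord(type_char) << _IOC_TYPESHIFT)
            | (nr << _IOC_NRSHIFT))


TUNSETSTEERINGEBPF = ior("T", 224, struct.calcsize("i"))


def set_steering_ebpf(tap_fd: int, prog_fd: int) -> None:
    """Attach an already loaded eBPF program (given by fd) to a TAP fd."""
    # The ioctl argument is a pointer to an int holding the program fd.
    fcntl.ioctl(tap_fd, TUNSETSTEERINGEBPF, struct.pack("i", prog_fd))
```

Calling set_steering_ebpf requires an open TAP device and a loaded program, so it is not exercised here; only the constant can be checked in isolation.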

Andrew (5):
  net: Added SetSteeringEBPF method for NetClientState.
  ebpf: Added eBPF RSS program.
  ebpf: Added eBPF RSS loader.
  virtio-net: Added eBPF RSS to virtio-net.
  docs: Added eBPF RSS documentation.

 MAINTAINERS                    |    7 +
 configure                      |   33 +
 docs/ebpf_rss.rst              |  133 +
 ebpf/EbpfElf_to_C.py           |   36 +
 ebpf/Makefile.ebpf             |   33 +
 ebpf/ebpf_rss-stub.c           |   40 +
 ebpf/ebpf_rss.c                |  186 ++
 ebpf/ebpf_rss.h                |   44 +
 ebpf/meson.build               |    1 +
 ebpf/rss.bpf.c                 |  505 +++
 ebpf/tun_rss_steering.h        | 5439 ++++++++++++++++++++++++++++++++
 hw/net/vhost_net.c             |    2 +
 hw/net/virtio-net.c            |  120 +-
 include/hw/virtio/virtio-net.h |    4 +
 include/net/net.h              |    2 +
 meson.build                    |   11 +
 net/tap-bsd.c                  |    5 +
 net/tap-linux.c                |   13 +
 net/tap-linux.h                |    1 +
 net/tap-solaris.c              |    5 +
 net/tap-stub.c                 |    5 +
 net/tap.c                      |    9 +
 net/tap_int.h                  |    1 +
 net/vhost-vdpa.c               |    2 +
 24 files changed, 6633 insertions(+), 4 deletions(-)
 create mode 100644 docs/ebpf_rss.rst
 create mode 100644 ebpf/EbpfElf_to_C.py
 create mode 100755 ebpf/Makefile.ebpf
 create mode 100644 ebpf/ebpf_rss-stub.c
 create mode 100644 ebpf/ebpf_rss.c
 create mode 100644 ebpf/ebpf_rss.h
 create mode 100644 ebpf/meson.build
 create mode 100644 ebpf/rss.bpf.c
 create mode 100644 ebpf/tun_rss_steering.h

-- 
2.29.2


Re: [RFC PATCH v2 0/5] eBPF RSS support for virtio-net
Posted by Jason Wang 3 years, 4 months ago
On 2020/11/19 7:13 PM, Andrew Melnychenko wrote:
> * huge .h file with eBPF binary. The size of .h file containing
> eBPF binary is currently ~5K lines, because the binary is built with debug information.
> The binary without debug/BTF info can't be loaded by libbpf.
> We're looking for possibilities to reduce the size of the .h files.


A question here: is this because the binary file contains DWARF data? If
so, is it a build-time or a load-time dependency? If it's the latter, maybe we
can try to strip it out; the kernel can't recognize it anyway.

Thanks




Re: [RFC PATCH v2 0/5] eBPF RSS support for virtio-net
Posted by Yuri Benditovich 3 years, 4 months ago
On Mon, Nov 23, 2020 at 8:08 AM Jason Wang <jasowang@redhat.com> wrote:

>
> On 2020/11/19 7:13 PM, Andrew Melnychenko wrote:
> > * huge .h file with eBPF binary. The size of .h file containing
> > eBPF binary is currently ~5K lines, because the binary is built with debug information.
> > The binary without debug/BTF info can't be loaded by libbpf.
> > We're looking for possibilities to reduce the size of the .h files.
>
>
> A question here, is this because the binary file contains DWARF data? If
> yes, is it a building or loading dependency? If it's latter, maybe we
> can try to strip them out, anyhow it can't be recognized by kernel.
>
> Thanks
>
>
After some experiments we can see that stripping the debug sections reduces
the size of the ELF from ~45K to ~20K (we tried to strip more, but then libbpf
fails to load it; libbpf needs BTF and symbols).
So I suggest re-evaluating the necessity of libbpf.
For this specific BPF program it does not offer an advantage, and we can
hardly create any reusable code related to libbpf, i.e. any further BPF
program will need its own libbpf wrapper.
BTF is a really good feature, and if some later BPF program needs access to
kernel structures, it will use the libbpf loader.
What do you think about it?


Re: [RFC PATCH v2 0/5] eBPF RSS support for virtio-net
Posted by Jason Wang 3 years, 4 months ago
On 2020/11/26 8:52 PM, Yuri Benditovich wrote:
> After some experiments we can see that stripping the debug sections reduces
> the size of the ELF from ~45K to ~20K (we tried to strip more, but then libbpf
> fails to load it; libbpf needs BTF and symbols).
> So I suggest re-evaluating the necessity of libbpf.
> For this specific BPF program it does not offer an advantage, and we can
> hardly create any reusable code related to libbpf, i.e. any further BPF
> program will need its own libbpf wrapper.
> BTF is a really good feature, and if some later BPF program needs access to
> kernel structures, it will use the libbpf loader.
> What do you think about it?


If we can find a way to use BTF without libbpf, it should be acceptable.

Thanks




Re: [RFC PATCH v2 0/5] eBPF RSS support for virtio-net
Posted by Yuri Benditovich 3 years, 4 months ago
On Fri, Nov 27, 2020 at 6:36 AM Jason Wang <jasowang@redhat.com> wrote:

> If we can find a way to use BTF without libbpf, it should be acceptable.
>
But the point is that the RSS BPF program does not need BTF, as it does not use
any kernel structures.
When we have, for example, a filter BPF program that needs BTF, we'll use
libbpf for it.
Anyway, we do not have any infrastructure code related to libbpf here.



Re: [RFC PATCH v2 0/5] eBPF RSS support for virtio-net
Posted by Jason Wang 3 years, 4 months ago
On 2020/11/27 2:06 PM, Yuri Benditovich wrote:
> But the point is that the RSS BPF does not need the BTF as it does not 
> use any kernel structures.


Kind of: it does try to access the skb. But yes, it doesn't access any
skb metadata.


> When we have, for example, filter BPF that will need the BTF - we'll  
> use libbpf for it.
> Anyway we do not have here any infrastructural code related to libbpf,


Right, so I think we can probably start with a non-BTF version without
libbpf, and add other features on top.

Thanks


Re: [RFC PATCH v2 0/5] eBPF RSS support for virtio-net
Posted by Jason Wang 3 years, 3 months ago
On 2020/11/19 7:13 PM, Andrew Melnychenko wrote:
> * huge .h file with eBPF binary. The size of .h file containing
> eBPF binary is currently ~5K lines, because the binary is built with debug information.
> The binary without debug/BTF info can't be loaded by libbpf.
> We're looking for possibilities to reduce the size of the .h files.


Adding Toke to share more ideas from the eBPF side.

We had some discussion on the eBPF issues:

1) Whether or not to use libbpf. Toke strongly suggests using libbpf.
2) Whether or not to use BTF. Toke confirmed that if we don't access any
skb metadata, BTF is not strictly required for CO-RE. But it might still
be useful, e.g. for debugging.
3) About the huge header (5K lines, see patch #2, Toke). Toke confirmed that we
can strip debug symbols, but Yuri found that some sections can't be stripped;
we can keep discussing here.

Toke, feel free to correct me if I am wrong.

Thanks


>
> Changes since v1:
> * using libbpf instead of direct 'bpf' system call.
> * added libbpf dependency to the configure/meson scripts.
> * changed python script for eBPF .h file generation.
> * changed eBPF program - reading L3 proto from ethernet frame.
> * added TUNSETSTEERINGEBPF define for TUN.
> * changed the maintainer's info.
> * added license headers.
> * refactored code.
>
> Andrew (5):
>    net: Added SetSteeringEBPF method for NetClientState.
>    ebpf: Added eBPF RSS program.
>    ebpf: Added eBPF RSS loader.
>    virtio-net: Added eBPF RSS to virtio-net.
>    docs: Added eBPF RSS documentation.
>
>   MAINTAINERS                    |    7 +
>   configure                      |   33 +
>   docs/ebpf_rss.rst              |  133 +
>   ebpf/EbpfElf_to_C.py           |   36 +
>   ebpf/Makefile.ebpf             |   33 +
>   ebpf/ebpf_rss-stub.c           |   40 +
>   ebpf/ebpf_rss.c                |  186 ++
>   ebpf/ebpf_rss.h                |   44 +
>   ebpf/meson.build               |    1 +
>   ebpf/rss.bpf.c                 |  505 +++
>   ebpf/tun_rss_steering.h        | 5439 ++++++++++++++++++++++++++++++++
>   hw/net/vhost_net.c             |    2 +
>   hw/net/virtio-net.c            |  120 +-
>   include/hw/virtio/virtio-net.h |    4 +
>   include/net/net.h              |    2 +
>   meson.build                    |   11 +
>   net/tap-bsd.c                  |    5 +
>   net/tap-linux.c                |   13 +
>   net/tap-linux.h                |    1 +
>   net/tap-solaris.c              |    5 +
>   net/tap-stub.c                 |    5 +
>   net/tap.c                      |    9 +
>   net/tap_int.h                  |    1 +
>   net/vhost-vdpa.c               |    2 +
>   24 files changed, 6633 insertions(+), 4 deletions(-)
>   create mode 100644 docs/ebpf_rss.rst
>   create mode 100644 ebpf/EbpfElf_to_C.py
>   create mode 100755 ebpf/Makefile.ebpf
>   create mode 100644 ebpf/ebpf_rss-stub.c
>   create mode 100644 ebpf/ebpf_rss.c
>   create mode 100644 ebpf/ebpf_rss.h
>   create mode 100644 ebpf/meson.build
>   create mode 100644 ebpf/rss.bpf.c
>   create mode 100644 ebpf/tun_rss_steering.h
>


Re: [RFC PATCH v2 0/5] eBPF RSS support for virtio-net
Posted by Toke Høiland-Jørgensen 3 years, 3 months ago
Jason Wang <jasowang@redhat.com> writes:

> Adding Toke for sharing more idea from eBPF side.
>
> We had some discussion on the eBPF issues:
>
> 1) Whether or not to use libbpf. Toke strongly suggest to use libbpf
> 2) Whether or not to use BTF. Toke confirmed that if we don't access any 
> skb metadata, BTF is not strictly required for CO-RE. But it might still 
> useful for e.g debugging.
> 3) About the huge (5K lines, see patch #2 Toke). Toke confirmed that we 
> can strip debug symbols, but Yuri found some sections can't be stripped, 
> we can keep discussing here.

I just tried simply running 'strip' on a sample trivial XDP program,
which brought its size down from ~5k to ~1k and preserved the BTF
information without me having to do anything.

As a side note, though, instead of embedding the BPF program into a .h,
you could simply ship it as a .o and load it from the file system. We do
that with xdp-tools and install the bpf object files into /usr/$LIB/bpf/.

-Toke


Re: [RFC PATCH v2 0/5] eBPF RSS support for virtio-net
Posted by Yuri Benditovich 3 years, 3 months ago
On Wed, Dec 2, 2020 at 4:18 PM Toke Høiland-Jørgensen <toke@redhat.com>
wrote:

> I just tried simply running 'strip' on a sample trivial XDP program,
> which brought its size down from ~5k to ~1k and preserved the BTF
> information without me having to do anything.
>

With our eBPF code the numbers are slightly different:
The code size without BTF: 7.5K (built without '-g')
Built with '-g': 45K
Stripped: 19K
The difference between 7.5K and 19K still seems significant, especially since
we do not use any kernel structures and do not need these BTF sections.
This is the only reason to prefer the non-libbpf option for this specific eBPF
program.
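
To relate these object sizes to the size of an embedded .h file, here is a
rough sketch. It assumes a hypothetical converter that emits 8 bytes of the
object per header line and that "K" above means KiB; neither assumption is
taken from EbpfElf_to_C.py, so the line counts are only indicative.

```python
import math

# Assumption: the array initializer holds 8 bytes per line. This is a
# hypothetical layout, not necessarily what EbpfElf_to_C.py produces.
BYTES_PER_LINE = 8

# Object sizes quoted in the message above, read as KiB (an assumption).
BUILDS_KIB = {
    "built with -g": 45,
    "stripped": 19,
    "built without -g (no BTF)": 7.5,
}


def header_lines(object_bytes, bytes_per_line=BYTES_PER_LINE):
    """Number of array-initializer lines needed for object_bytes bytes."""
    return math.ceil(object_bytes / bytes_per_line)


def kib_to_bytes(kib):
    """Convert a size in KiB to a whole number of bytes."""
    return int(kib * 1024)


if __name__ == "__main__":
    for name, kib in BUILDS_KIB.items():
        print(f"{name}: {kib}K -> about {header_lines(kib_to_bytes(kib))} lines")
    # What keeping BTF costs on top of the plain '-g'-less build.
    overhead = BUILDS_KIB["stripped"] - BUILDS_KIB["built without -g (no BTF)"]
    print(f"BTF overhead after strip: {overhead}K")
```

Under these assumptions, stripping alone would cut the header to well under
half its current length, and the remaining gap to the BTF-less build is the
11.5K discussed here.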



>
> As a side note, though, instead of embedding the BPF program into a .h,
> you could simply ship it as a .o and load it from the file system. We do
> that with xdp-tools and install the bpf object files into /usr/$LIB/bpf/.
>

Yes, we've discussed this option and decided to go with embedding the BPF
program; see:
https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg02157.html


> -Toke
>
>
Re: [RFC PATCH v2 0/5] eBPF RSS support for virtio-net
Posted by Toke Høiland-Jørgensen 3 years, 3 months ago
Yuri Benditovich <yuri.benditovich@daynix.com> writes:

> On Wed, Dec 2, 2020 at 4:18 PM Toke Høiland-Jørgensen <toke@redhat.com>
> wrote:
>
>> I just tried simply running 'strip' on a sample trivial XDP program,
>> which brought its size down from ~5k to ~1k and preserved the BTF
>> information without me having to do anything.
>>
>
> With our eBPF code the numbers are slightly different:
> The code size without BTF: 7.5K (built without '-g')
> Built with '-g': 45K
> Stripped: 19K
> The difference between 7.5 and 19K still seems significant, especially when
> we do not use any kernel structures and do not need these BTF sections

That does seem like a lot of BTF information. Did you confirm (with
objdump) that it's the .BTF* sections that take up these extra 12k? Do
you have some really complicated data structures in the file or
something? Got a link to the source somewhere that isn't a web mailing
list archive? :)

In any case, while I do think it smells a little of premature
optimisation, you can of course strip the BTF information until you need
it. Having it around makes debugging easier (bpftool will expand your
map structures for you when dumping maps, and that sort of thing), but
it's not really essential if you don't need CO-RE.

> This is only reason to prefer non-libbpf option for this specific eBPF

You can still use libbpf without BTF. It's using BTF without libbpf that
tends to not work so well...

>> As a side note, though, instead of embedding the BPF program into a .h,
>> you could simply ship it as a .o and load it from the file system. We do
>> that with xdp-tools and install the bpf object files into /usr/$LIB/bpf/.
>>
>
> Yes, we've discussed this option and decided to go with embedding the BPF
> https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg02157.html

Right, okay. I'll note, though, that if your concern is that the BPF
code should always match the rest of the code base, omitting the
compilation if there's no Clang present seems like it could lead to
problems :)

Also, if you do go the embedded-bytecode route, you may want to have a
look at the upstream 'skeleton' concept. It takes a BPF object file and
automatically generates a header file that gives you direct access to
maps, programs and global data in C. There are some examples in
selftests/bpf on how to use it, but you basically just run
'bpftool gen skeleton mybpf.o'.

-Toke


Re: [RFC PATCH v2 0/5] eBPF RSS support for virtio-net
Posted by Yuri Benditovich 3 years, 3 months ago
On Fri, Dec 4, 2020 at 12:09 PM Toke Høiland-Jørgensen <toke@redhat.com>
wrote:

> Yuri Benditovich <yuri.benditovich@daynix.com> writes:
>
> > On Wed, Dec 2, 2020 at 4:18 PM Toke Høiland-Jørgensen <toke@redhat.com>
> > wrote:
> >
> >> I just tried simply running 'strip' on a sample trivial XDP program,
> >> which brought its size down from ~5k to ~1k and preserved the BTF
> >> information without me having to do anything.
> >>
> >
> > With our eBPF code the numbers are slightly different:
> > The code size without BTF: 7.5K (built without '-g')
> > Built with '-g': 45K
> > Stripped: 19K
> > The difference between 7.5 and 19K still seems significant, especially when
> > we do not use any kernel structures and do not need these BTF sections
>
> That does seem like a lot of BTF information. Did you confirm (with
> objdump) that it's the .BTF* sections that take up these extra 12k? Do
> you have some really complicated data structures in the file or
> something? Got a link to the source somewhere that isn't a web mailing
> list archive? :)
>
>
Looks like the extra size is related to BTF: there are 4 BTF sections that
take 12.5K
  [ 7] .BTF              PROGBITS        0000000000000000 00144c 00175d 00      0   0  1
  [ 8] .rel.BTF          REL             0000000000000000 002bb0 000040 10     14   7  8
  [ 9] .BTF.ext          PROGBITS        0000000000000000 002bf0 000cd0 00      0   0  1
  [10] .rel.BTF.ext      REL             0000000000000000 0038c0 000ca0 10     14   9  8
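
As a quick cross-check of the 12.5K figure, the size column of the four rows
above can be summed. This is a minimal sketch; the regular expression assumes
the column order shown in this listing (name, type, address, offset, size),
and the function name is only illustrative.

```python
import re

# The four section-header rows reported above, copied verbatim.
SECTION_ROWS = """\
[ 7] .BTF              PROGBITS        0000000000000000 00144c 00175d 00      0   0  1
[ 8] .rel.BTF          REL             0000000000000000 002bb0 000040 10     14   7  8
[ 9] .BTF.ext          PROGBITS        0000000000000000 002bf0 000cd0 00      0   0  1
[10] .rel.BTF.ext      REL             0000000000000000 0038c0 000ca0 10     14   9  8
"""

# [Nr] Name Type Address Offset Size ...; name and size are captured.
ROW = re.compile(
    r"\[\s*\d+\]\s+(\S+)\s+\S+\s+[0-9a-f]+\s+[0-9a-f]+\s+([0-9a-f]+)"
)


def section_sizes(text):
    """Map section name -> size in bytes, parsed from the hex size column."""
    sizes = {}
    for line in text.splitlines():
        match = ROW.search(line)
        if match:
            sizes[match.group(1)] = int(match.group(2), 16)
    return sizes


sizes = section_sizes(SECTION_ROWS)
total = sum(sizes.values())

if __name__ == "__main__":
    for name, size in sizes.items():
        print(f"{name:14s} {size:6d} bytes")
    print(f"total          {total:6d} bytes")
```

The four sizes add up to 12557 bytes, which matches the roughly 12.5K quoted
above; about half of it is relocation and .BTF.ext data rather than the type
information in .BTF itself.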

All the sources are at:
The branch without libbpf:
https://github.com/daynix/qemu/tree/eBPF_RFC
The branch with libbpf:
https://github.com/daynix/qemu/tree/eBPF_RFCv2

All the eBPF-related code is under the qemu/ebpf directory.


> In any case, while I do think it smells a little of premature
> optimisation, you can of course strip the BTF information until you need
> it. Having it around makes debugging easier (bpftool will expand your
> map structures for you when dumping maps, and that sort of thing), but
> it's not really essential if you don't need CO-RE.
>
> > This is only reason to prefer non-libbpf option for this specific eBPF
>
> You can still use libbpf without BTF. It's using BTF without libbpf that
> tends to not work so well...
>
>
If we build the eBPF program without '-g' or strip the BTF information out of
the object file, libbpf crashes right after printing "libbpf: BTF is
required, but is missing or corrupted".
We did not investigate this too deeply, but at first glance it looks
like the presence of maps automatically makes libbpf require BTF.


> >> As a side note, though, instead of embedding the BPF program into a .h,
> >> you could simply ship it as a .o and load it from the file system. We do
> >> that with xdp-tools and install the bpf object files into /usr/$LIB/bpf/.
> >>
> >
> > Yes, we've discussed this option and decided to go with embedding the BPF
> > https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg02157.html
>
> Right, okay. I'll note, though, that if your concern is that the BPF
> code should always match the rest of the code base, omitting the
> compilation if there's no Clang present seems like it could lead to
> problems :)
>
> Also, if you do go the embedded-bytecode route, you may want to have a
> look at the upstream 'skeleton' concept. It takes a BPF object file and
> automatically generates a header file that gives you direct access to
> maps, programs and global data in C. There are some examples in
> selftests/bpf on how to use it, but you basically just run
> 'bpftool gen skeleton mybpf.o'.
>
>
Indeed, this looks very interesting. We had missed this feature.
Thank you very much!


> -Toke
>
>
Re: [RFC PATCH v2 0/5] eBPF RSS support for virtio-net
Posted by Toke Høiland-Jørgensen 3 years, 3 months ago
Yuri Benditovich <yuri.benditovich@daynix.com> writes:

> On Fri, Dec 4, 2020 at 12:09 PM Toke Høiland-Jørgensen <toke@redhat.com>
> wrote:
>
>> Yuri Benditovich <yuri.benditovich@daynix.com> writes:
>>
>> > On Wed, Dec 2, 2020 at 4:18 PM Toke Høiland-Jørgensen <toke@redhat.com>
>> > wrote:
>> >
>> >> I just tried simply running 'strip' on a sample trivial XDP program,
>> >> which brought its size down from ~5k to ~1k and preserved the BTF
>> >> information without me having to do anything.
>> >>
>> >
>> > With our eBPF code the numbers are slightly different:
>> > The code size without BTF: 7.5K (built without '-g')
>> > Built with '-g': 45K
>> > Stripped: 19K
>> > The difference between 7.5 and 19K still seems significant, especially when
>> > we do not use any kernel structures and do not need these BTF sections
>>
>> That does seem like a lot of BTF information. Did you confirm (with
>> objdump) that it's the .BTF* sections that take up these extra 12k? Do
>> you have some really complicated data structures in the file or
>> something? Got a link to the source somewhere that isn't a web mailing
>> list archive? :)
>>
>>
> Looks like the extra size is related to BTF: there are 4 BTF sections that
> take 12.5K
>   [ 7] .BTF              PROGBITS        0000000000000000 00144c 00175d 00      0   0  1
>   [ 8] .rel.BTF          REL             0000000000000000 002bb0 000040 10     14   7  8
>   [ 9] .BTF.ext          PROGBITS        0000000000000000 002bf0 000cd0 00      0   0  1
>   [10] .rel.BTF.ext      REL             0000000000000000 0038c0 000ca0 10     14   9  8

Right, okay, that does not look completely outrageous with the amount of
code and type information you have in that file.

> All the sources are at:
> The branch without libbpf
> https://github.com/daynix/qemu/tree/eBPF_RFC
> The branch with libbpf
> https://github.com/daynix/qemu/tree/eBPF_RFCv2
>
> all the eBPF-related code is under qemu/ebpf directory.

Ah, cool, thanks!

>> In any case, while I do think it smells a little of premature
>> optimisation, you can of course strip the BTF information until you need
>> it. Having it around makes debugging easier (bpftool will expand your
>> map structures for you when dumping maps, and that sort of thing), but
>> it's not really essential if you don't need CO-RE.
>>
>> > This is only reason to prefer non-libbpf option for this specific eBPF
>>
>> You can still use libbpf without BTF. It's using BTF without libbpf that
>> tends to not work so well...
>>
>>
> If we build the eBPF without '-g' or strip the BTF information out of the
> object file the libbpf crashes right after issuing printout "libbpf: BTF is
> required, but is missing or corrupted".
> We did not investigate this too deeply but on the first glance it looks
> like the presence of maps automatically makes the libbpf to require BTF.

Ah, right. Well, you're using the BTF-based map definition syntax. So
yeah, that does require BTF: The __uint() and __type() macros really
expand to type definitions that are specifically crafted to be embedded
as BTF in the file.

You could use the old-style map definitions that don't use BTF[0], but
BTF is really where things are going in BPF-land so I think longer term
you'll probably end up needing it anyway. So going to this much trouble
just to save 10k on binary size seems to me like it's a decision you'll
end up regretting :)

[0] https://github.com/xdp-project/xdp-tutorial/blob/master/basic03-map-counter/xdp_prog_kern.c#L11

-Toke


Re: [RFC PATCH v2 0/5] eBPF RSS support for virtio-net
Posted by Yuri Benditovich 3 years, 3 months ago
On Fri, Dec 4, 2020 at 3:57 PM Toke Høiland-Jørgensen <toke@redhat.com>
wrote:

> Yuri Benditovich <yuri.benditovich@daynix.com> writes:
>
> > On Fri, Dec 4, 2020 at 12:09 PM Toke Høiland-Jørgensen <toke@redhat.com>
> > wrote:
> >
> >> Yuri Benditovich <yuri.benditovich@daynix.com> writes:
> >>
> >> > On Wed, Dec 2, 2020 at 4:18 PM Toke Høiland-Jørgensen <toke@redhat.com>
> >> > wrote:
> >> >
> >> >> I just tried simply running 'strip' on a sample trivial XDP program,
> >> >> which brought its size down from ~5k to ~1k and preserved the BTF
> >> >> information without me having to do anything.
> >> >>
> >> >
> >> > With our eBPF code the numbers are slightly different:
> >> > The code size without BTF: 7.5K (built without '-g')
> >> > Built with '-g': 45K
> >> > Stripped: 19K
> >> > The difference between 7.5 and 19K still seems significant, especially when
> >> > we do not use any kernel structures and do not need these BTF sections
> >>
> >> That does seem like a lot of BTF information. Did you confirm (with
> >> objdump) that it's the .BTF* sections that take up these extra 12k? Do
> >> you have some really complicated data structures in the file or
> >> something? Got a link to the source somewhere that isn't a web mailing
> >> list archive? :)
> >>
> >>
> > Looks like the extra size is related to BTF: there are 4 BTF sections that
> > take 12.5K
> >   [ 7] .BTF              PROGBITS        0000000000000000 00144c 00175d 00      0   0  1
> >   [ 8] .rel.BTF          REL             0000000000000000 002bb0 000040 10     14   7  8
> >   [ 9] .BTF.ext          PROGBITS        0000000000000000 002bf0 000cd0 00      0   0  1
> >   [10] .rel.BTF.ext      REL             0000000000000000 0038c0 000ca0 10     14   9  8
>
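The 12.5K figure can be cross-checked from the listing itself. The sketch below is only an illustration: it assumes the third hexadecimal field of each row is the section size in bytes (the usual `readelf -S` column order), and the names `btf_section_sizes` and `btf_total_bytes` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes of the four BTF-related sections, copied from the section listing
 * (assumed to be the "Size" column, in bytes). */
static const size_t btf_section_sizes[] = {
    0x00175d, /* .BTF         */
    0x000040, /* .rel.BTF     */
    0x000cd0, /* .BTF.ext     */
    0x000ca0, /* .rel.BTF.ext */
};

/* The table is a fixed-size array, so its length is known at compile time. */
static_assert(sizeof(btf_section_sizes) / sizeof(btf_section_sizes[0]) == 4,
              "expected exactly four BTF-related sections");

/* Sum the section sizes to get the total BTF footprint in bytes. */
static size_t btf_total_bytes(void)
{
    size_t total = 0;
    size_t n = sizeof(btf_section_sizes) / sizeof(btf_section_sizes[0]);

    for (size_t i = 0; i < n; i++) {
        total += btf_section_sizes[i];
    }
    return total;
}
```

Under that assumption the sum is 12557 bytes, which agrees with the "12.5K" quoted in the message.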
> Right, okay, that does not look completely outrageous with the amount of
> code and type information you have in that file.
>
> > All the sources are at:
> > The branch without libbpf
> > https://github.com/daynix/qemu/tree/eBPF_RFC
> > The branch with libbpf
> > https://github.com/daynix/qemu/tree/eBPF_RFCv2
> >
> > all the eBPF-related code is under qemu/ebpf directory.
>
> Ah, cool, thanks!
>
> >> In any case, while I do think it smells a little of premature
> >> optimisation, you can of course strip the BTF information until you need
> >> it. Having it around makes debugging easier (bpftool will expand your
> >> map structures for you when dumping maps, and that sort of thing), but
> >> it's not really essential if you don't need CO-RE.
> >>
> >> > This is the only reason to prefer the non-libbpf option for this specific eBPF
> >>
> >> You can still use libbpf without BTF. It's using BTF without libbpf that
> >> tends to not work so well...
> >>
> >>
> > If we build the eBPF without '-g' or strip the BTF information out of the
> > object file, libbpf crashes right after printing "libbpf: BTF is
> > required, but is missing or corrupted".
> > We did not investigate this too deeply, but at first glance it looks
> > like the presence of maps automatically makes libbpf require BTF.
>
> Ah, right. Well, you're using the BTF-based map definition syntax. So
> yeah, that does require BTF: The __uint() and __type() macros really
> expand to type definitions that are specifically crafted to be embedded
> as BTF in the file.
>

Yes, now the eBPF built without '-g' can also be loaded via libbpf, and we
can enable or disable BTF as needed.
Again, thank you very much!



>
> You could use the old-style map definitions that don't use BTF[0], but
> BTF is really where things are going in BPF-land so I think longer term
> you'll probably end up needing it anyway. So going to this much trouble
> just to save 10k on binary size seems to me like it's a decision you'll
> end up regretting :)
>
> [0]
> https://github.com/xdp-project/xdp-tutorial/blob/master/basic03-map-counter/xdp_prog_kern.c#L11
>
> -Toke
>
>
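To illustrate the point about BTF-based map definitions, here is a rough host-side sketch (plain C, not a loadable BPF program). `EX_UINT` and `EX_TYPE` are stand-ins written the way libbpf's `__uint()` and `__type()` are understood to expand; the struct names, the helper macros and the constant 2 (assumed to equal `BPF_MAP_TYPE_ARRAY`) are all hypothetical and exist only for this example. It shows that such a definition stores its parameters purely in member types, whereas the old style stores them as integer values.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for libbpf's __uint() and __type(); renamed here because
 * identifiers beginning with "__" are reserved for the implementation. */
#define EX_UINT(name, val) int (*name)[val]
#define EX_TYPE(name, val) __typeof__(val) *name

/* Hypothetical BTF-style map description. Each member is a pointer that is
 * never dereferenced; only its type carries information. */
struct example_btf_map {
    EX_UINT(type, 2);               /* assumed value of BPF_MAP_TYPE_ARRAY */
    EX_UINT(max_entries, 256);
    EX_TYPE(key, unsigned int);
    EX_TYPE(value, unsigned short);
};

/* Hypothetical legacy-style description: the same facts as plain integers. */
struct example_legacy_map {
    unsigned int type;
    unsigned int key_size;
    unsigned int value_size;
    unsigned int max_entries;
};

/* Read back the integer N encoded in an "int (*field)[N]" member. */
#define EX_UINT_OF(st, field) \
    ((unsigned int)(sizeof(*((st *)0)->field) / sizeof(int)))
/* Read back sizeof(T) for a "T *field" member. */
#define EX_SIZE_OF(st, field) \
    ((unsigned int)sizeof(*((st *)0)->field))

/* The parameters are recoverable from type information alone. */
static_assert(EX_UINT_OF(struct example_btf_map, max_entries) == 256,
              "max_entries is encoded as an array length");

/* Build the legacy form from the type-only form. */
static struct example_legacy_map example_to_legacy(void)
{
    struct example_legacy_map m = {
        .type        = EX_UINT_OF(struct example_btf_map, type),
        .key_size    = EX_SIZE_OF(struct example_btf_map, key),
        .value_size  = EX_SIZE_OF(struct example_btf_map, value),
        .max_entries = EX_UINT_OF(struct example_btf_map, max_entries),
    };
    return m;
}
```

In a real BPF object that type information is recorded in the BTF sections, which is consistent with the "BTF is required" message seen once they are stripped. The legacy layout carries the same numbers as data, so it does not depend on BTF.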