[RFC PATCH 0/6] eBPF RSS support for virtio-net

Andrew Melnychenko posted 6 patches 3 years, 6 months ago
Test checkpatch passed
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20201102185115.7425-1-andrew@daynix.com
Maintainers: Jason Wang <jasowang@redhat.com>, "Michael S. Tsirkin" <mst@redhat.com>
There is a newer version of this series
[RFC PATCH 0/6] eBPF RSS support for virtio-net
Posted by Andrew Melnychenko 3 years, 6 months ago
The basic idea is to use eBPF to calculate the hash and steer packets in TAP.
RSS (Receive Side Scaling) is used to distribute network packets to guest
virtqueues by calculating a per-packet hash.
eBPF RSS allows us to use RSS with vhost TAP.
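To make the hash step concrete: RSS conventionally computes a Toeplitz hash over the packet's source/destination addresses and ports, then picks the virtqueue from an indirection table indexed by the hash. The following is a user-space sketch, not code from this series (the in-kernel rss.bpf.c implementation differs); the key and the verification tuple are the standard values from the Microsoft RSS verification suite.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Default 40-byte key from the Microsoft RSS verification suite. */
static const uint8_t rss_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* Standard verification tuple: src 66.9.149.187:2794 -> dst 161.142.100.80:1766,
 * concatenated as src addr | dst addr | src port | dst port, network order. */
static const uint8_t tcp4_tuple[12] = {
    66, 9, 149, 187,      /* source address */
    161, 142, 100, 80,    /* destination address */
    0x0a, 0xea,           /* source port 2794 */
    0x06, 0xe6,           /* destination port 1766 */
};

/* Toeplitz hash: for every set bit of the input (MSB first), XOR in the
 * 32-bit window of the key aligned with that bit position. */
static uint32_t toeplitz_hash(const uint8_t *data, size_t len,
                              const uint8_t key[40])
{
    uint32_t hash = 0;
    uint64_t window = 0;   /* 64-bit sliding window; top 32 bits are current */
    size_t i, keyidx;

    for (keyidx = 0; keyidx < 8; keyidx++) {
        window = (window << 8) | key[keyidx];
    }
    for (i = 0; i < len; i++) {
        int bit;
        for (bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit)) {
                hash ^= (uint32_t)(window >> 32);
            }
            window <<= 1;
        }
        if (keyidx < 40) {
            window |= key[keyidx++];   /* slide the next key byte in */
        }
    }
    return hash;
}

/* Queue selection: index an indirection table with the hash. */
static uint16_t rss_queue(uint32_t hash, const uint16_t *table, size_t table_len)
{
    return table[hash % table_len];
}
```

Hashing only the 8 address bytes gives the "IPv4 only" hash; hashing all 12 bytes gives the "IPv4 with TCP" hash used for TCP steering.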

This set of patches introduces the use of eBPF for packet steering
and RSS hash calculation:
* RSS (Receive Side Scaling) is used to distribute network packets to
guest virtqueues by calculating the packet hash
* eBPF RSS is expected to be faster than the existing 'software'
implementation in QEMU
* Additionally, it adds support for using RSS with vhost

Supported kernels: 5.8+

Implementation notes:
The Linux TAP TUNSETSTEERINGEBPF ioctl is used to set the eBPF program.
eBPF support is added to QEMU directly through the system call; see
bpf(2) for details.
The eBPF program is part of QEMU and is shipped as an array of BPF
instructions.
The program can be recompiled with the provided Makefile.ebpf (adjust
'linuxhdrs' as needed), although that is not required to build QEMU
with eBPF support.
virtio-net and vhost are changed so that eBPF RSS is used primarily;
'software' RSS is used when hash population is required and as a fallback.
For vhost, the hash population feature is not reported to the guest.
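The attach path described above can be sketched as follows. This is a simplified illustration, not the series' actual loader: the instruction array is loaded via the bpf(2) syscall as a BPF_PROG_TYPE_SOCKET_FILTER program (the type the tun driver expects for steering), and the resulting program fd is handed to the TAP fd with TUNSETSTEERINGEBPF. The two-instruction program here is a hypothetical stand-in that steers every packet to queue 0.

```c
#include <assert.h>
#include <linux/bpf.h>
#include <linux/if_tun.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Minimal steering program: "return 0", i.e. steer everything to queue 0.
 * A real RSS program computes the hash and returns the queue index. */
static struct bpf_insn steer_q0[] = {
    { .code = 0xb7, .dst_reg = 0, .imm = 0 },  /* BPF_ALU64 | BPF_MOV | BPF_K */
    { .code = 0x95 },                          /* BPF_JMP | BPF_EXIT */
};

/* Load an instruction array with bpf(2); returns a program fd or -1. */
static int load_steering_prog(const struct bpf_insn *insns, size_t cnt)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    /* TUNSETSTEERINGEBPF expects a socket-filter type program */
    attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
    attr.insns = (uintptr_t)insns;
    attr.insn_cnt = cnt;
    attr.license = (uintptr_t)"GPL";
    return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}

/* Attach a loaded program to a TAP fd; the prog fd is passed by pointer. */
static int attach_steering_prog(int tap_fd, int prog_fd)
{
    return ioctl(tap_fd, TUNSETSTEERINGEBPF, &prog_fd);
}
```

Loading and attaching require an open multiqueue TAP device and, on many systems, CAP_NET_ADMIN plus permission to call bpf(2), so the calls above will fail for an unprivileged process.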

Please also see the documentation in PATCH 6/6.

I am sending these patches as an RFC to initiate discussion and get
feedback on the following points:
* Fallback when eBPF is not supported by the kernel
* Live migration to the kernel that doesn't have eBPF support
* Integration with current QEMU build
* Additional usage of eBPF for packet filtering

Known issues:
* Hash population is not supported by eBPF RSS: 'software' RSS is used
as a fallback; also, the hash population feature is not reported to guests
with vhost.
* Big-endian BPF support: for now, eBPF is disabled on big-endian systems.

Andrew (6):
  Added SetSteeringEBPF method for NetClientState.
  ebpf: Added basic eBPF API.
  ebpf: Added eBPF RSS program.
  ebpf: Added eBPF RSS loader.
  virtio-net: Added eBPF RSS to virtio-net.
  docs: Added eBPF documentation.

 MAINTAINERS                    |   6 +
 configure                      |  36 +++
 docs/ebpf.rst                  |  29 ++
 docs/ebpf_rss.rst              | 129 ++++++++
 ebpf/EbpfElf_to_C.py           |  67 ++++
 ebpf/Makefile.ebpf             |  38 +++
 ebpf/ebpf-stub.c               |  28 ++
 ebpf/ebpf.c                    | 107 +++++++
 ebpf/ebpf.h                    |  35 +++
 ebpf/ebpf_rss.c                | 178 +++++++++++
 ebpf/ebpf_rss.h                |  30 ++
 ebpf/meson.build               |   1 +
 ebpf/rss.bpf.c                 | 470 ++++++++++++++++++++++++++++
 ebpf/trace-events              |   4 +
 ebpf/trace.h                   |   2 +
 ebpf/tun_rss_steering.h        | 556 +++++++++++++++++++++++++++++++++
 hw/net/vhost_net.c             |   2 +
 hw/net/virtio-net.c            | 120 ++++++-
 include/hw/virtio/virtio-net.h |   4 +
 include/net/net.h              |   2 +
 meson.build                    |   3 +
 net/tap-bsd.c                  |   5 +
 net/tap-linux.c                |  19 ++
 net/tap-solaris.c              |   5 +
 net/tap-stub.c                 |   5 +
 net/tap.c                      |   9 +
 net/tap_int.h                  |   1 +
 net/vhost-vdpa.c               |   2 +
 28 files changed, 1889 insertions(+), 4 deletions(-)
 create mode 100644 docs/ebpf.rst
 create mode 100644 docs/ebpf_rss.rst
 create mode 100644 ebpf/EbpfElf_to_C.py
 create mode 100755 ebpf/Makefile.ebpf
 create mode 100644 ebpf/ebpf-stub.c
 create mode 100644 ebpf/ebpf.c
 create mode 100644 ebpf/ebpf.h
 create mode 100644 ebpf/ebpf_rss.c
 create mode 100644 ebpf/ebpf_rss.h
 create mode 100644 ebpf/meson.build
 create mode 100644 ebpf/rss.bpf.c
 create mode 100644 ebpf/trace-events
 create mode 100644 ebpf/trace.h
 create mode 100644 ebpf/tun_rss_steering.h

-- 
2.28.0


Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
Posted by Jason Wang 3 years, 6 months ago
On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> Basic idea is to use eBPF to calculate and steer packets in TAP.
> RSS(Receive Side Scaling) is used to distribute network packets to guest virtqueues
> by calculating packet hash.
> eBPF RSS allows us to use RSS with vhost TAP.
>
> This set of patches introduces the usage of eBPF for packet steering
> and RSS hash calculation:
> * RSS(Receive Side Scaling) is used to distribute network packets to
> guest virtqueues by calculating packet hash
> * eBPF RSS suppose to be faster than already existing 'software'
> implementation in QEMU
> * Additionally adding support for the usage of RSS with vhost
>
> Supported kernels: 5.8+
>
> Implementation notes:
> Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
> Added eBPF support to qemu directly through a system call, see the
> bpf(2) for details.
> The eBPF program is part of the qemu and presented as an array of bpf
> instructions.
> The program can be recompiled by provided Makefile.ebpf(need to adjust
> 'linuxhdrs'),
> although it's not required to build QEMU with eBPF support.
> Added changes to virtio-net and vhost, primary eBPF RSS is used.
> 'Software' RSS used in the case of hash population and as a fallback option.
> For vhost, the hash population feature is not reported to the guest.
>
> Please also see the documentation in PATCH 6/6.
>
> I am sending those patches as RFC to initiate the discussions and get
> feedback on the following points:
> * Fallback when eBPF is not supported by the kernel


Yes, and it could also be a lack of CAP_BPF.


> * Live migration to the kernel that doesn't have eBPF support


Is there anything that needs special treatment here?


> * Integration with current QEMU build


Yes, a question here:

1) Any reason for not using libbpf? E.g. it has been shipped with some
distros.
2) It would be better if we can avoid shipping bytecode.


> * Additional usage for eBPF for packet filtering


Another interesting topic is to implement mac/vlan filters. And in the
future, I plan to add mac-based steering. All of these could be done via
eBPF.


>
> Know issues:
> * hash population not supported by eBPF RSS: 'software' RSS used


Is this because there's no way to write to the vnet header in STEERING BPF?


> as a fallback, also, hash population feature is not reported to guests
> with vhost.
> * big-endian BPF support: for now, eBPF is disabled for big-endian systems.


Are there any blockers for this?

Just some quick questions after a glance at the code. Will go through
them tomorrow.

Thanks




Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
Posted by Yuri Benditovich 3 years, 6 months ago
On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com> wrote:

>
> On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> > Basic idea is to use eBPF to calculate and steer packets in TAP.
> > RSS(Receive Side Scaling) is used to distribute network packets to guest
> virtqueues
> > by calculating packet hash.
> > eBPF RSS allows us to use RSS with vhost TAP.
> >
> > This set of patches introduces the usage of eBPF for packet steering
> > and RSS hash calculation:
> > * RSS(Receive Side Scaling) is used to distribute network packets to
> > guest virtqueues by calculating packet hash
> > * eBPF RSS suppose to be faster than already existing 'software'
> > implementation in QEMU
> > * Additionally adding support for the usage of RSS with vhost
> >
> > Supported kernels: 5.8+
> >
> > Implementation notes:
> > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
> > Added eBPF support to qemu directly through a system call, see the
> > bpf(2) for details.
> > The eBPF program is part of the qemu and presented as an array of bpf
> > instructions.
> > The program can be recompiled by provided Makefile.ebpf(need to adjust
> > 'linuxhdrs'),
> > although it's not required to build QEMU with eBPF support.
> > Added changes to virtio-net and vhost, primary eBPF RSS is used.
> > 'Software' RSS used in the case of hash population and as a fallback
> option.
> > For vhost, the hash population feature is not reported to the guest.
> >
> > Please also see the documentation in PATCH 6/6.
> >
> > I am sending those patches as RFC to initiate the discussions and get
> > feedback on the following points:
> > * Fallback when eBPF is not supported by the kernel
>
>
> Yes, and it could also a lacking of CAP_BPF.
>
>
> > * Live migration to the kernel that doesn't have eBPF support
>
>
> Is there anything that we needs special treatment here?
>
Possible case: rss=on, vhost=on, source system with kernel 5.8 (everything
works) -> destination system with kernel 5.6 (bpf does not work); the adapter
functions, but the steering does not use the proper queues.




>
> > * Integration with current QEMU build
>
>
> Yes, a question here:
>
> 1) Any reason for not using libbpf, e.g it has been shipped with some
> distros
>

We intentionally do not use libbpf, as it is present only on some distros.
We can switch to libbpf, but this will disable bpf if libbpf is not
installed.


> 2) It would be better if we can avoid shipping bytecodes
>


This creates new dependencies: llvm + clang + ...
We would prefer bytecode and the ability to regenerate it if the
prerequisites are installed.


>
>
> > * Additional usage for eBPF for packet filtering
>
>
> Another interesting topics in to implement mac/vlan filters. And in the
> future, I plan to add mac based steering. All of these could be done via
> eBPF.
>
>
No problem, we can cooperate if needed


>
> >
> > Know issues:
> > * hash population not supported by eBPF RSS: 'software' RSS used
>
>
> Is this because there's not way to write to vnet header in STERRING BPF?
>
Yes. We plan to submit kernel changes to cooperate with BPF and populate
the hash; this work is in progress.


>
> > as a fallback, also, hash population feature is not reported to guests
> > with vhost.
> > * big-endian BPF support: for now, eBPF is disabled for big-endian
> systems.
>
>
> Are there any blocker for this?
>

No, it can be added in v2.


>
> Just some quick questions after a glance of the codes. Will go through
> them tomorrow.
>
> Thanks
>
>
Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
Posted by Daniel P. Berrangé 3 years, 6 months ago
On Tue, Nov 03, 2020 at 12:32:43PM +0200, Yuri Benditovich wrote:
> On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com> wrote:
> 
> >
> > On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> > > Basic idea is to use eBPF to calculate and steer packets in TAP.
> > > RSS(Receive Side Scaling) is used to distribute network packets to guest
> > virtqueues
> > > by calculating packet hash.
> > > eBPF RSS allows us to use RSS with vhost TAP.
> > >
> > > This set of patches introduces the usage of eBPF for packet steering
> > > and RSS hash calculation:
> > > * RSS(Receive Side Scaling) is used to distribute network packets to
> > > guest virtqueues by calculating packet hash
> > > * eBPF RSS suppose to be faster than already existing 'software'
> > > implementation in QEMU
> > > * Additionally adding support for the usage of RSS with vhost
> > >
> > > Supported kernels: 5.8+
> > >
> > > Implementation notes:
> > > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
> > > Added eBPF support to qemu directly through a system call, see the
> > > bpf(2) for details.
> > > The eBPF program is part of the qemu and presented as an array of bpf
> > > instructions.
> > > The program can be recompiled by provided Makefile.ebpf(need to adjust
> > > 'linuxhdrs'),
> > > although it's not required to build QEMU with eBPF support.
> > > Added changes to virtio-net and vhost, primary eBPF RSS is used.
> > > 'Software' RSS used in the case of hash population and as a fallback
> > option.
> > > For vhost, the hash population feature is not reported to the guest.
> > >
> > > Please also see the documentation in PATCH 6/6.
> > >
> > > I am sending those patches as RFC to initiate the discussions and get
> > > feedback on the following points:
> > > * Fallback when eBPF is not supported by the kernel
> >
> >
> > Yes, and it could also a lacking of CAP_BPF.
> >
> >
> > > * Live migration to the kernel that doesn't have eBPF support
> >
> >
> > Is there anything that we needs special treatment here?
> >
> > Possible case: rss=on, vhost=on, source system with kernel 5.8 (everything
> works) -> dest. system 5.6 (bpf does not work), the adapter functions, but
> all the steering does not use proper queues.
> 
> 
> 
> 
> >
> > > * Integration with current QEMU build
> >
> >
> > Yes, a question here:
> >
> > 1) Any reason for not using libbpf, e.g it has been shipped with some
> > distros
> >
> 
> We intentionally do not use libbpf, as it present only on some distros.
> We can switch to libbpf, but this will disable bpf if libbpf is not
> installed

If we were modifying existing functionality then introducing a dep on
libbpf would be a problem as you'd be breaking existing QEMU users
on distros without libbpf.

This is brand new functionality though, so it is fine to place a
requirement on libbpf. If distros don't ship that library and they
want BPF features in QEMU, then those distros should take responsibility
for adding libbpf to their package set.

> > 2) It would be better if we can avoid shipping bytecodes
> >
> 
> 
> This creates new dependencies: llvm + clang + ...
> We would prefer byte code and ability to generate it if prerequisites are
> installed.

I've double checked with Fedora, and generating the BPF program from
source is a mandatory requirement for QEMU. Pre-generated BPF bytecode
is not permitted.

There was also a question raised about the kernel ABI compatibility
for BPF programs?

  https://lwn.net/Articles/831402/

  "The basic problem is that when BPF is compiled, it uses a set
   of kernel headers that describe various kernel data structures
   for that particular version, which may be different from those
   on the kernel where the program is run. Until relatively recently,
   that was solved by distributing the BPF as C code along with the
   Clang compiler to build the BPF on the system where it was going
   to be run."

Is this not an issue for QEMU's usage of BPF here?

The dependency on llvm is unfortunate for people who build with GCC,
but at least they can opt out via a configure switch if they really
want to. As that LWN article notes, GCC will gain BPF support.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
Posted by Jason Wang 3 years, 6 months ago
On 2020/11/3 下午7:56, Daniel P. Berrangé wrote:
> On Tue, Nov 03, 2020 at 12:32:43PM +0200, Yuri Benditovich wrote:
>> On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>>> On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
>>>> Basic idea is to use eBPF to calculate and steer packets in TAP.
>>>> RSS(Receive Side Scaling) is used to distribute network packets to guest
>>> virtqueues
>>>> by calculating packet hash.
>>>> eBPF RSS allows us to use RSS with vhost TAP.
>>>>
>>>> This set of patches introduces the usage of eBPF for packet steering
>>>> and RSS hash calculation:
>>>> * RSS(Receive Side Scaling) is used to distribute network packets to
>>>> guest virtqueues by calculating packet hash
>>>> * eBPF RSS suppose to be faster than already existing 'software'
>>>> implementation in QEMU
>>>> * Additionally adding support for the usage of RSS with vhost
>>>>
>>>> Supported kernels: 5.8+
>>>>
>>>> Implementation notes:
>>>> Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
>>>> Added eBPF support to qemu directly through a system call, see the
>>>> bpf(2) for details.
>>>> The eBPF program is part of the qemu and presented as an array of bpf
>>>> instructions.
>>>> The program can be recompiled by provided Makefile.ebpf(need to adjust
>>>> 'linuxhdrs'),
>>>> although it's not required to build QEMU with eBPF support.
>>>> Added changes to virtio-net and vhost, primary eBPF RSS is used.
>>>> 'Software' RSS used in the case of hash population and as a fallback
>>> option.
>>>> For vhost, the hash population feature is not reported to the guest.
>>>>
>>>> Please also see the documentation in PATCH 6/6.
>>>>
>>>> I am sending those patches as RFC to initiate the discussions and get
>>>> feedback on the following points:
>>>> * Fallback when eBPF is not supported by the kernel
>>>
>>> Yes, and it could also a lacking of CAP_BPF.
>>>
>>>
>>>> * Live migration to the kernel that doesn't have eBPF support
>>>
>>> Is there anything that we needs special treatment here?
>>>
>>> Possible case: rss=on, vhost=on, source system with kernel 5.8 (everything
>> works) -> dest. system 5.6 (bpf does not work), the adapter functions, but
>> all the steering does not use proper queues.
>>
>>
>>
>>
>>>> * Integration with current QEMU build
>>>
>>> Yes, a question here:
>>>
>>> 1) Any reason for not using libbpf, e.g it has been shipped with some
>>> distros
>>>
>> We intentionally do not use libbpf, as it present only on some distros.
>> We can switch to libbpf, but this will disable bpf if libbpf is not
>> installed
> If we were modifying existing funtionality then introducing a dep on
> libbpf would be a problem as you'd be breaking existing QEMU users
> on distros without libbpf.
>
> This is brand new functionality though, so it is fine to place a
> requirement on libbpf. If distros don't ship that library and they
> want BPF features in QEMU, then those distros should take responsibility
> for adding libbpf to their package set.
>
>>> 2) It would be better if we can avoid shipping bytecodes
>>>
>>
>> This creates new dependencies: llvm + clang + ...
>> We would prefer byte code and ability to generate it if prerequisites are
>> installed.
> I've double checked with Fedora, and generating the BPF program from
> source is a mandatory requirement for QEMU. Pre-generated BPF bytecode
> is not permitted.
>
> There was also a question raised about the kernel ABI compatibility
> for BPF programs ?
>
>    https://lwn.net/Articles/831402/
>
>    "The basic problem is that when BPF is compiled, it uses a set
>     of kernel headers that describe various kernel data structures
>     for that particular version, which may be different from those
>     on the kernel where the program is run. Until relatively recently,
>     that was solved by distributing the BPF as C code along with the
>     Clang compiler to build the BPF on the system where it was going
>     to be run."
>
> Is this not an issue for QEMU's usage of BPF here ?


That's a good point. Actually, DPDK ships RSS bytecode, but I don't know
how it works there.

But as mentioned in the link, if we generate the code with BTF, that
would be fine.

Thanks


>
> The dependancy on llvm is unfortunate for people who build with GCC,
> but at least they can opt-out via a configure switch if they really
> want to. As that LWN article notes, GCC will gain BPF support
>
>
> Regards,
> Daniel


Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
Posted by Jason Wang 3 years, 6 months ago
On 2020/11/3 下午6:32, Yuri Benditovich wrote:
>
>
> On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com 
> <mailto:jasowang@redhat.com>> wrote:
>
>
>     On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
>     > Basic idea is to use eBPF to calculate and steer packets in TAP.
>     > RSS(Receive Side Scaling) is used to distribute network packets
>     to guest virtqueues
>     > by calculating packet hash.
>     > eBPF RSS allows us to use RSS with vhost TAP.
>     >
>     > This set of patches introduces the usage of eBPF for packet steering
>     > and RSS hash calculation:
>     > * RSS(Receive Side Scaling) is used to distribute network packets to
>     > guest virtqueues by calculating packet hash
>     > * eBPF RSS suppose to be faster than already existing 'software'
>     > implementation in QEMU
>     > * Additionally adding support for the usage of RSS with vhost
>     >
>     > Supported kernels: 5.8+
>     >
>     > Implementation notes:
>     > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
>     > Added eBPF support to qemu directly through a system call, see the
>     > bpf(2) for details.
>     > The eBPF program is part of the qemu and presented as an array
>     of bpf
>     > instructions.
>     > The program can be recompiled by provided Makefile.ebpf(need to
>     adjust
>     > 'linuxhdrs'),
>     > although it's not required to build QEMU with eBPF support.
>     > Added changes to virtio-net and vhost, primary eBPF RSS is used.
>     > 'Software' RSS used in the case of hash population and as a
>     fallback option.
>     > For vhost, the hash population feature is not reported to the guest.
>     >
>     > Please also see the documentation in PATCH 6/6.
>     >
>     > I am sending those patches as RFC to initiate the discussions
>     and get
>     > feedback on the following points:
>     > * Fallback when eBPF is not supported by the kernel
>
>
>     Yes, and it could also a lacking of CAP_BPF.
>
>
>     > * Live migration to the kernel that doesn't have eBPF support
>
>
>     Is there anything that we needs special treatment here?
>
> Possible case: rss=on, vhost=on, source system with kernel 5.8 
> (everything works) -> dest. system 5.6 (bpf does not work), the 
> adapter functions, but all the steering does not use proper queues.


Right, I think we need to disable vhost on dest.


>
>
>
>     > * Integration with current QEMU build
>
>
>     Yes, a question here:
>
>     1) Any reason for not using libbpf, e.g it has been shipped with some
>     distros
>
>
> We intentionally do not use libbpf, as it present only on some distros.
> We can switch to libbpf, but this will disable bpf if libbpf is not 
> installed


That's better I think.


>     2) It would be better if we can avoid shipping bytecodes
>
>
>
> This creates new dependencies: llvm + clang + ...
> We would prefer byte code and ability to generate it if prerequisites 
> are installed.


It's probably ok if we treat the bytecode as a kind of firmware.

But in the long run, it's still worthwhile to consider that the QEMU source
is used for development, and llvm/clang should be a common requirement for
generating eBPF bytecode for the host.


>
>
>     > * Additional usage for eBPF for packet filtering
>
>
>     Another interesting topics in to implement mac/vlan filters. And
>     in the
>     future, I plan to add mac based steering. All of these could be
>     done via
>     eBPF.
>
>
> No problem, we can cooperate if needed
>
>
>     >
>     > Know issues:
>     > * hash population not supported by eBPF RSS: 'software' RSS used
>
>
>     Is this because there's not way to write to vnet header in
>     STERRING BPF?
>
> Yes. We plan to submit changes for kernel to cooperate with BPF and 
> populate the hash, this work is in progress


That would require a new type of eBPF program and may need some work on
the verifier.

Btw, macvtap is still lacking even a steering eBPF program. Would you want
to post a patch to support that?


>
>     > as a fallback, also, hash population feature is not reported to
>     guests
>     > with vhost.
>     > * big-endian BPF support: for now, eBPF is disabled for
>     big-endian systems.
>
>
>     Are there any blocker for this?
>
>
> No, can be added in v2


Cool.

Thanks


>
>     Just some quick questions after a glance of the codes. Will go
>     through
>     them tomorrow.
>
>     Thanks
>
>


Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
Posted by Daniel P. Berrangé 3 years, 6 months ago
On Wed, Nov 04, 2020 at 10:07:52AM +0800, Jason Wang wrote:
> 
> On 2020/11/3 下午6:32, Yuri Benditovich wrote:
> > 
> > 
> > On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com
> > <mailto:jasowang@redhat.com>> wrote:
> > 
> > 
> >     On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> >     > Basic idea is to use eBPF to calculate and steer packets in TAP.
> >     > RSS(Receive Side Scaling) is used to distribute network packets
> >     to guest virtqueues
> >     > by calculating packet hash.
> >     > eBPF RSS allows us to use RSS with vhost TAP.
> >     >
> >     > This set of patches introduces the usage of eBPF for packet steering
> >     > and RSS hash calculation:
> >     > * RSS(Receive Side Scaling) is used to distribute network packets to
> >     > guest virtqueues by calculating packet hash
> >     > * eBPF RSS suppose to be faster than already existing 'software'
> >     > implementation in QEMU
> >     > * Additionally adding support for the usage of RSS with vhost
> >     >
> >     > Supported kernels: 5.8+
> >     >
> >     > Implementation notes:
> >     > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
> >     > Added eBPF support to qemu directly through a system call, see the
> >     > bpf(2) for details.
> >     > The eBPF program is part of the qemu and presented as an array
> >     of bpf
> >     > instructions.
> >     > The program can be recompiled by provided Makefile.ebpf(need to
> >     adjust
> >     > 'linuxhdrs'),
> >     > although it's not required to build QEMU with eBPF support.
> >     > Added changes to virtio-net and vhost, primary eBPF RSS is used.
> >     > 'Software' RSS used in the case of hash population and as a
> >     fallback option.
> >     > For vhost, the hash population feature is not reported to the guest.
> >     >
> >     > Please also see the documentation in PATCH 6/6.
> >     >
> >     > I am sending those patches as RFC to initiate the discussions
> >     and get
> >     > feedback on the following points:
> >     > * Fallback when eBPF is not supported by the kernel
> > 
> > 
> >     Yes, and it could also a lacking of CAP_BPF.
> > 
> > 
> >     > * Live migration to the kernel that doesn't have eBPF support
> > 
> > 
> >     Is there anything that we needs special treatment here?
> > 
> > Possible case: rss=on, vhost=on, source system with kernel 5.8
> > (everything works) -> dest. system 5.6 (bpf does not work), the adapter
> > functions, but all the steering does not use proper queues.
> 
> 
> Right, I think we need to disable vhost on dest.
> 
> 
> > 
> > 
> > 
> >     > * Integration with current QEMU build
> > 
> > 
> >     Yes, a question here:
> > 
> >     1) Any reason for not using libbpf, e.g it has been shipped with some
> >     distros
> > 
> > 
> > We intentionally do not use libbpf, as it present only on some distros.
> > We can switch to libbpf, but this will disable bpf if libbpf is not
> > installed
> 
> 
> That's better I think.
> 
> 
> >     2) It would be better if we can avoid shipping bytecodes
> > 
> > 
> > 
> > This creates new dependencies: llvm + clang + ...
> > We would prefer byte code and ability to generate it if prerequisites
> > are installed.
> 
> 
> It's probably ok if we treat the bytecode as a kind of firmware.

That is explicitly *not* OK for inclusion in Fedora. They require that
BPF is compiled from source, and rejected my suggestion that it could
be considered a kind of firmware and thus have an exception from building
from source.

> But in the long run, it's still worthwhile consider the qemu source is used
> for development and llvm/clang should be a common requirement for generating
> eBPF bytecode for host.

So we need to do this the right way straight away, before this merges.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
Posted by Jason Wang 3 years, 5 months ago
On 2020/11/4 5:31 PM, Daniel P. Berrangé wrote:
> On Wed, Nov 04, 2020 at 10:07:52AM +0800, Jason Wang wrote:
>> On 2020/11/3 下午6:32, Yuri Benditovich wrote:
>>>
>>> On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com
>>> <mailto:jasowang@redhat.com>> wrote:
>>>
>>>
>>>      On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
>>>      > Basic idea is to use eBPF to calculate and steer packets in TAP.
>>>      > RSS(Receive Side Scaling) is used to distribute network packets
>>>      to guest virtqueues
>>>      > by calculating packet hash.
>>>      > eBPF RSS allows us to use RSS with vhost TAP.
>>>      >
>>>      > This set of patches introduces the usage of eBPF for packet steering
>>>      > and RSS hash calculation:
>>>      > * RSS(Receive Side Scaling) is used to distribute network packets to
>>>      > guest virtqueues by calculating packet hash
>>>      > * eBPF RSS suppose to be faster than already existing 'software'
>>>      > implementation in QEMU
>>>      > * Additionally adding support for the usage of RSS with vhost
>>>      >
>>>      > Supported kernels: 5.8+
>>>      >
>>>      > Implementation notes:
>>>      > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
>>>      > Added eBPF support to qemu directly through a system call, see the
>>>      > bpf(2) for details.
>>>      > The eBPF program is part of the qemu and presented as an array
>>>      of bpf
>>>      > instructions.
>>>      > The program can be recompiled by provided Makefile.ebpf(need to
>>>      adjust
>>>      > 'linuxhdrs'),
>>>      > although it's not required to build QEMU with eBPF support.
>>>      > Added changes to virtio-net and vhost, primary eBPF RSS is used.
>>>      > 'Software' RSS used in the case of hash population and as a
>>>      fallback option.
>>>      > For vhost, the hash population feature is not reported to the guest.
>>>      >
>>>      > Please also see the documentation in PATCH 6/6.
>>>      >
>>>      > I am sending those patches as RFC to initiate the discussions
>>>      and get
>>>      > feedback on the following points:
>>>      > * Fallback when eBPF is not supported by the kernel
>>>
>>>
>>>      Yes, and it could also a lacking of CAP_BPF.
>>>
>>>
>>>      > * Live migration to the kernel that doesn't have eBPF support
>>>
>>>
>>>      Is there anything that we needs special treatment here?
>>>
>>> Possible case: rss=on, vhost=on, source system with kernel 5.8
>>> (everything works) -> dest. system 5.6 (bpf does not work), the adapter
>>> functions, but all the steering does not use proper queues.
>>
>> Right, I think we need to disable vhost on dest.
>>
>>
>>>
>>>
>>>      > * Integration with current QEMU build
>>>
>>>
>>>      Yes, a question here:
>>>
>>>      1) Any reason for not using libbpf, e.g it has been shipped with some
>>>      distros
>>>
>>>
>>> We intentionally do not use libbpf, as it present only on some distros.
>>> We can switch to libbpf, but this will disable bpf if libbpf is not
>>> installed
>>
>> That's better I think.
>>
>>
>>>      2) It would be better if we can avoid shipping bytecodes
>>>
>>>
>>>
>>> This creates new dependencies: llvm + clang + ...
>>> We would prefer byte code and ability to generate it if prerequisites
>>> are installed.
>>
>> It's probably ok if we treat the bytecode as a kind of firmware.
> That is explicitly *not* OK for inclusion in Fedora. They require that
> BPF is compiled from source, and rejected my suggestion that it could
> be considered a kind of firmware and thus have an exception from building
> from source.


Please refer to what was done in DPDK:

http://git.dpdk.org/dpdk/tree/doc/guides/nics/tap.rst#n235

I don't think what is proposed here is any different.

It's still bytecode that lives in an array.


>
>> But in the long run, it's still worthwhile consider the qemu source is used
>> for development and llvm/clang should be a common requirement for generating
>> eBPF bytecode for host.
> So we need to do this right straight way before this merges.


Yes.

Thanks


>
> Regards,
> Daniel


Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
Posted by Jason Wang 3 years, 5 months ago
On 2020/11/5 11:46 AM, Jason Wang wrote:
>>
>> It's probably ok if we treat the bytecode as a kind of firmware.
> That is explicitly *not* OK for inclusion in Fedora. They require that
> BPF is compiled from source, and rejected my suggestion that it could
> be considered a kind of firmware and thus have an exception from building
> from source. 


Actually, there's another advantage if we treat it as firmware (which it 
actually is): it allows us to upgrade it independently of qemu.

Thanks


Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
Posted by Yuri Benditovich 3 years, 5 months ago
On Thu, Nov 5, 2020 at 5:52 AM Jason Wang <jasowang@redhat.com> wrote:

>
> On 2020/11/5 上午11:46, Jason Wang wrote:
> >>
> >> It's probably ok if we treat the bytecode as a kind of firmware.
> > That is explicitly *not* OK for inclusion in Fedora. They require that
> > BPF is compiled from source, and rejected my suggestion that it could
> > be considered a kind of firmware and thus have an exception from building
> > from source.
>
>
> Actually, there's another advantages. If we treat it as firmware,
> (actually it is). It allows us to upgrade it independently with qemu.
>
Hi Jason,
I think it is a big disadvantage to have the BPF binary outside of QEMU.
It is compiled with common structures (for example the RSS configuration)
defined in QEMU, and if it is not built into QEMU then nobody is
responsible for the compatibility of the BPF and QEMU.
Just an array of instructions (as of today) is ~2k; a full object file (if we
use libbpf) is ~8k, so there is no big problem with the size.
If we keep the entire object in QEMU, it is for sure 100% compatible.

Thanks
>
>
Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
Posted by Daniel P. Berrangé 3 years, 5 months ago
On Thu, Nov 05, 2020 at 11:46:18AM +0800, Jason Wang wrote:
> 
> On 2020/11/4 下午5:31, Daniel P. Berrangé wrote:
> > On Wed, Nov 04, 2020 at 10:07:52AM +0800, Jason Wang wrote:
> > > On 2020/11/3 下午6:32, Yuri Benditovich wrote:
> > > > 
> > > > On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com
> > > > <mailto:jasowang@redhat.com>> wrote:
> > > > 
> > > > 
> > > >      On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> > > >      > Basic idea is to use eBPF to calculate and steer packets in TAP.
> > > >      > RSS(Receive Side Scaling) is used to distribute network packets
> > > >      to guest virtqueues
> > > >      > by calculating packet hash.
> > > >      > eBPF RSS allows us to use RSS with vhost TAP.
> > > >      >
> > > >      > This set of patches introduces the usage of eBPF for packet steering
> > > >      > and RSS hash calculation:
> > > >      > * RSS(Receive Side Scaling) is used to distribute network packets to
> > > >      > guest virtqueues by calculating packet hash
> > > >      > * eBPF RSS suppose to be faster than already existing 'software'
> > > >      > implementation in QEMU
> > > >      > * Additionally adding support for the usage of RSS with vhost
> > > >      >
> > > >      > Supported kernels: 5.8+
> > > >      >
> > > >      > Implementation notes:
> > > >      > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
> > > >      > Added eBPF support to qemu directly through a system call, see the
> > > >      > bpf(2) for details.
> > > >      > The eBPF program is part of the qemu and presented as an array
> > > >      of bpf
> > > >      > instructions.
> > > >      > The program can be recompiled by provided Makefile.ebpf(need to
> > > >      adjust
> > > >      > 'linuxhdrs'),
> > > >      > although it's not required to build QEMU with eBPF support.
> > > >      > Added changes to virtio-net and vhost, primary eBPF RSS is used.
> > > >      > 'Software' RSS used in the case of hash population and as a
> > > >      fallback option.
> > > >      > For vhost, the hash population feature is not reported to the guest.
> > > >      >
> > > >      > Please also see the documentation in PATCH 6/6.
> > > >      >
> > > >      > I am sending those patches as RFC to initiate the discussions
> > > >      and get
> > > >      > feedback on the following points:
> > > >      > * Fallback when eBPF is not supported by the kernel
> > > > 
> > > > 
> > > >      Yes, and it could also a lacking of CAP_BPF.
> > > > 
> > > > 
> > > >      > * Live migration to the kernel that doesn't have eBPF support
> > > > 
> > > > 
> > > >      Is there anything that we needs special treatment here?
> > > > 
> > > > Possible case: rss=on, vhost=on, source system with kernel 5.8
> > > > (everything works) -> dest. system 5.6 (bpf does not work), the adapter
> > > > functions, but all the steering does not use proper queues.
> > > 
> > > Right, I think we need to disable vhost on dest.
> > > 
> > > 
> > > > 
> > > > 
> > > >      > * Integration with current QEMU build
> > > > 
> > > > 
> > > >      Yes, a question here:
> > > > 
> > > >      1) Any reason for not using libbpf, e.g it has been shipped with some
> > > >      distros
> > > > 
> > > > 
> > > > We intentionally do not use libbpf, as it present only on some distros.
> > > > We can switch to libbpf, but this will disable bpf if libbpf is not
> > > > installed
> > > 
> > > That's better I think.
> > > 
> > > 
> > > >      2) It would be better if we can avoid shipping bytecodes
> > > > 
> > > > 
> > > > 
> > > > This creates new dependencies: llvm + clang + ...
> > > > We would prefer byte code and ability to generate it if prerequisites
> > > > are installed.
> > > 
> > > It's probably ok if we treat the bytecode as a kind of firmware.
> > That is explicitly *not* OK for inclusion in Fedora. They require that
> > BPF is compiled from source, and rejected my suggestion that it could
> > be considered a kind of firmware and thus have an exception from building
> > from source.
> 
> 
> Please refer what it was done in DPDK:
> 
> http://git.dpdk.org/dpdk/tree/doc/guides/nics/tap.rst#n235
> 
> I don't think what proposed here makes anything different.

I'm not convinced that what DPDK does is acceptable to Fedora either
based on the responses I've received when asking about BPF handling
during build.  It wouldn't surprise me, however, if this was simply
missed by reviewers when accepting DPDK into Fedora, because it is
not entirely obvious unless you are looking closely.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
Posted by Daniel P. Berrangé 3 years, 5 months ago
On Thu, Nov 05, 2020 at 10:01:09AM +0000, Daniel P. Berrangé wrote:
> On Thu, Nov 05, 2020 at 11:46:18AM +0800, Jason Wang wrote:
> > 
> > On 2020/11/4 下午5:31, Daniel P. Berrangé wrote:
> > > On Wed, Nov 04, 2020 at 10:07:52AM +0800, Jason Wang wrote:
> > > > On 2020/11/3 下午6:32, Yuri Benditovich wrote:
> > > > > 
> > > > > On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com
> > > > > <mailto:jasowang@redhat.com>> wrote:
> > > > > 
> > > > > 
> > > > >      On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> > > > >      > Basic idea is to use eBPF to calculate and steer packets in TAP.
> > > > >      > RSS(Receive Side Scaling) is used to distribute network packets
> > > > >      to guest virtqueues
> > > > >      > by calculating packet hash.
> > > > >      > eBPF RSS allows us to use RSS with vhost TAP.
> > > > >      >
> > > > >      > This set of patches introduces the usage of eBPF for packet steering
> > > > >      > and RSS hash calculation:
> > > > >      > * RSS(Receive Side Scaling) is used to distribute network packets to
> > > > >      > guest virtqueues by calculating packet hash
> > > > >      > * eBPF RSS suppose to be faster than already existing 'software'
> > > > >      > implementation in QEMU
> > > > >      > * Additionally adding support for the usage of RSS with vhost
> > > > >      >
> > > > >      > Supported kernels: 5.8+
> > > > >      >
> > > > >      > Implementation notes:
> > > > >      > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
> > > > >      > Added eBPF support to qemu directly through a system call, see the
> > > > >      > bpf(2) for details.
> > > > >      > The eBPF program is part of the qemu and presented as an array
> > > > >      of bpf
> > > > >      > instructions.
> > > > >      > The program can be recompiled by provided Makefile.ebpf(need to
> > > > >      adjust
> > > > >      > 'linuxhdrs'),
> > > > >      > although it's not required to build QEMU with eBPF support.
> > > > >      > Added changes to virtio-net and vhost, primary eBPF RSS is used.
> > > > >      > 'Software' RSS used in the case of hash population and as a
> > > > >      fallback option.
> > > > >      > For vhost, the hash population feature is not reported to the guest.
> > > > >      >
> > > > >      > Please also see the documentation in PATCH 6/6.
> > > > >      >
> > > > >      > I am sending those patches as RFC to initiate the discussions
> > > > >      and get
> > > > >      > feedback on the following points:
> > > > >      > * Fallback when eBPF is not supported by the kernel
> > > > > 
> > > > > 
> > > > >      Yes, and it could also a lacking of CAP_BPF.
> > > > > 
> > > > > 
> > > > >      > * Live migration to the kernel that doesn't have eBPF support
> > > > > 
> > > > > 
> > > > >      Is there anything that we needs special treatment here?
> > > > > 
> > > > > Possible case: rss=on, vhost=on, source system with kernel 5.8
> > > > > (everything works) -> dest. system 5.6 (bpf does not work), the adapter
> > > > > functions, but all the steering does not use proper queues.
> > > > 
> > > > Right, I think we need to disable vhost on dest.
> > > > 
> > > > 
> > > > > 
> > > > > 
> > > > >      > * Integration with current QEMU build
> > > > > 
> > > > > 
> > > > >      Yes, a question here:
> > > > > 
> > > > >      1) Any reason for not using libbpf, e.g it has been shipped with some
> > > > >      distros
> > > > > 
> > > > > 
> > > > > We intentionally do not use libbpf, as it present only on some distros.
> > > > > We can switch to libbpf, but this will disable bpf if libbpf is not
> > > > > installed
> > > > 
> > > > That's better I think.
> > > > 
> > > > 
> > > > >      2) It would be better if we can avoid shipping bytecodes
> > > > > 
> > > > > 
> > > > > 
> > > > > This creates new dependencies: llvm + clang + ...
> > > > > We would prefer byte code and ability to generate it if prerequisites
> > > > > are installed.
> > > > 
> > > > It's probably ok if we treat the bytecode as a kind of firmware.
> > > That is explicitly *not* OK for inclusion in Fedora. They require that
> > > BPF is compiled from source, and rejected my suggestion that it could
> > > be considered a kind of firmware and thus have an exception from building
> > > from source.
> > 
> > 
> > Please refer what it was done in DPDK:
> > 
> > http://git.dpdk.org/dpdk/tree/doc/guides/nics/tap.rst#n235
> > 
> > I don't think what proposed here makes anything different.
> 
> I'm not convinced that what DPDK does is acceptable to Fedora either
> based on the responses I've received when asking about BPF handling
> during build.  It wouldn't surprise me, however, if this was simply
> missed by reviewers when accepting DPDK into Fedora, because it is
> not entirely obvious unless you are looking closely.

FWIW, I'm pushing back against the idea that we have to compile the
BPF code from master source, as I think it is reasonable to have the
program embedded as a static array in the source code similar to what
DPDK does.  It doesn't feel much different from other places where apps
use generated sources, and don't build them from the original source
every time. eg "configure" is never re-generated from "configure.ac"
by Fedora packagers, they just use the generated "configure" script
as-is.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
Posted by Yuri Benditovich 3 years, 5 months ago
First of all, thank you for all your feedbacks

Please help me to summarize and let us understand better what we do in v2:
Major questions are:
1. Building eBPF from source during qemu build vs. regenerating it on
demand and keeping in the repository
Solution 1a (~ as in v1): keep instructions or ELF in an H file, generate it
outside of the qemu build. In general we'll need to have BE and LE binaries.
Solution 1b: build ELF or instructions during QEMU build if llvm + clang
exist. Then we will have only one (BE or LE, depending on current QEMU
build)
We agree with any solution - I believe you know the requirements better.

2. Use libbpf or not
In general we do not see any advantage of using libbpf. It works with
object files (does ELF parsing at time of loading), but it does not do any
magic.
Solution 2a. Switch to libbpf, generate object files (LE and BE) from
source, keep them inside QEMU (~8k each) or aside
Solution 2b. (as in v1) Use python script to parse object -> instructions
(~2k each)
We'd prefer not to use libbpf at the moment.
If for some reason we find it useful in the future, we can switch to it;
this does not create any incompatibility, but it will add a dependency on
libbpf.so.

3. Keep instructions or ELF inside QEMU or as separate external file
Solution 3a (~as in v1): Built-in array of instructions or ELF. If we
generate them outside of the QEMU build - keep 2 arrays of instructions or
ELF (BE and LE).
Solution 3b: Install them as separate files (/usr/share/qemu).
We'd prefer 3a:
 Then there is a guarantee that the eBPF is built with exactly the same
config structures as QEMU (qemu creates a mapping of its structures, eBPF
uses them).
 No need to handle scenarios like 'file not found', 'file is not
suitable', etc.

4. Is there some real request to have the eBPF for big-endian?
If no, we can enable eBPF only for LE builds

Jason, Daniel, Michael
Can you please let us know what you think and why?

On Thu, Nov 5, 2020 at 3:19 PM Daniel P. Berrangé <berrange@redhat.com>
wrote:

> On Thu, Nov 05, 2020 at 10:01:09AM +0000, Daniel P. Berrangé wrote:
> > On Thu, Nov 05, 2020 at 11:46:18AM +0800, Jason Wang wrote:
> > >
> > > On 2020/11/4 下午5:31, Daniel P. Berrangé wrote:
> > > > On Wed, Nov 04, 2020 at 10:07:52AM +0800, Jason Wang wrote:
> > > > > On 2020/11/3 下午6:32, Yuri Benditovich wrote:
> > > > > >
> > > > > > On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com
> > > > > > <mailto:jasowang@redhat.com>> wrote:
> > > > > >
> > > > > >
> > > > > >      On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> > > > > >      > Basic idea is to use eBPF to calculate and steer packets
> in TAP.
> > > > > >      > RSS(Receive Side Scaling) is used to distribute network
> packets
> > > > > >      to guest virtqueues
> > > > > >      > by calculating packet hash.
> > > > > >      > eBPF RSS allows us to use RSS with vhost TAP.
> > > > > >      >
> > > > > >      > This set of patches introduces the usage of eBPF for
> packet steering
> > > > > >      > and RSS hash calculation:
> > > > > >      > * RSS(Receive Side Scaling) is used to distribute network
> packets to
> > > > > >      > guest virtqueues by calculating packet hash
> > > > > >      > * eBPF RSS suppose to be faster than already existing
> 'software'
> > > > > >      > implementation in QEMU
> > > > > >      > * Additionally adding support for the usage of RSS with
> vhost
> > > > > >      >
> > > > > >      > Supported kernels: 5.8+
> > > > > >      >
> > > > > >      > Implementation notes:
> > > > > >      > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the
> eBPF program.
> > > > > >      > Added eBPF support to qemu directly through a system
> call, see the
> > > > > >      > bpf(2) for details.
> > > > > >      > The eBPF program is part of the qemu and presented as an
> array
> > > > > >      of bpf
> > > > > >      > instructions.
> > > > > >      > The program can be recompiled by provided
> Makefile.ebpf(need to
> > > > > >      adjust
> > > > > >      > 'linuxhdrs'),
> > > > > >      > although it's not required to build QEMU with eBPF
> support.
> > > > > >      > Added changes to virtio-net and vhost, primary eBPF RSS
> is used.
> > > > > >      > 'Software' RSS used in the case of hash population and as
> a
> > > > > >      fallback option.
> > > > > >      > For vhost, the hash population feature is not reported to
> the guest.
> > > > > >      >
> > > > > >      > Please also see the documentation in PATCH 6/6.
> > > > > >      >
> > > > > >      > I am sending those patches as RFC to initiate the
> discussions
> > > > > >      and get
> > > > > >      > feedback on the following points:
> > > > > >      > * Fallback when eBPF is not supported by the kernel
> > > > > >
> > > > > >
> > > > > >      Yes, and it could also a lacking of CAP_BPF.
> > > > > >
> > > > > >
> > > > > >      > * Live migration to the kernel that doesn't have eBPF
> support
> > > > > >
> > > > > >
> > > > > >      Is there anything that we needs special treatment here?
> > > > > >
> > > > > > Possible case: rss=on, vhost=on, source system with kernel 5.8
> > > > > > (everything works) -> dest. system 5.6 (bpf does not work), the
> adapter
> > > > > > functions, but all the steering does not use proper queues.
> > > > >
> > > > > Right, I think we need to disable vhost on dest.
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > >      > * Integration with current QEMU build
> > > > > >
> > > > > >
> > > > > >      Yes, a question here:
> > > > > >
> > > > > >      1) Any reason for not using libbpf, e.g it has been shipped
> with some
> > > > > >      distros
> > > > > >
> > > > > >
> > > > > > We intentionally do not use libbpf, as it present only on some
> distros.
> > > > > > We can switch to libbpf, but this will disable bpf if libbpf is
> not
> > > > > > installed
> > > > >
> > > > > That's better I think.
> > > > >
> > > > >
> > > > > >      2) It would be better if we can avoid shipping bytecodes
> > > > > >
> > > > > >
> > > > > >
> > > > > > This creates new dependencies: llvm + clang + ...
> > > > > > We would prefer byte code and ability to generate it if
> prerequisites
> > > > > > are installed.
> > > > >
> > > > > It's probably ok if we treat the bytecode as a kind of firmware.
> > > > That is explicitly *not* OK for inclusion in Fedora. They require
> that
> > > > BPF is compiled from source, and rejected my suggestion that it could
> > > > be considered a kind of firmware and thus have an exception from
> building
> > > > from source.
> > >
> > >
> > > Please refer what it was done in DPDK:
> > >
> > > http://git.dpdk.org/dpdk/tree/doc/guides/nics/tap.rst#n235
> > >
> > > I don't think what proposed here makes anything different.
> >
> > I'm not convinced that what DPDK does is acceptable to Fedora either
> > based on the responses I've received when asking about BPF handling
> > during build.  It wouldn't surprise me, however, if this was simply
> > missed by reviewers when accepting DPDK into Fedora, because it is
> > not entirely obvious unless you are looking closely.
>
> FWIW, I'm pushing back against the idea that we have to compile the
> BPF code from master source, as I think it is reasonable to have the
> program embedded as a static array in the source code similar to what
> DPDK does.  It doesn't feel much different from other places where apps
> use generated sources, and don't build them from the original source
> every time. eg "configure" is never re-generated from "configure.ac"
> by Fedora packagers, they just use the generated "configure" script
> as-is.
>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-
> https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-
> https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-
> https://www.instagram.com/dberrange :|
>
>
Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
Posted by Jason Wang 3 years, 5 months ago
On 2020/11/5 11:13 PM, Yuri Benditovich wrote:
> First of all, thank you for all your feedbacks
>
> Please help me to summarize and let us understand better what we do in v2:
> Major questions are:
> 1. Building eBPF from source during qemu build vs. regenerating it on 
> demand and keeping in the repository
> Solution 1a (~ as in v1): keep instructions or ELF in H file, generate 
> it out of qemu build. In general we'll need to have BE and LE binaries.
> Solution 1b: build ELF or instructions during QEMU build if llvm + 
> clang exist. Then we will have only one (BE or LE, depending on 
> current QEMU build)
> We agree with any solution - I believe you know the requirements better.


I think we can go with 1a. (See Daniel's comment)


>
> 2. Use libbpf or not
> In general we do not see any advantage of using libbpf. It works with 
> object files (does ELF parsing at time of loading), but it does not do 
> any magic.
> Solution 2a. Switch to libbpf, generate object files (LE and BE) from 
> source, keep them inside QEMU (~8k each) or aside


Can we simply use dynamic linking here?


> Solution 2b. (as in v1) Use python script to parse object -> 
> instructions (~2k each)
> We'd prefer not to use libbpf at the moment.
> If due to some unknown reason we'll find it useful in future, we can 
> switch to it, this does not create any incompatibility. Then this will 
> create a dependency on libbpf.so


I think we need to care about compatibility. E.g. we may need to enable 
BTF, and I don't know how hard it would be to add BTF support in the 
current design. It would probably be OK if it's not a lot of effort.


>
> 3. Keep instructions or ELF inside QEMU or as separate external file
> Solution 3a (~as in v1): Built-in array of instructions or ELF. If we 
> generate them out of QEMU build - keep 2 arrays or instructions or ELF 
> (BE and LE),
> Solution 3b: Install them as separate files (/usr/share/qemu).
> We'd prefer 3a:
>  Then there is a guarantee that the eBPF is built with exactly the 
> same config structures as QEMU (qemu creates a mapping of its 
> structures, eBPF uses them).
>  No need to take care on scenarios like 'file not found', 'file is not 
> suitable' etc


Yes, let's go 3a for upstream.


>
> 4. Is there some real request to have the eBPF for big-endian?
> If no, we can enable eBPF only for LE builds


We can go with LE first.

Thanks


>
> Jason, Daniel, Michael
> Can you please let us know what you think and why?
>
> On Thu, Nov 5, 2020 at 3:19 PM Daniel P. Berrangé <berrange@redhat.com 
> <mailto:berrange@redhat.com>> wrote:
>
>     On Thu, Nov 05, 2020 at 10:01:09AM +0000, Daniel P. Berrangé wrote:
>     > On Thu, Nov 05, 2020 at 11:46:18AM +0800, Jason Wang wrote:
>     > >
>     > > On 2020/11/4 下午5:31, Daniel P. Berrangé wrote:
>     > > > On Wed, Nov 04, 2020 at 10:07:52AM +0800, Jason Wang wrote:
>     > > > > On 2020/11/3 下午6:32, Yuri Benditovich wrote:
>     > > > > >
>     > > > > > On Tue, Nov 3, 2020 at 11:02 AM Jason Wang
>     <jasowang@redhat.com <mailto:jasowang@redhat.com>
>     > > > > > <mailto:jasowang@redhat.com
>     <mailto:jasowang@redhat.com>>> wrote:
>     > > > > >
>     > > > > >
>     > > > > >      On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
>     > > > > >      > Basic idea is to use eBPF to calculate and steer
>     packets in TAP.
>     > > > > >      > RSS(Receive Side Scaling) is used to distribute
>     network packets
>     > > > > >      to guest virtqueues
>     > > > > >      > by calculating packet hash.
>     > > > > >      > eBPF RSS allows us to use RSS with vhost TAP.
>     > > > > >      >
>     > > > > >      > This set of patches introduces the usage of eBPF
>     for packet steering
>     > > > > >      > and RSS hash calculation:
>     > > > > >      > * RSS(Receive Side Scaling) is used to distribute
>     network packets to
>     > > > > >      > guest virtqueues by calculating packet hash
>     > > > > >      > * eBPF RSS suppose to be faster than already
>     existing 'software'
>     > > > > >      > implementation in QEMU
>     > > > > >      > * Additionally adding support for the usage of
>     RSS with vhost
>     > > > > >      >
>     > > > > >      > Supported kernels: 5.8+
>     > > > > >      >
>     > > > > >      > Implementation notes:
>     > > > > >      > Linux TAP TUNSETSTEERINGEBPF ioctl was used to
>     set the eBPF program.
>     > > > > >      > Added eBPF support to qemu directly through a
>     system call, see the
>     > > > > >      > bpf(2) for details.
>     > > > > >      > The eBPF program is part of the qemu and
>     presented as an array
>     > > > > >      of bpf
>     > > > > >      > instructions.
>     > > > > >      > The program can be recompiled by provided
>     Makefile.ebpf(need to
>     > > > > >      adjust
>     > > > > >      > 'linuxhdrs'),
>     > > > > >      > although it's not required to build QEMU with
>     eBPF support.
>     > > > > >      > Added changes to virtio-net and vhost, primary
>     eBPF RSS is used.
>     > > > > >      > 'Software' RSS used in the case of hash
>     population and as a
>     > > > > >      fallback option.
>     > > > > >      > For vhost, the hash population feature is not
>     reported to the guest.
>     > > > > >      >
>     > > > > >      > Please also see the documentation in PATCH 6/6.
>     > > > > >      >
>     > > > > >      > I am sending those patches as RFC to initiate the
>     discussions
>     > > > > >      and get
>     > > > > >      > feedback on the following points:
>     > > > > >      > * Fallback when eBPF is not supported by the kernel
>     > > > > >
>     > > > > >
>     > > > > >      Yes, and it could also a lacking of CAP_BPF.
>     > > > > >
>     > > > > >
>     > > > > >      > * Live migration to the kernel that doesn't have
>     eBPF support
>     > > > > >
>     > > > > >
>     > > > > >      Is there anything that we needs special treatment here?
>     > > > > >
>     > > > > > Possible case: rss=on, vhost=on, source system with
>     kernel 5.8
>     > > > > > (everything works) -> dest. system 5.6 (bpf does not
>     work), the adapter
>     > > > > > functions, but all the steering does not use proper queues.
>     > > > >
>     > > > > Right, I think we need to disable vhost on dest.
>     > > > >
>     > > > >
>     > > > > >
>     > > > > >
>     > > > > >      > * Integration with current QEMU build
>     > > > > >
>     > > > > >
>     > > > > >      Yes, a question here:
>     > > > > >
>     > > > > >      1) Any reason for not using libbpf, e.g it has been
>     shipped with some
>     > > > > >      distros
>     > > > > >
>     > > > > >
>     > > > > > We intentionally do not use libbpf, as it present only
>     on some distros.
>     > > > > > We can switch to libbpf, but this will disable bpf if
>     libbpf is not
>     > > > > > installed
>     > > > >
>     > > > > That's better I think.
>     > > > >
>     > > > >
>     > > > > >      2) It would be better if we can avoid shipping
>     bytecodes
>     > > > > >
>     > > > > >
>     > > > > >
>     > > > > > This creates new dependencies: llvm + clang + ...
>     > > > > > We would prefer byte code and ability to generate it if
>     prerequisites
>     > > > > > are installed.
>     > > > >
>     > > > > It's probably ok if we treat the bytecode as a kind of
>     firmware.
>     > > > That is explicitly *not* OK for inclusion in Fedora. They
>     require that
>     > > > BPF is compiled from source, and rejected my suggestion that
>     it could
>     > > > be considered a kind of firmware and thus have an exception
>     from building
>     > > > from source.
>     > >
>     > >
>     > > Please refer what it was done in DPDK:
>     > >
>     > > http://git.dpdk.org/dpdk/tree/doc/guides/nics/tap.rst#n235
>     > >
>     > > I don't think what proposed here makes anything different.
>     >
>     > I'm not convinced that what DPDK does is acceptable to Fedora either
>     > based on the responses I've received when asking about BPF handling
>     > during build.  I wouldn't suprise me, however, if this was simply
>     > missed by reviewers when accepting DPDK into Fedora, because it is
>     > not entirely obvious unless you are looking closely.
>
>     FWIW, I'm pushing back against the idea that we have to compile the
>     BPF code from master source, as I think it is reasonable to have the
>     program embedded as a static array in the source code similar to what
>     DPDK does.  It doesn't feel much different from other places where
>     apps
>     use generated sources, and don't build them from the original source
>     every time. eg "configure" is never re-generated from
>     "configure.ac <http://configure.ac>"
>     by Fedora packagers, they just use the generated "configure" script
>     as-is.
>
>     Regards,
>     Daniel
>     -- 
>     |: https://berrange.com     -o-
>     https://www.flickr.com/photos/dberrange :|
>     |: https://libvirt.org        -o- https://fstop138.berrange.com :|
>     |: https://entangle-photo.org   -o-
>     https://www.instagram.com/dberrange :|
>


Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
Posted by Yuri Benditovich 3 years, 5 months ago
On Mon, Nov 9, 2020 at 4:14 AM Jason Wang <jasowang@redhat.com> wrote:

>
> On 2020/11/5 下午11:13, Yuri Benditovich wrote:
> > First of all, thank you for all your feedbacks
> >
> > Please help me to summarize and let us understand better what we do in
> v2:
> > Major questions are:
> > 1. Building eBPF from source during qemu build vs. regenerating it on
> > demand and keeping in the repository
> > Solution 1a (~ as in v1): keep instructions or ELF in H file, generate
> > it out of qemu build. In general we'll need to have BE and LE binaries.
> > Solution 1b: build ELF or instructions during QEMU build if llvm +
> > clang exist. Then we will have only one (BE or LE, depending on
> > current QEMU build)
> > We agree with any solution - I believe you know the requirements better.
>
>
> I think we can go with 1a. (See Danial's comment)
>
>
> >
> > 2. Use libbpf or not
> > In general we do not see any advantage of using libbpf. It works with
> > object files (does ELF parsing at time of loading), but it does not do
> > any magic.
> > Solution 2a. Switch to libbpf, generate object files (LE and BE) from
> > source, keep them inside QEMU (~8k each) or aside
>
>
> Can we simply use dynamic linking here?
>
>
Can you please explain where exactly you suggest using dynamic linking?


>
> > Solution 2b. (as in v1) Use python script to parse object ->
> > instructions (~2k each)
> > We'd prefer not to use libbpf at the moment.
> > If due to some unknown reason we'll find it useful in future, we can
> > switch to it, this does not create any incompatibility. Then this will
> > create a dependency on libbpf.so
>
>
> I think we need to care about compatibility. E.g we need to enable BTF
> so I don't know how hard if we add BTF support in the current design. It
> would be probably OK it's not a lot of effort.
>
>
As far as we understand, BTF helps with BPF debugging, and libbpf supports
it as-is.
Without libbpf, in v1 we load only the BPF instructions.
If you think BTF is mandatory (BTW, why?), I think it is better to
switch to libbpf and keep the entire ELF in the QEMU data.


> >
> > 3. Keep instructions or ELF inside QEMU or as separate external file
> > Solution 3a (~as in v1): Built-in array of instructions or ELF. If we
> > generate them out of QEMU build - keep 2 arrays or instructions or ELF
> > (BE and LE),
> > Solution 3b: Install them as separate files (/usr/share/qemu).
> > We'd prefer 3a:
> >  Then there is a guarantee that the eBPF is built with exactly the
> > same config structures as QEMU (qemu creates a mapping of its
> > structures, eBPF uses them).
> >  No need to take care on scenarios like 'file not found', 'file is not
> > suitable' etc
>
>
> Yes, let's go 3a for upstream.
>
>
> >
> > 4. Is there some real request to have the eBPF for big-endian?
> > If no, we can enable eBPF only for LE builds
>
>
> We can go with LE first.
>
> Thanks
>
>
> >
> > Jason, Daniel, Michael
> > Can you please let us know what you think and why?
> >
> > On Thu, Nov 5, 2020 at 3:19 PM Daniel P. Berrangé <berrange@redhat.com
> > <mailto:berrange@redhat.com>> wrote:
> >
> >     On Thu, Nov 05, 2020 at 10:01:09AM +0000, Daniel P. Berrangé wrote:
> >     > On Thu, Nov 05, 2020 at 11:46:18AM +0800, Jason Wang wrote:
> >     > >
> >     > > On 2020/11/4 下午5:31, Daniel P. Berrangé wrote:
> >     > > > On Wed, Nov 04, 2020 at 10:07:52AM +0800, Jason Wang wrote:
> >     > > > > On 2020/11/3 下午6:32, Yuri Benditovich wrote:
> >     > > > > >
> >     > > > > > On Tue, Nov 3, 2020 at 11:02 AM Jason Wang
> >     <jasowang@redhat.com <mailto:jasowang@redhat.com>
> >     > > > > > <mailto:jasowang@redhat.com
> >     <mailto:jasowang@redhat.com>>> wrote:
> >     > > > > >
> >     > > > > >
> >     > > > > >      On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> >     > > > > >      > Basic idea is to use eBPF to calculate and steer
> >     packets in TAP.
> >     > > > > >      > RSS(Receive Side Scaling) is used to distribute
> >     network packets
> >     > > > > >      to guest virtqueues
> >     > > > > >      > by calculating packet hash.
> >     > > > > >      > eBPF RSS allows us to use RSS with vhost TAP.
> >     > > > > >      >
> >     > > > > >      > This set of patches introduces the usage of eBPF
> >     for packet steering
> >     > > > > >      > and RSS hash calculation:
> >     > > > > >      > * RSS(Receive Side Scaling) is used to distribute
> >     network packets to
> >     > > > > >      > guest virtqueues by calculating packet hash
> >     > > > > >      > * eBPF RSS suppose to be faster than already
> >     existing 'software'
> >     > > > > >      > implementation in QEMU
> >     > > > > >      > * Additionally adding support for the usage of
> >     RSS with vhost
> >     > > > > >      >
> >     > > > > >      > Supported kernels: 5.8+
> >     > > > > >      >
> >     > > > > >      > Implementation notes:
> >     > > > > >      > Linux TAP TUNSETSTEERINGEBPF ioctl was used to
> >     set the eBPF program.
> >     > > > > >      > Added eBPF support to qemu directly through a
> >     system call, see the
> >     > > > > >      > bpf(2) for details.
> >     > > > > >      > The eBPF program is part of the qemu and
> >     presented as an array
> >     > > > > >      of bpf
> >     > > > > >      > instructions.
> >     > > > > >      > The program can be recompiled by provided
> >     Makefile.ebpf(need to
> >     > > > > >      adjust
> >     > > > > >      > 'linuxhdrs'),
> >     > > > > >      > although it's not required to build QEMU with
> >     eBPF support.
> >     > > > > >      > Added changes to virtio-net and vhost, primary
> >     eBPF RSS is used.
> >     > > > > >      > 'Software' RSS used in the case of hash
> >     population and as a
> >     > > > > >      fallback option.
> >     > > > > >      > For vhost, the hash population feature is not
> >     reported to the guest.
> >     > > > > >      >
> >     > > > > >      > Please also see the documentation in PATCH 6/6.
> >     > > > > >      >
> >     > > > > >      > I am sending those patches as RFC to initiate the
> >     discussions
> >     > > > > >      and get
> >     > > > > >      > feedback on the following points:
> >     > > > > >      > * Fallback when eBPF is not supported by the kernel
> >     > > > > >
> >     > > > > >
> >     > > > > >      Yes, and it could also a lacking of CAP_BPF.
> >     > > > > >
> >     > > > > >
> >     > > > > >      > * Live migration to the kernel that doesn't have
> >     eBPF support
> >     > > > > >
> >     > > > > >
> >     > > > > >      Is there anything that we needs special treatment
> here?
> >     > > > > >
> >     > > > > > Possible case: rss=on, vhost=on, source system with
> >     kernel 5.8
> >     > > > > > (everything works) -> dest. system 5.6 (bpf does not
> >     work), the adapter
> >     > > > > > functions, but all the steering does not use proper queues.
> >     > > > >
> >     > > > > Right, I think we need to disable vhost on dest.
> >     > > > >
> >     > > > >
> >     > > > > >
> >     > > > > >
> >     > > > > >      > * Integration with current QEMU build
> >     > > > > >
> >     > > > > >
> >     > > > > >      Yes, a question here:
> >     > > > > >
> >     > > > > >      1) Any reason for not using libbpf, e.g it has been
> >     shipped with some
> >     > > > > >      distros
> >     > > > > >
> >     > > > > >
> >     > > > > > We intentionally do not use libbpf, as it present only
> >     on some distros.
> >     > > > > > We can switch to libbpf, but this will disable bpf if
> >     libbpf is not
> >     > > > > > installed
> >     > > > >
> >     > > > > That's better I think.
> >     > > > >
> >     > > > >
> >     > > > > >      2) It would be better if we can avoid shipping
> >     bytecodes
> >     > > > > >
> >     > > > > >
> >     > > > > >
> >     > > > > > This creates new dependencies: llvm + clang + ...
> >     > > > > > We would prefer byte code and ability to generate it if
> >     prerequisites
> >     > > > > > are installed.
> >     > > > >
> >     > > > > It's probably ok if we treat the bytecode as a kind of
> >     firmware.
> >     > > > That is explicitly *not* OK for inclusion in Fedora. They
> >     require that
> >     > > > BPF is compiled from source, and rejected my suggestion that
> >     it could
> >     > > > be considered a kind of firmware and thus have an exception
> >     from building
> >     > > > from source.
> >     > >
> >     > >
> >     > > Please refer what it was done in DPDK:
> >     > >
> >     > > http://git.dpdk.org/dpdk/tree/doc/guides/nics/tap.rst#n235
> >     > >
> >     > > I don't think what proposed here makes anything different.
> >     >
> >     > I'm not convinced that what DPDK does is acceptable to Fedora
> either
> >     > based on the responses I've received when asking about BPF handling
> >     > during build.  I wouldn't suprise me, however, if this was simply
> >     > missed by reviewers when accepting DPDK into Fedora, because it is
> >     > not entirely obvious unless you are looking closely.
> >
> >     FWIW, I'm pushing back against the idea that we have to compile the
> >     BPF code from master source, as I think it is reasonable to have the
> >     program embedded as a static array in the source code similar to what
> >     DPDK does.  It doesn't feel much different from other places where
> >     apps
> >     use generated sources, and don't build them from the original source
> >     every time. eg "configure" is never re-generated from
> >     "configure.ac <http://configure.ac>"
> >     by Fedora packagers, they just use the generated "configure" script
> >     as-is.
> >
> >     Regards,
> >     Daniel
> >     --
> >     |: https://berrange.com     -o-
> >     https://www.flickr.com/photos/dberrange :|
> >     |: https://libvirt.org        -o- https://fstop138.berrange.com :|
> >     |: https://entangle-photo.org   -o-
> >     https://www.instagram.com/dberrange :|
> >
>
>
Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
Posted by Jason Wang 3 years, 5 months ago
On 2020/11/9 下午9:33, Yuri Benditovich wrote:
>
>
> On Mon, Nov 9, 2020 at 4:14 AM Jason Wang <jasowang@redhat.com 
> <mailto:jasowang@redhat.com>> wrote:
>
>
>     On 2020/11/5 下午11:13, Yuri Benditovich wrote:
>     > First of all, thank you for all your feedbacks
>     >
>     > Please help me to summarize and let us understand better what we
>     do in v2:
>     > Major questions are:
>     > 1. Building eBPF from source during qemu build vs. regenerating
>     it on
>     > demand and keeping in the repository
>     > Solution 1a (~ as in v1): keep instructions or ELF in H file,
>     generate
>     > it out of qemu build. In general we'll need to have BE and LE
>     binaries.
>     > Solution 1b: build ELF or instructions during QEMU build if llvm +
>     > clang exist. Then we will have only one (BE or LE, depending on
>     > current QEMU build)
>     > We agree with any solution - I believe you know the requirements
>     better.
>
>
>     I think we can go with 1a. (See Danial's comment)
>
>
>     >
>     > 2. Use libbpf or not
>     > In general we do not see any advantage of using libbpf. It works
>     with
>     > object files (does ELF parsing at time of loading), but it does
>     not do
>     > any magic.
>     > Solution 2a. Switch to libbpf, generate object files (LE and BE)
>     from
>     > source, keep them inside QEMU (~8k each) or aside
>
>
>     Can we simply use dynamic linking here?
>
>
> Can you please explain, where exactly you suggest to use dynamic linking?


Yes. If I understand your 2a properly, you meant static linking of 
libbpf. So what I want to ask about is the possibility of dynamically 
linking libbpf here.


>
>     > Solution 2b. (as in v1) Use python script to parse object ->
>     > instructions (~2k each)
>     > We'd prefer not to use libbpf at the moment.
>     > If due to some unknown reason we'll find it useful in future, we
>     can
>     > switch to it, this does not create any incompatibility. Then
>     this will
>     > create a dependency on libbpf.so
>
>
>     I think we need to care about compatibility. E.g we need to enable
>     BTF
>     so I don't know how hard if we add BTF support in the current
>     design. It
>     would be probably OK it's not a lot of effort.
>
>
> As far as we understand BTF helps in BPF debugging and libbpf supports 
> it as is.
> Without libbpf we in v1 load the BPF instructions only.
> If you think the BTF is mandatory (BTW, why?) I think it is better to 
> switch to libbpf and keep the entire ELF in the qemu data.


It is used to make sure the BPF program can be compiled once and run 
everywhere (CO-RE).

This is explained in detail here: 
https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html
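
For reference, the CO-RE approach relies on BTF relocations emitted by 
clang for a BPF target; a field access in a BPF program then looks 
roughly like the sketch below. This is an illustrative example, not 
code from this series: the SEC()/BPF_CORE_READ() macros come from 
libbpf's bpf_helpers.h and bpf_core_read.h, and vmlinux.h is assumed to 
be generated with bpftool.

```c
/* Illustrative CO-RE sketch (not from this series): read task->pid in
 * a way that survives kernel struct layout changes.  clang records a
 * BTF relocation for the field access, and libbpf patches the actual
 * offset when the program is loaded on the running kernel. */
#include "vmlinux.h"              /* kernel types, generated by bpftool */
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

SEC("raw_tp/sched_process_exec")
int trace_exec(void *ctx)
{
    struct task_struct *task = (struct task_struct *)bpf_get_current_task();
    pid_t pid = BPF_CORE_READ(task, pid); /* offset fixed up at load time */

    bpf_printk("exec pid=%d", pid);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

Without BTF in the shipped object, none of these load-time fixups are 
possible, which is the compatibility concern above.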

Thanks


>
>
>     >
>     > 3. Keep instructions or ELF inside QEMU or as separate external file
>     > Solution 3a (~as in v1): Built-in array of instructions or ELF.
>     If we
>     > generate them out of QEMU build - keep 2 arrays or instructions
>     or ELF
>     > (BE and LE),
>     > Solution 3b: Install them as separate files (/usr/share/qemu).
>     > We'd prefer 3a:
>     >  Then there is a guarantee that the eBPF is built with exactly the
>     > same config structures as QEMU (qemu creates a mapping of its
>     > structures, eBPF uses them).
>     >  No need to take care on scenarios like 'file not found', 'file
>     is not
>     > suitable' etc
>
>
>     Yes, let's go 3a for upstream.
>
>
>     >
>     > 4. Is there some real request to have the eBPF for big-endian?
>     > If no, we can enable eBPF only for LE builds
>
>
>     We can go with LE first.
>
>     Thanks
>
>
>     >
>     > Jason, Daniel, Michael
>     > Can you please let us know what you think and why?
>     >
>     > On Thu, Nov 5, 2020 at 3:19 PM Daniel P. Berrangé
>     <berrange@redhat.com <mailto:berrange@redhat.com>
>     > <mailto:berrange@redhat.com <mailto:berrange@redhat.com>>> wrote:
>     >
>     >     On Thu, Nov 05, 2020 at 10:01:09AM +0000, Daniel P. Berrangé
>     wrote:
>     >     > On Thu, Nov 05, 2020 at 11:46:18AM +0800, Jason Wang wrote:
>     >     > >
>     >     > > On 2020/11/4 下午5:31, Daniel P. Berrangé wrote:
>     >     > > > On Wed, Nov 04, 2020 at 10:07:52AM +0800, Jason Wang
>     wrote:
>     >     > > > > On 2020/11/3 下午6:32, Yuri Benditovich wrote:
>     >     > > > > >
>     >     > > > > > On Tue, Nov 3, 2020 at 11:02 AM Jason Wang
>     >     <jasowang@redhat.com <mailto:jasowang@redhat.com>
>     <mailto:jasowang@redhat.com <mailto:jasowang@redhat.com>>
>     >     > > > > > <mailto:jasowang@redhat.com
>     <mailto:jasowang@redhat.com>
>     >     <mailto:jasowang@redhat.com <mailto:jasowang@redhat.com>>>>
>     wrote:
>     >     > > > > >
>     >     > > > > >
>     >     > > > > >      On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
>     >     > > > > >      > Basic idea is to use eBPF to calculate and
>     steer
>     >     packets in TAP.
>     >     > > > > >      > RSS(Receive Side Scaling) is used to distribute
>     >     network packets
>     >     > > > > >      to guest virtqueues
>     >     > > > > >      > by calculating packet hash.
>     >     > > > > >      > eBPF RSS allows us to use RSS with vhost TAP.
>     >     > > > > >      >
>     >     > > > > >      > This set of patches introduces the usage of
>     eBPF
>     >     for packet steering
>     >     > > > > >      > and RSS hash calculation:
>     >     > > > > >      > * RSS(Receive Side Scaling) is used to
>     distribute
>     >     network packets to
>     >     > > > > >      > guest virtqueues by calculating packet hash
>     >     > > > > >      > * eBPF RSS suppose to be faster than already
>     >     existing 'software'
>     >     > > > > >      > implementation in QEMU
>     >     > > > > >      > * Additionally adding support for the usage of
>     >     RSS with vhost
>     >     > > > > >      >
>     >     > > > > >      > Supported kernels: 5.8+
>     >     > > > > >      >
>     >     > > > > >      > Implementation notes:
>     >     > > > > >      > Linux TAP TUNSETSTEERINGEBPF ioctl was used to
>     >     set the eBPF program.
>     >     > > > > >      > Added eBPF support to qemu directly through a
>     >     system call, see the
>     >     > > > > >      > bpf(2) for details.
>     >     > > > > >      > The eBPF program is part of the qemu and
>     >     presented as an array
>     >     > > > > >      of bpf
>     >     > > > > >      > instructions.
>     >     > > > > >      > The program can be recompiled by provided
>     >     Makefile.ebpf(need to
>     >     > > > > >      adjust
>     >     > > > > >      > 'linuxhdrs'),
>     >     > > > > >      > although it's not required to build QEMU with
>     >     eBPF support.
>     >     > > > > >      > Added changes to virtio-net and vhost, primary
>     >     eBPF RSS is used.
>     >     > > > > >      > 'Software' RSS used in the case of hash
>     >     population and as a
>     >     > > > > >      fallback option.
>     >     > > > > >      > For vhost, the hash population feature is not
>     >     reported to the guest.
>     >     > > > > >      >
>     >     > > > > >      > Please also see the documentation in PATCH 6/6.
>     >     > > > > >      >
>     >     > > > > >      > I am sending those patches as RFC to
>     initiate the
>     >     discussions
>     >     > > > > >      and get
>     >     > > > > >      > feedback on the following points:
>     >     > > > > >      > * Fallback when eBPF is not supported by
>     the kernel
>     >     > > > > >
>     >     > > > > >
>     >     > > > > >      Yes, and it could also a lacking of CAP_BPF.
>     >     > > > > >
>     >     > > > > >
>     >     > > > > >      > * Live migration to the kernel that doesn't
>     have
>     >     eBPF support
>     >     > > > > >
>     >     > > > > >
>     >     > > > > >      Is there anything that we needs special
>     treatment here?
>     >     > > > > >
>     >     > > > > > Possible case: rss=on, vhost=on, source system with
>     >     kernel 5.8
>     >     > > > > > (everything works) -> dest. system 5.6 (bpf does not
>     >     work), the adapter
>     >     > > > > > functions, but all the steering does not use
>     proper queues.
>     >     > > > >
>     >     > > > > Right, I think we need to disable vhost on dest.
>     >     > > > >
>     >     > > > >
>     >     > > > > >
>     >     > > > > >
>     >     > > > > >      > * Integration with current QEMU build
>     >     > > > > >
>     >     > > > > >
>     >     > > > > >      Yes, a question here:
>     >     > > > > >
>     >     > > > > >      1) Any reason for not using libbpf, e.g it
>     has been
>     >     shipped with some
>     >     > > > > >      distros
>     >     > > > > >
>     >     > > > > >
>     >     > > > > > We intentionally do not use libbpf, as it present only
>     >     on some distros.
>     >     > > > > > We can switch to libbpf, but this will disable bpf if
>     >     libbpf is not
>     >     > > > > > installed
>     >     > > > >
>     >     > > > > That's better I think.
>     >     > > > >
>     >     > > > >
>     >     > > > > >      2) It would be better if we can avoid shipping
>     >     bytecodes
>     >     > > > > >
>     >     > > > > >
>     >     > > > > >
>     >     > > > > > This creates new dependencies: llvm + clang + ...
>     >     > > > > > We would prefer byte code and ability to generate
>     it if
>     >     prerequisites
>     >     > > > > > are installed.
>     >     > > > >
>     >     > > > > It's probably ok if we treat the bytecode as a kind of
>     >     firmware.
>     >     > > > That is explicitly *not* OK for inclusion in Fedora. They
>     >     require that
>     >     > > > BPF is compiled from source, and rejected my
>     suggestion that
>     >     it could
>     >     > > > be considered a kind of firmware and thus have an
>     exception
>     >     from building
>     >     > > > from source.
>     >     > >
>     >     > >
>     >     > > Please refer what it was done in DPDK:
>     >     > >
>     >     > > http://git.dpdk.org/dpdk/tree/doc/guides/nics/tap.rst#n235
>     >     > >
>     >     > > I don't think what proposed here makes anything different.
>     >     >
>     >     > I'm not convinced that what DPDK does is acceptable to
>     Fedora either
>     >     > based on the responses I've received when asking about BPF
>     handling
>     >     > during build.  I wouldn't suprise me, however, if this was
>     simply
>     >     > missed by reviewers when accepting DPDK into Fedora,
>     because it is
>     >     > not entirely obvious unless you are looking closely.
>     >
>     >     FWIW, I'm pushing back against the idea that we have to
>     compile the
>     >     BPF code from master source, as I think it is reasonable to
>     have the
>     >     program embedded as a static array in the source code
>     similar to what
>     >     DPDK does.  It doesn't feel much different from other places
>     where
>     >     apps
>     >     use generated sources, and don't build them from the
>     original source
>     >     every time. eg "configure" is never re-generated from
>     >     "configure.ac <http://configure.ac> <http://configure.ac>"
>     >     by Fedora packagers, they just use the generated "configure"
>     script
>     >     as-is.
>     >
>     >     Regards,
>     >     Daniel
>     >     --
>     >     |: https://berrange.com     -o-
>     > https://www.flickr.com/photos/dberrange :|
>     >     |: https://libvirt.org        -o-
>     https://fstop138.berrange.com :|
>     >     |: https://entangle-photo.org   -o-
>     > https://www.instagram.com/dberrange :|
>     >
>


Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
Posted by Yuri Benditovich 3 years, 5 months ago
On Tue, Nov 10, 2020 at 4:23 AM Jason Wang <jasowang@redhat.com> wrote:

>
> On 2020/11/9 下午9:33, Yuri Benditovich wrote:
> >
> >
> > On Mon, Nov 9, 2020 at 4:14 AM Jason Wang <jasowang@redhat.com
> > <mailto:jasowang@redhat.com>> wrote:
> >
> >
> >     On 2020/11/5 下午11:13, Yuri Benditovich wrote:
> >     > First of all, thank you for all your feedbacks
> >     >
> >     > Please help me to summarize and let us understand better what we
> >     do in v2:
> >     > Major questions are:
> >     > 1. Building eBPF from source during qemu build vs. regenerating
> >     it on
> >     > demand and keeping in the repository
> >     > Solution 1a (~ as in v1): keep instructions or ELF in H file,
> >     generate
> >     > it out of qemu build. In general we'll need to have BE and LE
> >     binaries.
> >     > Solution 1b: build ELF or instructions during QEMU build if llvm +
> >     > clang exist. Then we will have only one (BE or LE, depending on
> >     > current QEMU build)
> >     > We agree with any solution - I believe you know the requirements
> >     better.
> >
> >
> >     I think we can go with 1a. (See Danial's comment)
> >
> >
> >     >
> >     > 2. Use libbpf or not
> >     > In general we do not see any advantage of using libbpf. It works
> >     with
> >     > object files (does ELF parsing at time of loading), but it does
> >     not do
> >     > any magic.
> >     > Solution 2a. Switch to libbpf, generate object files (LE and BE)
> >     from
> >     > source, keep them inside QEMU (~8k each) or aside
> >
> >
> >     Can we simply use dynamic linking here?
> >
> >
> > Can you please explain, where exactly you suggest to use dynamic linking?
>
>
> Yes. If I understand your 2a properly, you meant static linking of
> libbpf. So what I want to ask is the possibility of dynamic linking of
> libbpf here.
>
>
As Daniel explained above, QEMU is always linked dynamically against
libraries.
Also, I see the libbpf package does not even contain a static library.
If the build environment contains libbpf, libbpf.so becomes a runtime
dependency, just as with other libs.


>
> >
> >     > Solution 2b. (as in v1) Use python script to parse object ->
> >     > instructions (~2k each)
> >     > We'd prefer not to use libbpf at the moment.
> >     > If due to some unknown reason we'll find it useful in future, we
> >     can
> >     > switch to it, this does not create any incompatibility. Then
> >     this will
> >     > create a dependency on libbpf.so
> >
> >
> >     I think we need to care about compatibility. E.g we need to enable
> >     BTF
> >     so I don't know how hard if we add BTF support in the current
> >     design. It
> >     would be probably OK it's not a lot of effort.
> >
> >
> > As far as we understand BTF helps in BPF debugging and libbpf supports
> > it as is.
> > Without libbpf we in v1 load the BPF instructions only.
> > If you think the BTF is mandatory (BTW, why?) I think it is better to
> > switch to libbpf and keep the entire ELF in the qemu data.
>
>
> It is used to make sure the BPF can do compile once run everywhere.
>
> This is explained in detail in here:
>
> https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html
> .
>
>
Thank you. Then there is no question: we need to use libbpf.
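
For what it's worth, keeping the object file embedded in QEMU (option 
3a) and handing it to libbpf could look roughly like the sketch below. 
This is written against the libbpf API as we understand it; the 
embedded-image parameters are hypothetical names, and error handling is 
reduced to the minimum.

```c
/* Sketch: load a BPF ELF image embedded in the binary via libbpf.
 * The elf/elf_len parameters stand in for hypothetical symbols that
 * would hold the embedded object file. */
#include <stddef.h>
#include <bpf/libbpf.h>

static int load_embedded_rss(const void *elf, size_t elf_len)
{
    struct bpf_object *obj;

    obj = bpf_object__open_mem(elf, elf_len, NULL);
    if (!obj) {
        return -1;
    }

    if (bpf_object__load(obj)) {       /* verifier + BTF/CO-RE fixups */
        bpf_object__close(obj);
        return -1;
    }

    /* The steering program fd could then be looked up with
     * bpf_object__find_program_by_name() + bpf_program__fd() and
     * attached to the tap fd with the TUNSETSTEERINGEBPF ioctl. */
    return 0;
}
```

This keeps the 3a guarantee (the ELF is built against exactly QEMU's 
config structures) while letting libbpf do the ELF parsing and BTF 
handling.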


> Thanks
>
>
> >
> >
> >     >
> >     > 3. Keep instructions or ELF inside QEMU or as separate external
> file
> >     > Solution 3a (~as in v1): Built-in array of instructions or ELF.
> >     If we
> >     > generate them out of QEMU build - keep 2 arrays or instructions
> >     or ELF
> >     > (BE and LE),
> >     > Solution 3b: Install them as separate files (/usr/share/qemu).
> >     > We'd prefer 3a:
> >     >  Then there is a guarantee that the eBPF is built with exactly the
> >     > same config structures as QEMU (qemu creates a mapping of its
> >     > structures, eBPF uses them).
> >     >  No need to take care on scenarios like 'file not found', 'file
> >     is not
> >     > suitable' etc
> >
> >
> >     Yes, let's go 3a for upstream.
> >
> >
> >     >
> >     > 4. Is there some real request to have the eBPF for big-endian?
> >     > If no, we can enable eBPF only for LE builds
> >
> >
> >     We can go with LE first.
> >
> >     Thanks
> >
> >
> >     >
> >     > Jason, Daniel, Michael
> >     > Can you please let us know what you think and why?
> >     >
> >     > On Thu, Nov 5, 2020 at 3:19 PM Daniel P. Berrangé
> >     > <berrange@redhat.com> wrote:
> >     >
> >     >     On Thu, Nov 05, 2020 at 10:01:09AM +0000, Daniel P. Berrangé
> >     wrote:
> >     >     > On Thu, Nov 05, 2020 at 11:46:18AM +0800, Jason Wang wrote:
> >     >     > >
> >     >     > > On 2020/11/4 下午5:31, Daniel P. Berrangé wrote:
> >     >     > > > On Wed, Nov 04, 2020 at 10:07:52AM +0800, Jason Wang
> >     wrote:
> >     >     > > > > On 2020/11/3 下午6:32, Yuri Benditovich wrote:
> >     >     > > > > >
> >     >     > > > > > On Tue, Nov 3, 2020 at 11:02 AM Jason Wang
> >     >     > > > > > <jasowang@redhat.com> wrote:
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > >      On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> >     >     > > > > >      > Basic idea is to use eBPF to calculate and
> >     steer
> >     >     packets in TAP.
> >     >     > > > > >      > RSS(Receive Side Scaling) is used to
> distribute
> >     >     network packets
> >     >     > > > > >      to guest virtqueues
> >     >     > > > > >      > by calculating packet hash.
> >     >     > > > > >      > eBPF RSS allows us to use RSS with vhost TAP.
> >     >     > > > > >      >
> >     >     > > > > >      > This set of patches introduces the usage of
> >     eBPF
> >     >     for packet steering
> >     >     > > > > >      > and RSS hash calculation:
> >     >     > > > > >      > * RSS(Receive Side Scaling) is used to
> >     distribute
> >     >     network packets to
> >     >     > > > > >      > guest virtqueues by calculating packet hash
> >     >     > > > > >      > * eBPF RSS suppose to be faster than already
> >     >     existing 'software'
> >     >     > > > > >      > implementation in QEMU
> >     >     > > > > >      > * Additionally adding support for the usage of
> >     >     RSS with vhost
> >     >     > > > > >      >
> >     >     > > > > >      > Supported kernels: 5.8+
> >     >     > > > > >      >
> >     >     > > > > >      > Implementation notes:
> >     >     > > > > >      > Linux TAP TUNSETSTEERINGEBPF ioctl was used to
> >     >     set the eBPF program.
> >     >     > > > > >      > Added eBPF support to qemu directly through a
> >     >     system call, see the
> >     >     > > > > >      > bpf(2) for details.
> >     >     > > > > >      > The eBPF program is part of the qemu and
> >     >     presented as an array
> >     >     > > > > >      of bpf
> >     >     > > > > >      > instructions.
> >     >     > > > > >      > The program can be recompiled by provided
> >     >     Makefile.ebpf(need to
> >     >     > > > > >      adjust
> >     >     > > > > >      > 'linuxhdrs'),
> >     >     > > > > >      > although it's not required to build QEMU with
> >     >     eBPF support.
> >     >     > > > > >      > Added changes to virtio-net and vhost, primary
> >     >     eBPF RSS is used.
> >     >     > > > > >      > 'Software' RSS used in the case of hash
> >     >     population and as a
> >     >     > > > > >      fallback option.
> >     >     > > > > >      > For vhost, the hash population feature is not
> >     >     reported to the guest.
> >     >     > > > > >      >
> >     >     > > > > >      > Please also see the documentation in PATCH
> 6/6.
> >     >     > > > > >      >
> >     >     > > > > >      > I am sending those patches as RFC to
> >     initiate the
> >     >     discussions
> >     >     > > > > >      and get
> >     >     > > > > >      > feedback on the following points:
> >     >     > > > > >      > * Fallback when eBPF is not supported by
> >     the kernel
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > >      Yes, and it could also a lacking of CAP_BPF.
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > >      > * Live migration to the kernel that doesn't
> >     have
> >     >     eBPF support
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > >      Is there anything that we needs special
> >     treatment here?
> >     >     > > > > >
> >     >     > > > > > Possible case: rss=on, vhost=on, source system with
> >     >     kernel 5.8
> >     >     > > > > > (everything works) -> dest. system 5.6 (bpf does not
> >     >     work), the adapter
> >     >     > > > > > functions, but all the steering does not use
> >     proper queues.
> >     >     > > > >
> >     >     > > > > Right, I think we need to disable vhost on dest.
> >     >     > > > >
> >     >     > > > >
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > >      > * Integration with current QEMU build
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > >      Yes, a question here:
> >     >     > > > > >
> >     >     > > > > >      1) Any reason for not using libbpf, e.g it
> >     has been
> >     >     shipped with some
> >     >     > > > > >      distros
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > > We intentionally do not use libbpf, as it present
> only
> >     >     on some distros.
> >     >     > > > > > We can switch to libbpf, but this will disable bpf if
> >     >     libbpf is not
> >     >     > > > > > installed
> >     >     > > > >
> >     >     > > > > That's better I think.
> >     >     > > > >
> >     >     > > > >
> >     >     > > > > >      2) It would be better if we can avoid shipping
> >     >     bytecodes
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > > This creates new dependencies: llvm + clang + ...
> >     >     > > > > > We would prefer byte code and ability to generate
> >     it if
> >     >     prerequisites
> >     >     > > > > > are installed.
> >     >     > > > >
> >     >     > > > > It's probably ok if we treat the bytecode as a kind of
> >     >     firmware.
> >     >     > > > That is explicitly *not* OK for inclusion in Fedora. They
> >     >     require that
> >     >     > > > BPF is compiled from source, and rejected my
> >     suggestion that
> >     >     it could
> >     >     > > > be considered a kind of firmware and thus have an
> >     exception
> >     >     from building
> >     >     > > > from source.
> >     >     > >
> >     >     > >
> >     >     > > Please refer what it was done in DPDK:
> >     >     > >
> >     >     > > http://git.dpdk.org/dpdk/tree/doc/guides/nics/tap.rst#n235
> >     >     > >
> >     >     > > I don't think what proposed here makes anything different.
> >     >     >
> >     >     > I'm not convinced that what DPDK does is acceptable to
> >     Fedora either
> >     >     > based on the responses I've received when asking about BPF
> >     handling
> >     >     > during build.  I wouldn't suprise me, however, if this was
> >     simply
> >     >     > missed by reviewers when accepting DPDK into Fedora,
> >     because it is
> >     >     > not entirely obvious unless you are looking closely.
> >     >
> >     >     FWIW, I'm pushing back against the idea that we have to
> >     compile the
> >     >     BPF code from master source, as I think it is reasonable to
> >     have the
> >     >     program embedded as a static array in the source code
> >     similar to what
> >     >     DPDK does.  It doesn't feel much different from other places
> >     where
> >     >     apps
> >     >     use generated sources, and don't build them from the
> >     original source
> >     >     every time. eg "configure" is never re-generated from
> >     >     "configure.ac"
> >     >     by Fedora packagers, they just use the generated "configure"
> >     script
> >     >     as-is.
> >     >
> >     >     Regards,
> >     >     Daniel
> >     >     --
> >     >     |: https://berrange.com     -o-
> >     > https://www.flickr.com/photos/dberrange :|
> >     >     |: https://libvirt.org        -o-
> >     https://fstop138.berrange.com :|
> >     >     |: https://entangle-photo.org   -o-
> >     > https://www.instagram.com/dberrange :|
> >     >
> >
>
>
Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
Posted by Yuri Benditovich 3 years, 6 months ago
On Wed, Nov 4, 2020 at 4:08 AM Jason Wang <jasowang@redhat.com> wrote:

>
> On 2020/11/3 下午6:32, Yuri Benditovich wrote:
> >
> >
> > On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> >     On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> >     > Basic idea is to use eBPF to calculate and steer packets in TAP.
> >     > RSS(Receive Side Scaling) is used to distribute network packets
> >     to guest virtqueues
> >     > by calculating packet hash.
> >     > eBPF RSS allows us to use RSS with vhost TAP.
> >     >
> >     > This set of patches introduces the usage of eBPF for packet
> steering
> >     > and RSS hash calculation:
> >     > * RSS(Receive Side Scaling) is used to distribute network packets
> to
> >     > guest virtqueues by calculating packet hash
> >     > * eBPF RSS suppose to be faster than already existing 'software'
> >     > implementation in QEMU
> >     > * Additionally adding support for the usage of RSS with vhost
> >     >
> >     > Supported kernels: 5.8+
> >     >
> >     > Implementation notes:
> >     > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF
> program.
> >     > Added eBPF support to qemu directly through a system call, see the
> >     > bpf(2) for details.
> >     > The eBPF program is part of the qemu and presented as an array
> >     of bpf
> >     > instructions.
> >     > The program can be recompiled by provided Makefile.ebpf(need to
> >     adjust
> >     > 'linuxhdrs'),
> >     > although it's not required to build QEMU with eBPF support.
> >     > Added changes to virtio-net and vhost, primary eBPF RSS is used.
> >     > 'Software' RSS used in the case of hash population and as a
> >     fallback option.
> >     > For vhost, the hash population feature is not reported to the
> guest.
> >     >
> >     > Please also see the documentation in PATCH 6/6.
> >     >
> >     > I am sending those patches as RFC to initiate the discussions
> >     and get
> >     > feedback on the following points:
> >     > * Fallback when eBPF is not supported by the kernel
> >
> >
> >     Yes, and it could also a lacking of CAP_BPF.
> >
> >
> >     > * Live migration to the kernel that doesn't have eBPF support
> >
> >
> >     Is there anything that we needs special treatment here?
> >
> > Possible case: rss=on, vhost=on, source system with kernel 5.8
> > (everything works) -> dest. system 5.6 (bpf does not work), the
> > adapter functions, but all the steering does not use proper queues.
>
>
> Right, I think we need to disable vhost on dest.
>
>
Is it acceptable to disable vhost at migration time?


> >
> >
> >
> >     > * Integration with current QEMU build
> >
> >
> >     Yes, a question here:
> >
> >     1) Any reason for not using libbpf, e.g it has been shipped with some
> >     distros
> >
> >
> > We intentionally do not use libbpf, as it present only on some distros.
> > We can switch to libbpf, but this will disable bpf if libbpf is not
> > installed
>
>
> That's better I think.
>

We think the preferred way is to have the eBPF code built into QEMU (not
distribute it as a separate file).

Our initial idea was to not use libbpf because it:
1. Does not create an additional dependency at build time or at run time
2. Gives us a smaller footprint for the loadable eBPF blob inside qemu
3. Does not add too much code to QEMU

We can switch to libbpf, but in this case:
1. Presence of the dynamic library is not guaranteed on the target system
2. The static library is large
3. libbpf consumes an eBPF ELF, which is significantly bigger than just the
array of instructions (maybe we can reduce the ELF to some suitable size
and still have it built-in)

Please let us know whether you still think libbpf is better and why.

Thanks
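
To make the size comparison above concrete: the "array of instructions" form
is a flat array of fixed 8-byte records, while an ELF object additionally
carries section headers, relocations, map definitions and, with BTF, type
information. A sketch with the instruction layout mirrored locally (see
struct bpf_insn in <linux/bpf.h> for the authoritative definition):

```c
#include <stdint.h>

/* Layout of one eBPF instruction as consumed by bpf(BPF_PROG_LOAD);
 * mirrors struct bpf_insn from <linux/bpf.h>, fixed at 8 bytes. */
struct bpf_insn {
    uint8_t  code;          /* opcode */
    uint8_t  dst_reg:4;     /* destination register */
    uint8_t  src_reg:4;     /* source register */
    int16_t  off;           /* signed offset */
    int32_t  imm;           /* signed immediate */
};

/* A trivial embedded program: r0 = 0; return r0; */
const struct bpf_insn trivial_prog[] = {
    { 0xb7, 0, 0, 0, 0 },   /* BPF_ALU64|BPF_MOV|BPF_K: mov64 r0, 0 */
    { 0x95, 0, 0, 0, 0 },   /* BPF_JMP|BPF_EXIT: exit */
};
```

An array like this is what v1 embeds in QEMU and hands to
bpf(BPF_PROG_LOAD); the equivalent ELF is several times larger (the thread
above cites ~8k ELF vs ~2k of instructions for the RSS program).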


>
> >     2) It would be better if we can avoid shipping bytecodes
> >
> >
> >
> > This creates new dependencies: llvm + clang + ...
> > We would prefer byte code and ability to generate it if prerequisites
> > are installed.
>
>
> It's probably ok if we treat the bytecode as a kind of firmware.
>
> But in the long run, it's still worthwhile consider the qemu source is
> used for development and llvm/clang should be a common requirement for
> generating eBPF bytecode for host.
>
>
> >
> >
> >     > * Additional usage for eBPF for packet filtering
> >
> >
> >     Another interesting topics in to implement mac/vlan filters. And
> >     in the
> >     future, I plan to add mac based steering. All of these could be
> >     done via
> >     eBPF.
> >
> >
> > No problem, we can cooperate if needed
> >
> >
> >     >
> >     > Know issues:
> >     > * hash population not supported by eBPF RSS: 'software' RSS used
> >
> >
> >     Is this because there's not way to write to vnet header in
> >     STERRING BPF?
> >
> > Yes. We plan to submit changes for kernel to cooperate with BPF and
> > populate the hash, this work is in progress
>
>
> That would require a new type of eBPF program and may need some work on
> verifier.
>
>
Maybe we need to allow loading an additional program type in tun.c, not only
the socket filter (so that bpf_set_hash can be used).
Also, vhost and tun in the kernel need to be aware of the header extension
for hash population.
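
For context, the guest-facing side of this already has a defined shape: with
the hash-report feature the virtio-net header grows by 8 bytes carrying the
hash. Below is a local mirror of that layout (field names follow struct
virtio_net_hdr_v1_hash in <linux/virtio_net.h>; treat it as an illustrative
sketch, not the authoritative definition):

```c
#include <stdint.h>

struct vnet_hdr_v1 {            /* mirrors struct virtio_net_hdr_v1, 12 bytes */
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
    uint16_t num_buffers;
};

struct vnet_hdr_v1_hash {       /* mirrors struct virtio_net_hdr_v1_hash */
    struct vnet_hdr_v1 hdr;
    uint32_t hash_value;        /* little-endian on the wire */
    uint16_t hash_report;       /* hash type, e.g. 2 for TCPv4 */
    uint16_t padding;
};

/* What the software-RSS fallback does in userspace today, and what a
 * steering eBPF program would need kernel support to do: stash the
 * computed hash in the header handed to the guest.  A little-endian
 * host is assumed here for brevity. */
void report_hash(struct vnet_hdr_v1_hash *h, uint32_t hash, uint16_t type)
{
    h->hash_value = hash;
    h->hash_report = type;
}
```

The missing piece discussed above is the TAP/vhost side: the steering
program would need a program type that may both pick a queue and write
hash_value/hash_report into this header.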


> Btw, macvtap is still lacking even steering ebpf program. Would you want
> to post a patch to support that?
>
>
Probably after we have fully functioning BPF with TAP/TUN.
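
Whichever device the program ends up attached to (TAP now, macvtap later),
the steering logic itself is small: a Toeplitz hash over the flow tuple,
then an indirection-table lookup. A plain-C sketch using the canonical
Microsoft RSS verification key (bit-at-a-time for clarity, not speed):

```c
#include <stddef.h>
#include <stdint.h>

/* Canonical 40-byte RSS key from the Microsoft verification suite. */
static const uint8_t rss_key[40] = {
    0x6d,0x5a,0x56,0xda,0x25,0x5b,0x0e,0xc2,0x41,0x67,
    0x25,0x3d,0x43,0xa3,0x8f,0xb0,0xd0,0xca,0x2b,0xcb,
    0xae,0x7b,0x30,0xb4,0x77,0xcb,0x2d,0xa3,0x80,0x30,
    0xf2,0x0c,0x6a,0x42,0xb7,0x3b,0xbe,0xac,0x01,0xfa,
};

/* Toeplitz hash: for every set bit i of the input (MSB first), XOR in
 * the 32 key bits starting at bit offset i. */
uint32_t toeplitz(const uint8_t *key, const uint8_t *data, size_t len)
{
    uint32_t hash = 0;
    for (size_t i = 0; i < len * 8; i++) {
        if (!(data[i / 8] & (0x80u >> (i % 8))))
            continue;
        uint32_t window = 0;
        for (size_t j = 0; j < 32; j++) {
            size_t bit = i + j;
            if (key[bit / 8] & (0x80u >> (bit % 8)))
                window |= 1u << (31 - j);
        }
        hash ^= window;
    }
    return hash;
}

/* Queue selection is then one table lookup; modulo is used here for
 * simplicity, real implementations often mask a power-of-two table. */
uint16_t steer(uint32_t hash, const uint16_t *indirection, size_t n)
{
    return indirection[hash % n];
}
```

For the standard test vector (src 66.9.149.187:2794, dst 161.142.100.80:1766,
input ordered src addr, dst addr, src port, dst port) this yields
0x51ccc178, matching the published verification suite.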


>
> >
> >     > as a fallback, also, hash population feature is not reported to
> >     guests
> >     > with vhost.
> >     > * big-endian BPF support: for now, eBPF is disabled for
> >     big-endian systems.
> >
> >
> >     Are there any blocker for this?
> >
> >
> > No, can be added in v2
>
>
> Cool.
>
> Thanks
>
>
> >
> >     Just some quick questions after a glance of the codes. Will go
> >     through
> >     them tomorrow.
> >
> >     Thanks
> >
> >
> >     >
> >     > Andrew (6):
> >     >    Added SetSteeringEBPF method for NetClientState.
> >     >    ebpf: Added basic eBPF API.
> >     >    ebpf: Added eBPF RSS program.
> >     >    ebpf: Added eBPF RSS loader.
> >     >    virtio-net: Added eBPF RSS to virtio-net.
> >     >    docs: Added eBPF documentation.
> >     >
> >     >   [diffstat snipped; identical to the cover letter]
> >
>
>
Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
Posted by Daniel P. Berrangé 3 years, 6 months ago
On Wed, Nov 04, 2020 at 01:49:05PM +0200, Yuri Benditovich wrote:
> On Wed, Nov 4, 2020 at 4:08 AM Jason Wang <jasowang@redhat.com> wrote:
> 
> >
> > On 2020/11/3 下午6:32, Yuri Benditovich wrote:
> > >
> > >
> > > On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > >
> > >     On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> > >     > Basic idea is to use eBPF to calculate and steer packets in TAP.
> > >     > RSS(Receive Side Scaling) is used to distribute network packets
> > >     to guest virtqueues
> > >     > by calculating packet hash.
> > >     > eBPF RSS allows us to use RSS with vhost TAP.
> > >     >
> > >     > This set of patches introduces the usage of eBPF for packet
> > steering
> > >     > and RSS hash calculation:
> > >     > * RSS(Receive Side Scaling) is used to distribute network packets
> > to
> > >     > guest virtqueues by calculating packet hash
> > >     > * eBPF RSS suppose to be faster than already existing 'software'
> > >     > implementation in QEMU
> > >     > * Additionally adding support for the usage of RSS with vhost
> > >     >
> > >     > Supported kernels: 5.8+
> > >     >
> > >     > Implementation notes:
> > >     > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF
> > program.
> > >     > Added eBPF support to qemu directly through a system call, see the
> > >     > bpf(2) for details.
> > >     > The eBPF program is part of the qemu and presented as an array
> > >     of bpf
> > >     > instructions.
> > >     > The program can be recompiled by provided Makefile.ebpf(need to
> > >     adjust
> > >     > 'linuxhdrs'),
> > >     > although it's not required to build QEMU with eBPF support.
> > >     > Added changes to virtio-net and vhost, primary eBPF RSS is used.
> > >     > 'Software' RSS used in the case of hash population and as a
> > >     fallback option.
> > >     > For vhost, the hash population feature is not reported to the
> > guest.
> > >     >
> > >     > Please also see the documentation in PATCH 6/6.
> > >     >
> > >     > I am sending those patches as RFC to initiate the discussions
> > >     and get
> > >     > feedback on the following points:
> > >     > * Fallback when eBPF is not supported by the kernel
> > >
> > >
> > >     Yes, and it could also a lacking of CAP_BPF.
> > >
> > >
> > >     > * Live migration to the kernel that doesn't have eBPF support
> > >
> > >
> > >     Is there anything that we needs special treatment here?
> > >
> > > Possible case: rss=on, vhost=on, source system with kernel 5.8
> > > (everything works) -> dest. system 5.6 (bpf does not work), the
> > > adapter functions, but all the steering does not use proper queues.
> >
> >
> > Right, I think we need to disable vhost on dest.
> >
> >
> Is this acceptable to disable vhost at time of migration?
> 
> 
> > >
> > >
> > >
> > >     > * Integration with current QEMU build
> > >
> > >
> > >     Yes, a question here:
> > >
> > >     1) Any reason for not using libbpf, e.g it has been shipped with some
> > >     distros
> > >
> > >
> > > We intentionally do not use libbpf, as it present only on some distros.
> > > We can switch to libbpf, but this will disable bpf if libbpf is not
> > > installed
> >
> >
> > That's better I think.
> >
> 
> We think the preferred way is to have the eBPF code built into QEMU (not
> distribute it as a separate file).
>
> Our initial idea was to not use libbpf because it:
> 1. Does not create an additional dependency at build time or at run time
> 2. Gives us a smaller footprint for the loadable eBPF blob inside qemu
> 3. Does not add too much code to QEMU
>
> We can switch to libbpf, but in this case:
> 1. Presence of the dynamic library is not guaranteed on the target system

Again, if a distro or user wants to use this feature in
QEMU, they should be expected to build the library.

> 2. The static library is large

QEMU doesn't support static linking for system emulators.  It may
happen to work at times, but there are no expectations in this respect.

> 3. libbpf consumes an eBPF ELF, which is significantly bigger than just the
> array of instructions (maybe we can reduce the ELF to some suitable size
> and still have it built-in)
> 
> Please let us know whether you still think libbpf is better and why.

It looks like both the Clang and GCC compilers for BPF are moving towards
a world where they use BTF to get compile-once, run-everywhere portability
for the compiled bytecode. IIUC, libbpf is what is responsible for
processing the BTF data when loading it into the running kernel. This
all looks like a good thing in general.

If we introduce BPF to QEMU without using libbpf, and then later decide
we absolutely need libbpf features, it creates an upgrade/backward-compat
issue for existing deployments. It is better to use libbpf right from
the start, so we're set up to take full advantage of what it offers
long term.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|