[v2] implement cgroups v2 devices support

[libvirt] [PATCH v2 03/17] vircgroup: introduce virCgroupV2DevicesAttachProg

Posted by Pavel Hrdina 7 years ago

This function loads the BPF prog with prepared map into kernel and
attaches it into guest cgroup.  It can be also used to replace existing
program in the cgroup if we need to resize BPF map to store more rules
for devices. The old program will be closed and removed from kernel.

There are two possible ways how to create BPF program:

    - One way is to write simple C-like code which can by compiled into
      BPF object file which can be loaded into kernel using elfutils.

    - The second way is to define macros which looks like assembler
      instructions and can be used directly to create BPF program that
      can be directly loaded into kernel.

Since the program is not too complex we can use the second option.

If there is no program, all devices are allowed, if there is some
program it is executed and based on the exit status the access is
denied for 0 and allowed for 1.

Our program will follow these rules:

    - first it will try to look for the specific key using major and
      minor to see if there is any rule for that specific device

    - if there is no specific rule it will try to look for any rule that
      matches only major of the device

    - if there is no match with major it will try the same but with
      minor of the device

    - as the last attempt it will try to look for rule for all devices
      and if there is no match it will return 0 to deny that access

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
---
 src/libvirt_private.syms      |   1 +
 src/util/vircgrouppriv.h      |  10 +++
 src/util/vircgroupv2devices.c | 140 ++++++++++++++++++++++++++++++++++
 src/util/vircgroupv2devices.h |   5 ++
 4 files changed, 156 insertions(+)

diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 6a822f7d90..3fc91ce207 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -1612,6 +1612,7 @@ virCgroupV1Register;
 virCgroupV2Register;
 
 # util/vircgroupv2devices.h
+virCgroupV2DevicesAttachProg;
 virCgroupV2DevicesAvailable;
 
 # util/virclosecallbacks.h
diff --git a/src/util/vircgrouppriv.h b/src/util/vircgrouppriv.h
index a6fb3bb9f8..085fea375c 100644
--- a/src/util/vircgrouppriv.h
+++ b/src/util/vircgrouppriv.h
@@ -42,10 +42,20 @@ struct _virCgroupV1Controller {
 typedef struct _virCgroupV1Controller virCgroupV1Controller;
 typedef virCgroupV1Controller *virCgroupV1ControllerPtr;
 
+struct _virCgroupV2Devices {
+    int mapfd;
+    int progfd;
+    ssize_t count;
+    ssize_t max;
+};
+typedef struct _virCgroupV2Devices virCgroupV2Devices;
+typedef virCgroupV2Devices *virCgroupV2DevicesPtr;
+
 struct _virCgroupV2Controller {
     int controllers;
     char *mountPoint;
     char *placement;
+    virCgroupV2Devices devices;
 };
 typedef struct _virCgroupV2Controller virCgroupV2Controller;
 typedef virCgroupV2Controller *virCgroupV2ControllerPtr;
diff --git a/src/util/vircgroupv2devices.c b/src/util/vircgroupv2devices.c
index 10080d4fff..0b721a0aad 100644
--- a/src/util/vircgroupv2devices.c
+++ b/src/util/vircgroupv2devices.c
@@ -64,10 +64,150 @@ virCgroupV2DevicesAvailable(virCgroupPtr group)
     VIR_FORCE_CLOSE(cgroupfd);
     return ret;
 }
+
+
+static int
+virCgroupV2DevicesLoadProg(int mapfd)
+{
+# define VIR_CGROUP_BPF_LOOKUP \
+    /* prepare key param on stack */ \
+    VIR_BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -8), \
+    VIR_BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), \
+    VIR_BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), \
+    /* lookup key (major << 32) | minor in map */ \
+    VIR_BPF_LD_MAP_FD(BPF_REG_1, mapfd), \
+    VIR_BPF_CALL_INSN(BPF_FUNC_map_lookup_elem)
+
+# define VIR_CGROUP_BPF_CHECK_PERM \
+    /* if no key skip perm check */ \
+    VIR_BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6), \
+    /* get perms from map */ \
+    VIR_BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_0, 0), \
+    /* get perms from ctx */ \
+    VIR_BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_6, 0), \
+    /* (map perms) & (ctx perms) */ \
+    VIR_BPF_ALU64_REG(BPF_AND, BPF_REG_1, BPF_REG_2), \
+    /* if (map perms) & (ctx perms) == (ctx perms) exit, otherwise continue */ \
+    VIR_BPF_JMP_REG(BPF_JNE, BPF_REG_1, BPF_REG_2, 2), \
+    /* set ret 1 and exit */ \
+    VIR_BPF_MOV64_IMM(BPF_REG_0, 1), \
+    VIR_BPF_EXIT_INSN()
+
+    struct bpf_insn prog[] = {
+        /* save ctx, argument passed to BPF program */
+        VIR_BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+
+        /* get major from ctx and shift << 32 */
+        VIR_BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_6, 4),
+        VIR_BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 32),
+        /* get minor from ctx and | to shifted major */
+        VIR_BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_6, 8),
+        VIR_BPF_ALU64_REG(BPF_OR, BPF_REG_2, BPF_REG_3),
+        /* lookup ((major << 32) | minor) in map and check perms */
+        VIR_CGROUP_BPF_LOOKUP,
+        VIR_CGROUP_BPF_CHECK_PERM,
+
+        /* get major from ctx and shift << 32 */
+        VIR_BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_6, 4),
+        VIR_BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 32),
+        /* use -1 as minor and | to shifted major */
+        VIR_BPF_MOV32_IMM(BPF_REG_3, -1),
+        VIR_BPF_ALU64_REG(BPF_OR, BPF_REG_2, BPF_REG_3),
+        /* lookup ((major << 32) | -1) in map and check perms */
+        VIR_CGROUP_BPF_LOOKUP,
+        VIR_CGROUP_BPF_CHECK_PERM,
+
+        /* use -1 as major and shift << 32 */
+        VIR_BPF_MOV32_IMM(BPF_REG_2, -1),
+        VIR_BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 32),
+        /* get minor from ctx and | to shifted major */
+        VIR_BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_6, 8),
+        VIR_BPF_ALU64_REG(BPF_OR, BPF_REG_2, BPF_REG_3),
+        /* lookup ((-1 << 32) | minor) in map and check perms */
+        VIR_CGROUP_BPF_LOOKUP,
+        VIR_CGROUP_BPF_CHECK_PERM,
+
+        /* use -1 as key which means major = -1 and minor = -1 */
+        VIR_BPF_MOV64_IMM(BPF_REG_2, -1),
+        /* lookup -1 in map and check perms*/
+        VIR_CGROUP_BPF_LOOKUP,
+        VIR_CGROUP_BPF_CHECK_PERM,
+
+        /* no key was found, exit with 0 */
+        VIR_BPF_MOV64_IMM(BPF_REG_0, 0),
+        VIR_BPF_EXIT_INSN(),
+    };
+
+    return virBPFLoadProg(prog, BPF_PROG_TYPE_CGROUP_DEVICE, ARRAY_CARDINALITY(prog));
+}
+
+
+int
+virCgroupV2DevicesAttachProg(virCgroupPtr group,
+                             int mapfd,
+                             size_t max)
+{
+    int ret = -1;
+    int progfd = -1;
+    int cgroupfd = -1;
+    VIR_AUTOFREE(char *) path = NULL;
+
+    if (virCgroupPathOfController(group, VIR_CGROUP_CONTROLLER_DEVICES,
+                                  NULL, &path) < 0) {
+        goto cleanup;
+    }
+
+    progfd = virCgroupV2DevicesLoadProg(mapfd);
+    if (progfd < 0) {
+        virReportSystemError(errno, "%s", _("failed to load cgroup BPF prog"));
+        goto cleanup;
+    }
+
+    cgroupfd = open(path, O_RDONLY);
+    if (cgroupfd < 0) {
+        virReportSystemError(errno, _("unable to open '%s'"), path);
+        goto cleanup;
+    }
+
+    if (virBPFAttachProg(progfd, cgroupfd, BPF_CGROUP_DEVICE) < 0) {
+        virReportSystemError(errno, "%s", _("failed to attach cgroup BPF prog"));
+        goto cleanup;
+    }
+
+    if (group->unified.devices.progfd > 0) {
+        VIR_DEBUG("Closing existing program that was replaced by new one.");
+        VIR_FORCE_CLOSE(group->unified.devices.progfd);
+    }
+
+    group->unified.devices.progfd = progfd;
+    group->unified.devices.mapfd = mapfd;
+    group->unified.devices.max = max;
+    progfd = -1;
+    mapfd = -1;
+
+    ret = 0;
+ cleanup:
+    VIR_FORCE_CLOSE(cgroupfd);
+    VIR_FORCE_CLOSE(progfd);
+    VIR_FORCE_CLOSE(mapfd);
+    return ret;
+}
 #else /* !HAVE_DECL_BPF_CGROUP_DEVICE */
 bool
 virCgroupV2DevicesAvailable(virCgroupPtr group ATTRIBUTE_UNUSED)
 {
     return false;
 }
+
+
+int
+virCgroupV2DevicesAttachProg(virCgroupPtr group ATTRIBUTE_UNUSED,
+                             int mapfd ATTRIBUTE_UNUSED,
+                             size_t max ATTRIBUTE_UNUSED)
+{
+    virReportSystemError(ENOSYS, "%s",
+                         _("cgroups v2 BPF devices not supported "
+                           "with this kernel"));
+    return -1;
+}
 #endif /* !HAVE_DECL_BPF_CGROUP_DEVICE */
diff --git a/src/util/vircgroupv2devices.h b/src/util/vircgroupv2devices.h
index 2ab35681db..1ba87acb00 100644
--- a/src/util/vircgroupv2devices.h
+++ b/src/util/vircgroupv2devices.h
@@ -24,4 +24,9 @@
 bool
 virCgroupV2DevicesAvailable(virCgroupPtr group);
 
+int
+virCgroupV2DevicesAttachProg(virCgroupPtr group,
+                             int mapfd,
+                             size_t max);
+
 #endif /* LIBVIRT_VIRCGROUPV2DEVICES_H */
-- 
2.20.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v2 03/17] vircgroup: introduce virCgroupV2DevicesAttachProg

Posted by Daniel P. Berrangé 7 years ago

On Mon, Jan 14, 2019 at 04:47:37PM +0100, Pavel Hrdina wrote:
> This function loads the BPF prog with prepared map into kernel and
> attaches it into guest cgroup.  It can be also used to replace existing
> program in the cgroup if we need to resize BPF map to store more rules
> for devices. The old program will be closed and removed from kernel.
> 
> There are two possible ways how to create BPF program:
> 
>     - One way is to write simple C-like code which can by compiled into
>       BPF object file which can be loaded into kernel using elfutils.
> 
>     - The second way is to define macros which looks like assembler
>       instructions and can be used directly to create BPF program that
>       can be directly loaded into kernel.
> 
> Since the program is not too complex we can use the second option.

I'm really not liking this to be honest. Even for this "simple" program,
I struggle to understand what the code below is doing. It is basically
assembly language, which is inevitably hard to follow even for simple
things.

I'd like to see us use the BPF C compiler to build the loaded
program unless there's a compelling reason why it won't work

> +
> +
> +static int
> +virCgroupV2DevicesLoadProg(int mapfd)
> +{
> +# define VIR_CGROUP_BPF_LOOKUP \
> +    /* prepare key param on stack */ \
> +    VIR_BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -8), \
> +    VIR_BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), \
> +    VIR_BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), \
> +    /* lookup key (major << 32) | minor in map */ \
> +    VIR_BPF_LD_MAP_FD(BPF_REG_1, mapfd), \
> +    VIR_BPF_CALL_INSN(BPF_FUNC_map_lookup_elem)
> +
> +# define VIR_CGROUP_BPF_CHECK_PERM \
> +    /* if no key skip perm check */ \
> +    VIR_BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6), \
> +    /* get perms from map */ \
> +    VIR_BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_0, 0), \
> +    /* get perms from ctx */ \
> +    VIR_BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_6, 0), \
> +    /* (map perms) & (ctx perms) */ \
> +    VIR_BPF_ALU64_REG(BPF_AND, BPF_REG_1, BPF_REG_2), \
> +    /* if (map perms) & (ctx perms) == (ctx perms) exit, otherwise continue */ \
> +    VIR_BPF_JMP_REG(BPF_JNE, BPF_REG_1, BPF_REG_2, 2), \
> +    /* set ret 1 and exit */ \
> +    VIR_BPF_MOV64_IMM(BPF_REG_0, 1), \
> +    VIR_BPF_EXIT_INSN()
> +
> +    struct bpf_insn prog[] = {
> +        /* save ctx, argument passed to BPF program */
> +        VIR_BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
> +
> +        /* get major from ctx and shift << 32 */
> +        VIR_BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_6, 4),
> +        VIR_BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 32),
> +        /* get minor from ctx and | to shifted major */
> +        VIR_BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_6, 8),
> +        VIR_BPF_ALU64_REG(BPF_OR, BPF_REG_2, BPF_REG_3),
> +        /* lookup ((major << 32) | minor) in map and check perms */
> +        VIR_CGROUP_BPF_LOOKUP,
> +        VIR_CGROUP_BPF_CHECK_PERM,
> +
> +        /* get major from ctx and shift << 32 */
> +        VIR_BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_6, 4),
> +        VIR_BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 32),
> +        /* use -1 as minor and | to shifted major */
> +        VIR_BPF_MOV32_IMM(BPF_REG_3, -1),
> +        VIR_BPF_ALU64_REG(BPF_OR, BPF_REG_2, BPF_REG_3),
> +        /* lookup ((major << 32) | -1) in map and check perms */
> +        VIR_CGROUP_BPF_LOOKUP,
> +        VIR_CGROUP_BPF_CHECK_PERM,
> +
> +        /* use -1 as major and shift << 32 */
> +        VIR_BPF_MOV32_IMM(BPF_REG_2, -1),
> +        VIR_BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 32),
> +        /* get minor from ctx and | to shifted major */
> +        VIR_BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_6, 8),
> +        VIR_BPF_ALU64_REG(BPF_OR, BPF_REG_2, BPF_REG_3),
> +        /* lookup ((-1 << 32) | minor) in map and check perms */
> +        VIR_CGROUP_BPF_LOOKUP,
> +        VIR_CGROUP_BPF_CHECK_PERM,
> +
> +        /* use -1 as key which means major = -1 and minor = -1 */
> +        VIR_BPF_MOV64_IMM(BPF_REG_2, -1),
> +        /* lookup -1 in map and check perms*/
> +        VIR_CGROUP_BPF_LOOKUP,
> +        VIR_CGROUP_BPF_CHECK_PERM,
> +
> +        /* no key was found, exit with 0 */
> +        VIR_BPF_MOV64_IMM(BPF_REG_0, 0),
> +        VIR_BPF_EXIT_INSN(),
> +    };
> +
> +    return virBPFLoadProg(prog, BPF_PROG_TYPE_CGROUP_DEVICE, ARRAY_CARDINALITY(prog));


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v2 03/17] vircgroup: introduce virCgroupV2DevicesAttachProg

Posted by Pavel Hrdina 7 years ago

On Mon, Jan 14, 2019 at 04:43:34PM +0000, Daniel P. Berrangé wrote:
> On Mon, Jan 14, 2019 at 04:47:37PM +0100, Pavel Hrdina wrote:
> > This function loads the BPF prog with prepared map into kernel and
> > attaches it into guest cgroup.  It can be also used to replace existing
> > program in the cgroup if we need to resize BPF map to store more rules
> > for devices. The old program will be closed and removed from kernel.
> > 
> > There are two possible ways how to create BPF program:
> > 
> >     - One way is to write simple C-like code which can by compiled into
> >       BPF object file which can be loaded into kernel using elfutils.
> > 
> >     - The second way is to define macros which looks like assembler
> >       instructions and can be used directly to create BPF program that
> >       can be directly loaded into kernel.
> > 
> > Since the program is not too complex we can use the second option.
> 
> I'm really not liking this to be honest. Even for this "simple" program,
> I struggle to understand what the code below is doing. It is basically
> assembly language, which is inevitably hard to follow even for simple
> things.
> 
> I'd like to see us use the BPF C compiler to build the loaded
> program unless there's a compelling reason why it won't work

I personally don't like it as well, but the issue is that for the
compilation we would have to use clang/llvm, AFAIK there is no support
to compile BPF in GCC.  Another issue is the loading part, you need to
read the object file, extract correct sections which is not trivial and
in order to have different size of maps we would have to modify the
loaded data from the compiled object file.

I can try to prepare alternative patches for that but it might me even
more complex to understand than short assembly program.

Pavel

> > +
> > +
> > +static int
> > +virCgroupV2DevicesLoadProg(int mapfd)
> > +{
> > +# define VIR_CGROUP_BPF_LOOKUP \
> > +    /* prepare key param on stack */ \
> > +    VIR_BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -8), \
> > +    VIR_BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), \
> > +    VIR_BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), \
> > +    /* lookup key (major << 32) | minor in map */ \
> > +    VIR_BPF_LD_MAP_FD(BPF_REG_1, mapfd), \
> > +    VIR_BPF_CALL_INSN(BPF_FUNC_map_lookup_elem)
> > +
> > +# define VIR_CGROUP_BPF_CHECK_PERM \
> > +    /* if no key skip perm check */ \
> > +    VIR_BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6), \
> > +    /* get perms from map */ \
> > +    VIR_BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_0, 0), \
> > +    /* get perms from ctx */ \
> > +    VIR_BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_6, 0), \
> > +    /* (map perms) & (ctx perms) */ \
> > +    VIR_BPF_ALU64_REG(BPF_AND, BPF_REG_1, BPF_REG_2), \
> > +    /* if (map perms) & (ctx perms) == (ctx perms) exit, otherwise continue */ \
> > +    VIR_BPF_JMP_REG(BPF_JNE, BPF_REG_1, BPF_REG_2, 2), \
> > +    /* set ret 1 and exit */ \
> > +    VIR_BPF_MOV64_IMM(BPF_REG_0, 1), \
> > +    VIR_BPF_EXIT_INSN()
> > +
> > +    struct bpf_insn prog[] = {
> > +        /* save ctx, argument passed to BPF program */
> > +        VIR_BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
> > +
> > +        /* get major from ctx and shift << 32 */
> > +        VIR_BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_6, 4),
> > +        VIR_BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 32),
> > +        /* get minor from ctx and | to shifted major */
> > +        VIR_BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_6, 8),
> > +        VIR_BPF_ALU64_REG(BPF_OR, BPF_REG_2, BPF_REG_3),
> > +        /* lookup ((major << 32) | minor) in map and check perms */
> > +        VIR_CGROUP_BPF_LOOKUP,
> > +        VIR_CGROUP_BPF_CHECK_PERM,
> > +
> > +        /* get major from ctx and shift << 32 */
> > +        VIR_BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_6, 4),
> > +        VIR_BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 32),
> > +        /* use -1 as minor and | to shifted major */
> > +        VIR_BPF_MOV32_IMM(BPF_REG_3, -1),
> > +        VIR_BPF_ALU64_REG(BPF_OR, BPF_REG_2, BPF_REG_3),
> > +        /* lookup ((major << 32) | -1) in map and check perms */
> > +        VIR_CGROUP_BPF_LOOKUP,
> > +        VIR_CGROUP_BPF_CHECK_PERM,
> > +
> > +        /* use -1 as major and shift << 32 */
> > +        VIR_BPF_MOV32_IMM(BPF_REG_2, -1),
> > +        VIR_BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 32),
> > +        /* get minor from ctx and | to shifted major */
> > +        VIR_BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_6, 8),
> > +        VIR_BPF_ALU64_REG(BPF_OR, BPF_REG_2, BPF_REG_3),
> > +        /* lookup ((-1 << 32) | minor) in map and check perms */
> > +        VIR_CGROUP_BPF_LOOKUP,
> > +        VIR_CGROUP_BPF_CHECK_PERM,
> > +
> > +        /* use -1 as key which means major = -1 and minor = -1 */
> > +        VIR_BPF_MOV64_IMM(BPF_REG_2, -1),
> > +        /* lookup -1 in map and check perms*/
> > +        VIR_CGROUP_BPF_LOOKUP,
> > +        VIR_CGROUP_BPF_CHECK_PERM,
> > +
> > +        /* no key was found, exit with 0 */
> > +        VIR_BPF_MOV64_IMM(BPF_REG_0, 0),
> > +        VIR_BPF_EXIT_INSN(),
> > +    };
> > +
> > +    return virBPFLoadProg(prog, BPF_PROG_TYPE_CGROUP_DEVICE, ARRAY_CARDINALITY(prog));
> 
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v2 03/17] vircgroup: introduce virCgroupV2DevicesAttachProg

Posted by Daniel P. Berrangé 7 years ago

On Tue, Jan 15, 2019 at 09:58:47AM +0100, Pavel Hrdina wrote:
> On Mon, Jan 14, 2019 at 04:43:34PM +0000, Daniel P. Berrangé wrote:
> > On Mon, Jan 14, 2019 at 04:47:37PM +0100, Pavel Hrdina wrote:
> > > This function loads the BPF prog with prepared map into kernel and
> > > attaches it into guest cgroup.  It can be also used to replace existing
> > > program in the cgroup if we need to resize BPF map to store more rules
> > > for devices. The old program will be closed and removed from kernel.
> > > 
> > > There are two possible ways how to create BPF program:
> > > 
> > >     - One way is to write simple C-like code which can by compiled into
> > >       BPF object file which can be loaded into kernel using elfutils.
> > > 
> > >     - The second way is to define macros which looks like assembler
> > >       instructions and can be used directly to create BPF program that
> > >       can be directly loaded into kernel.
> > > 
> > > Since the program is not too complex we can use the second option.
> > 
> > I'm really not liking this to be honest. Even for this "simple" program,
> > I struggle to understand what the code below is doing. It is basically
> > assembly language, which is inevitably hard to follow even for simple
> > things.
> > 
> > I'd like to see us use the BPF C compiler to build the loaded
> > program unless there's a compelling reason why it won't work
> 
> I personally don't like it as well, but the issue is that for the
> compilation we would have to use clang/llvm, AFAIK there is no support
> to compile BPF in GCC.  Another issue is the loading part, you need to
> read the object file, extract correct sections which is not trivial and
> in order to have different size of maps we would have to modify the
> loaded data from the compiled object file.

I don't see a problem with using clang for BPF.  Reading the DPDK lists,
I see you can do something like this:

  $ clang -I iproute2/include/ -O2 -emit-llvm -c tap_bpf_program.c -o - \
        | llc -march=bpf -filetype=obj -o tap_bpf_program.o
  $ objdump -j l3_l4 -s tap_bpf_program.o > tap_bpf_program.bin

where the tap_bpf_program.bin should be the raw instructions to be
loaded.

Supposedly they also have a script that turns the .bin file into a
C byte array which they then compile into their binary, so they
can just load the uint8[] directly, instead of from a separate file

The matter of dynamically sized device maps is more tricky. Assuming
the device map size is basically a list of (major,minor) numbers,
that's 2 bytes per entry. We could easily do a fixed size 50 elements
giving 100 bytes per guest if that makes it easier. I guess the hard
thing is actually getting the dynamic data into the code.

> I can try to prepare alternative patches for that but it might me even
> more complex to understand than short assembly program.

Maybe there's a compromise/hybrid that would work simpler ? eg is
it possible to structure things so all the checking code is
statically compiled, and then we dynamically generate the instructions
to define the list of devices and append that to the main static
code. ?

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v2 03/17] vircgroup: introduce virCgroupV2DevicesAttachProg

Posted by Pavel Hrdina 7 years ago

On Tue, Jan 15, 2019 at 10:14:33AM +0000, Daniel P. Berrangé wrote:
> On Tue, Jan 15, 2019 at 09:58:47AM +0100, Pavel Hrdina wrote:
> > On Mon, Jan 14, 2019 at 04:43:34PM +0000, Daniel P. Berrangé wrote:
> > > On Mon, Jan 14, 2019 at 04:47:37PM +0100, Pavel Hrdina wrote:
> > > > This function loads the BPF prog with prepared map into kernel and
> > > > attaches it into guest cgroup.  It can be also used to replace existing
> > > > program in the cgroup if we need to resize BPF map to store more rules
> > > > for devices. The old program will be closed and removed from kernel.
> > > > 
> > > > There are two possible ways how to create BPF program:
> > > > 
> > > >     - One way is to write simple C-like code which can by compiled into
> > > >       BPF object file which can be loaded into kernel using elfutils.
> > > > 
> > > >     - The second way is to define macros which looks like assembler
> > > >       instructions and can be used directly to create BPF program that
> > > >       can be directly loaded into kernel.
> > > > 
> > > > Since the program is not too complex we can use the second option.
> > > 
> > > I'm really not liking this to be honest. Even for this "simple" program,
> > > I struggle to understand what the code below is doing. It is basically
> > > assembly language, which is inevitably hard to follow even for simple
> > > things.
> > > 
> > > I'd like to see us use the BPF C compiler to build the loaded
> > > program unless there's a compelling reason why it won't work
> > 
> > I personally don't like it as well, but the issue is that for the
> > compilation we would have to use clang/llvm, AFAIK there is no support
> > to compile BPF in GCC.  Another issue is the loading part, you need to
> > read the object file, extract correct sections which is not trivial and
> > in order to have different size of maps we would have to modify the
> > loaded data from the compiled object file.
> 
> I don't see a problem with using clang for BPF.  Reading the DPDK lists,
> I see you can do something like this:
> 
>   $ clang -I iproute2/include/ -O2 -emit-llvm -c tap_bpf_program.c -o - \
>         | llc -march=bpf -filetype=obj -o tap_bpf_program.o
>   $ objdump -j l3_l4 -s tap_bpf_program.o > tap_bpf_program.bin
> 
> where the tap_bpf_program.bin should be the raw instructions to be
> loaded.

There is no technical problem with using clang for BPF, the only issue
that I have is that we will require clang to compile libvirt from
source code.

> Supposedly they also have a script that turns the .bin file into a
> C byte array which they then compile into their binary, so they
> can just load the uint8[] directly, instead of from a separate file

This is not an issue, there is a sample code in kernel that uses
elfutils to load the bpf.obj file into C byte array, just different
method how the get the list of instructions.

> The matter of dynamically sized device maps is more tricky. Assuming
> the device map size is basically a list of (major,minor) numbers,
> that's 2 bytes per entry. We could easily do a fixed size 50 elements
> giving 100 bytes per guest if that makes it easier. I guess the hard
> thing is actually getting the dynamic data into the code.

50 elements per guest will not be good enough.  Yes, in most cases it's
perfectly OK, but there are some users having hundreds of disks inside
a guest and we would break their usage and since we don't have any
written limitation of how many disk/devices can be inside a guest we
have to be able to store unlimited number of devices into the map until
we run out of memory.

> > I can try to prepare alternative patches for that but it might me even
> > more complex to understand than short assembly program.
> 
> Maybe there's a compromise/hybrid that would work simpler ? eg is
> it possible to structure things so all the checking code is
> statically compiled, and then we dynamically generate the instructions
> to define the list of devices and append that to the main static
> code. ?

This is not an issue, the list of devices is stored into the BPF map
from user-space (libvirt) and the BPF map is used by the BPF program
in the kernel.  The limits are that you cannot reallocate the BPF map
(this is solved by creating new program with bigger map) and you cannot
change the BPF map FD while the program is loaded into kernel.

Pavel
 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v2 03/17] vircgroup: introduce virCgroupV2DevicesAttachProg

Posted by Pavel Hrdina 7 years ago

On Tue, Jan 15, 2019 at 01:07:55PM +0100, Pavel Hrdina wrote:
> On Tue, Jan 15, 2019 at 10:14:33AM +0000, Daniel P. Berrangé wrote:
> > On Tue, Jan 15, 2019 at 09:58:47AM +0100, Pavel Hrdina wrote:
> > > On Mon, Jan 14, 2019 at 04:43:34PM +0000, Daniel P. Berrangé wrote:
> > > > On Mon, Jan 14, 2019 at 04:47:37PM +0100, Pavel Hrdina wrote:
> > > > > This function loads the BPF prog with prepared map into kernel and
> > > > > attaches it into guest cgroup.  It can be also used to replace existing
> > > > > program in the cgroup if we need to resize BPF map to store more rules
> > > > > for devices. The old program will be closed and removed from kernel.
> > > > > 
> > > > > There are two possible ways how to create BPF program:
> > > > > 
> > > > >     - One way is to write simple C-like code which can by compiled into
> > > > >       BPF object file which can be loaded into kernel using elfutils.
> > > > > 
> > > > >     - The second way is to define macros which looks like assembler
> > > > >       instructions and can be used directly to create BPF program that
> > > > >       can be directly loaded into kernel.
> > > > > 
> > > > > Since the program is not too complex we can use the second option.
> > > > 
> > > > I'm really not liking this to be honest. Even for this "simple" program,
> > > > I struggle to understand what the code below is doing. It is basically
> > > > assembly language, which is inevitably hard to follow even for simple
> > > > things.
> > > > 
> > > > I'd like to see us use the BPF C compiler to build the loaded
> > > > program unless there's a compelling reason why it won't work
> > > 
> > > I personally don't like it as well, but the issue is that for the
> > > compilation we would have to use clang/llvm, AFAIK there is no support
> > > to compile BPF in GCC.  Another issue is the loading part, you need to
> > > read the object file, extract correct sections which is not trivial and
> > > in order to have different size of maps we would have to modify the
> > > loaded data from the compiled object file.
> > 
> > I don't see a problem with using clang for BPF.  Reading the DPDK lists,
> > I see you can do something like this:
> > 
> >   $ clang -I iproute2/include/ -O2 -emit-llvm -c tap_bpf_program.c -o - \
> >         | llc -march=bpf -filetype=obj -o tap_bpf_program.o
> >   $ objdump -j l3_l4 -s tap_bpf_program.o > tap_bpf_program.bin
> > 
> > where the tap_bpf_program.bin should be the raw instructions to be
> > loaded.
> 
> There is no technical problem with using clang for BPF, the only issue
> that I have is that we will require clang to compile libvirt from
> source code.
> 
> > Supposedly they also have a script that turns the .bin file into a
> > C byte array which they then compile into their binary, so they
> > can just load the uint8[] directly, instead of from a separate file
> 
> This is not an issue, there is a sample code in kernel that uses
> elfutils to load the bpf.obj file into C byte array, just different
> method how the get the list of instructions.
> 
> > The matter of dynamically sized device maps is more tricky. Assuming
> > the device map size is basically a list of (major,minor) numbers,
> > that's 2 bytes per entry. We could easily do a fixed size 50 elements
> > giving 100 bytes per guest if that makes it easier. I guess the hard
> > thing is actually getting the dynamic data into the code.
> 
> 50 elements per guest will not be good enough.  Yes, in most cases it's
> perfectly OK, but there are some users having hundreds of disks inside
> a guest and we would break their usage and since we don't have any
> written limitation of how many disk/devices can be inside a guest we
> have to be able to store unlimited number of devices into the map until
> we run out of memory.
> 
> > > I can try to prepare alternative patches for that but it might me even
> > > more complex to understand than short assembly program.
> > 
> > Maybe there's a compromise/hybrid that would work simpler ? eg is
> > it possible to structure things so all the checking code is
> > statically compiled, and then we dynamically generate the instructions
> > to define the list of devices and append that to the main static
> > code. ?

This compromise would not work.  The dynamic data are stored in the BPF
map and the program uses only the map FD which is hardcoded into the
program itself.  The map FD is placed into correct register before we
call bpf_map_lookup function.  The only dynamic part of the program is
the map FD which is used on 4 places in the program.

I was also looking into the possibility of using elfutils as kernel is
using in one of the samples and that feels more complex than this simple
assembler like program.  We would have to open the binary file generated
by clang, load map data and instruction data, update the map size in the
loaded map data and do syscall in order to create the map, get the map
FD returned by syscall, inject that FD into the loaded instructions and
do syscall to create the program.  This IMHO sounds more complex and
than the "simple" assembler like program.

As a compromise we could have the C code as a comment in our source
files right before the actual assembler like version with instructions
how to use clang and llvm-objdump to get the assembler instructions.
That's what I've done when I was implementing the program.  The only
optimization was to add two macros and change some jump logic but we
can remove that optimization and make it 1:1 to what we get from
llvm-objdump.

Pavel

> This is not an issue, the list of devices is stored into the BPF map
> from user-space (libvirt) and the BPF map is used by the BPF program
> in the kernel.  The limits are that you cannot reallocate the BPF map
> (this is solved by creating new program with bigger map) and you cannot
> change the BPF map FD while the program is loaded into kernel.
> 
> Pavel
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list