From: Hui Zhu
To: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Muchun Song, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Shuah Khan, Peter Zijlstra, Miguel Ojeda,
	Nathan Chancellor, Kees Cook, Tejun Heo, Jeff Xu, mkoutny@suse.com,
	Jan Hendrik Farr, Christian Brauner, Randy Dunlap, Brian Gerst,
	Masahiro Yamada, davem@davemloft.net, Jakub Kicinski,
	Jesper Dangaard Brouer, JP Kobryn, Willem de Bruijn, Jason Xing,
	Paul Chaignon, Anton Protopopov, Amery Hung, Chen Ridong,
	Lance Yang, Jiayuan Chen, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, cgroups@vger.kernel.org, bpf@vger.kernel.org,
	netdev@vger.kernel.org, linux-kselftest@vger.kernel.org
Cc: Hui Zhu, Geliang Tang
Subject: [RFC PATCH bpf-next v6 12/12] samples/bpf: Add memcg priority control example
Date: Wed, 4 Feb 2026 17:00:08 +0800
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Hui Zhu

Add a sample program to demonstrate a practical use case for the
`memcg_bpf_ops` feature: priority-based memory throttling.

The sample consists of a BPF program and a userspace loader:

1. memcg.bpf.c: A BPF program that monitors PGFAULT events on a
   high-priority cgroup. When the PGFAULT count within a one-second
   window exceeds a threshold, it shifts memory pressure onto a
   low-priority cgroup for the following second. It does so through
   the `get_high_delay_ms` hook, attached to the low-priority cgroup,
   and optionally through the `below_low` or `below_min` hooks,
   attached to the high-priority cgroup.

2. memcg.c: A userspace loader that configures and attaches the BPF
   program. It takes command-line arguments for the high and low
   priority cgroup paths, a pressure threshold, and the desired
   throttling delay (`over_high_ms`).

This provides a working example of how to implement a dynamic,
priority-aware memory management policy. A user can create two
cgroups, run workloads of different priorities, and observe the
low-priority workload being throttled to protect the high-priority
one.

Example usage:
 # ./memcg --low_path /sys/fs/cgroup/low \
 #	--high_path /sys/fs/cgroup/high \
 #	--threshold 100 --over_high_ms 1024

Signed-off-by: Geliang Tang
Signed-off-by: Hui Zhu
---
 MAINTAINERS             |   2 +
 samples/bpf/.gitignore  |   1 +
 samples/bpf/Makefile    |   8 +-
 samples/bpf/memcg.bpf.c | 130 +++++++++++++++
 samples/bpf/memcg.c     | 343 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 483 insertions(+), 1 deletion(-)
 create mode 100644 samples/bpf/memcg.bpf.c
 create mode 100644 samples/bpf/memcg.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7e07bb330eae..819ef271e011 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6470,6 +6470,8 @@ F:	mm/memcontrol-v1.c
 F:	mm/memcontrol-v1.h
 F:	mm/page_counter.c
 F:	mm/swap_cgroup.c
+F:	samples/bpf/memcg.bpf.c
+F:	samples/bpf/memcg.c
 F:	samples/cgroup/*
 F:	tools/testing/selftests/bpf/prog_tests/memcg_ops.c
 F:	tools/testing/selftests/bpf/progs/memcg_ops.c
diff --git a/samples/bpf/.gitignore b/samples/bpf/.gitignore
index 0002cd359fb1..0de6569cdefd 100644
--- a/samples/bpf/.gitignore
+++ b/samples/bpf/.gitignore
@@ -49,3 +49,4 @@ iperf.*
 /vmlinux.h
 /bpftool/
 /libbpf/
+memcg
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 95a4fa1f1e44..b00698bdc53b 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -37,6 +37,7 @@ tprogs-y += xdp_fwd
 tprogs-y += task_fd_query
 tprogs-y += ibumad
 tprogs-y += hbm
+tprogs-y += memcg
 
 # Libbpf dependencies
 LIBBPF_SRC = $(TOOLS_PATH)/lib/bpf
@@ -122,6 +123,7 @@ always-y += task_fd_query_kern.o
 always-y += ibumad_kern.o
 always-y += hbm_out_kern.o
 always-y += hbm_edt_kern.o
+always-y += memcg.bpf.o
 
 COMMON_CFLAGS = $(TPROGS_USER_CFLAGS)
 TPROGS_LDFLAGS = $(TPROGS_USER_LDFLAGS)
@@ -289,6 +291,8 @@ $(obj)/hbm_out_kern.o: $(src)/hbm.h $(src)/hbm_kern.h
 $(obj)/hbm.o: $(src)/hbm.h
 $(obj)/hbm_edt_kern.o: $(src)/hbm.h $(src)/hbm_kern.h
 
+memcg: $(obj)/memcg.skel.h
+
 # Override includes for xdp_sample_user.o because $(srctree)/usr/include in
 # TPROGS_CFLAGS causes conflicts
 XDP_SAMPLE_CFLAGS += -Wall -O2 \
@@ -347,11 +351,13 @@ $(obj)/%.bpf.o: $(src)/%.bpf.c $(obj)/vmlinux.h $(src)/xdp_sample.bpf.h $(src)/x
 	-I$(LIBBPF_INCLUDE) $(CLANG_SYS_INCLUDES) \
 	-c $(filter %.bpf.c,$^) -o $@
 
-LINKED_SKELS := xdp_router_ipv4.skel.h
+LINKED_SKELS := xdp_router_ipv4.skel.h memcg.skel.h
 clean-files += $(LINKED_SKELS)
 
 xdp_router_ipv4.skel.h-deps := xdp_router_ipv4.bpf.o xdp_sample.bpf.o
 
+memcg.skel.h-deps := memcg.bpf.o
+
 LINKED_BPF_SRCS := $(patsubst %.bpf.o,%.bpf.c,$(foreach skel,$(LINKED_SKELS),$($(skel)-deps)))
 
 BPF_SRCS_LINKED := $(notdir $(wildcard $(src)/*.bpf.c))
diff --git a/samples/bpf/memcg.bpf.c b/samples/bpf/memcg.bpf.c
new file mode 100644
index 000000000000..97c5897933c7
--- /dev/null
+++ b/samples/bpf/memcg.bpf.c
@@ -0,0 +1,130 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include
+#include
+
+#define ONE_SECOND_NS 1000000000
+
+struct local_config {
+	u64 threshold;
+	u64 high_cgroup_id;
+	bool use_below_low;
+	bool use_below_min;
+	unsigned int over_high_ms;
+} local_config;
+
+struct AggregationData {
+	u64 sum;
+	u64 window_start_ts;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, u32);
+	__type(value, struct AggregationData);
+} aggregation_map SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, u32);
+	__type(value, u64);
+} trigger_ts_map SEC(".maps");
+
+SEC("tp/memcg/count_memcg_events")
+int
+handle_count_memcg_events(struct trace_event_raw_memcg_rstat_events *ctx)
+{
+	u32 key = 0;
+	struct AggregationData *data;
+	u64 current_ts;
+
+	if (ctx->id != local_config.high_cgroup_id ||
+	    (ctx->item != PGFAULT))
+		goto out;
+
+	data = bpf_map_lookup_elem(&aggregation_map, &key);
+	if (!data)
+		goto out;
+
+	current_ts = bpf_ktime_get_ns();
+
+	if (current_ts - data->window_start_ts < ONE_SECOND_NS) {
+		data->sum += ctx->val;
+	} else {
+		data->window_start_ts = current_ts;
+		data->sum = ctx->val;
+	}
+
+	if (data->sum > local_config.threshold) {
+		bpf_map_update_elem(&trigger_ts_map, &key, &current_ts,
+				    BPF_ANY);
+		data->sum = 0;
+		data->window_start_ts = current_ts;
+	}
+
+out:
+	return 0;
+}
+
+static bool need_threshold(void)
+{
+	u32 key = 0;
+	u64 *trigger_ts;
+	bool ret = false;
+	u64 current_ts;
+
+	trigger_ts = bpf_map_lookup_elem(&trigger_ts_map, &key);
+	if (!trigger_ts || *trigger_ts == 0)
+		goto out;
+
+	current_ts = bpf_ktime_get_ns();
+
+	if (current_ts - *trigger_ts < ONE_SECOND_NS)
+		ret = true;
+
+out:
+	return ret;
+}
+
+SEC("struct_ops/below_low")
+bool below_low_impl(struct mem_cgroup *memcg)
+{
+	if (!local_config.use_below_low)
+		return false;
+
+	return need_threshold();
+}
+
+SEC("struct_ops/below_min")
+bool below_min_impl(struct mem_cgroup *memcg)
+{
+	if (!local_config.use_below_min)
+		return false;
+
+	return need_threshold();
+}
+
+SEC("struct_ops/get_high_delay_ms")
+unsigned int get_high_delay_ms_impl(struct mem_cgroup *memcg)
+{
+	if (local_config.over_high_ms && need_threshold())
+		return local_config.over_high_ms;
+
+	return 0;
+}
+
+SEC(".struct_ops.link")
+struct memcg_bpf_ops high_mcg_ops = {
+	.below_low = (void *)below_low_impl,
+	.below_min = (void *)below_min_impl,
+};
+
+SEC(".struct_ops.link")
+struct memcg_bpf_ops low_mcg_ops = {
+	.get_high_delay_ms = (void *)get_high_delay_ms_impl,
+};
+
+char LICENSE[] SEC("license") = "GPL";
diff --git a/samples/bpf/memcg.c b/samples/bpf/memcg.c
new file mode 100644
index 000000000000..0ed174608a15
--- /dev/null
+++ b/samples/bpf/memcg.c
@@ -0,0 +1,343 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#ifndef __MEMCG_RSTAT_SIMPLE_BPF_SKEL_H__
+#define u64 uint64_t
+#endif
+
+struct local_config {
+	u64 threshold;
+	u64 high_cgroup_id;
+	bool use_below_low;
+	bool use_below_min;
+	unsigned int over_high_ms;
+} local_config;
+
+#include "memcg.skel.h"
+
+static bool exiting;
+
+static void sig_handler(int sig)
+{
+	exiting = true;
+}
+
+static void usage(const char *name)
+{
+	fprintf(stderr,
+		"Usage: %s --low_path= --high_path= \\\n"
+		"       --threshold= [OPTIONS]\n\n",
+		name);
+	fprintf(stderr, "Required arguments:\n");
+	fprintf(stderr,
+		"  -l, --low_path=PATH      Low priority memcgroup path\n");
+	fprintf(stderr,
+		"  -g, --high_path=PATH     High priority memcgroup path\n");
+	fprintf(stderr,
+		"  -t, --threshold=VALUE    The sum of 'val' PGFAULT of\n");
+	fprintf(stderr,
+		"                           high priority memcgroup in\n");
+	fprintf(stderr,
+		"                           1 sec to trigger low priority\n");
+	fprintf(stderr,
+		"                           cgroup over_high\n\n");
+	fprintf(stderr, "Optional arguments:\n");
+	fprintf(stderr, "  -o, --over_high_ms=VALUE\n");
+	fprintf(stderr,
+		"                           Low_path over_high_ms value\n");
+	fprintf(stderr,
+		"                           (default: 0)\n");
+	fprintf(stderr, "  -L, --use_below_low      Enable use_below_low flag\n");
+	fprintf(stderr, "  -M, --use_below_min      Enable use_below_min flag\n");
+	fprintf(stderr,
+		"  -O, --allow_override     Enable BPF_F_ALLOW_OVERRIDE\n");
+	fprintf(stderr,
+		"                           flag\n");
+	fprintf(stderr, "  -h, --help               Show this help message\n\n");
+	fprintf(stderr, "Examples:\n");
+	fprintf(stderr, "  # Using long options:\n");
+	fprintf(stderr, "  %s --low_path=/sys/fs/cgroup/low \\\n", name);
+	fprintf(stderr, "     --high_path=/sys/fs/cgroup/high \\\n");
+	fprintf(stderr, "     --threshold=1000 --over_high_ms=500 \\\n"
+		"     --use_below_low\n\n");
+	fprintf(stderr, "  # Using short options:\n");
+	fprintf(stderr, "  %s -l /sys/fs/cgroup/low \\\n"
+		"     -g /sys/fs/cgroup/high \\\n",
+		name);
+	fprintf(stderr, "     -t 1000 -o 500 -L -M\n");
+}
+
+static uint64_t get_cgroup_id(const char *cgroup_path)
+{
+	struct stat st;
+
+	if (cgroup_path == NULL) {
+		fprintf(stderr, "Error: cgroup_path is NULL\n");
+		return 0;
+	}
+
+	if (stat(cgroup_path, &st) < 0) {
+		fprintf(stderr, "Error: stat(%s) failed: %d\n",
+			cgroup_path, errno);
+		return 0;
+	}
+
+	return (uint64_t)st.st_ino;
+}
+
+static uint64_t parse_u64(const char *str, const char *name)
+{
+	uint64_t value;
+
+	errno = 0;
+	value = strtoull(str, NULL, 10);
+
+	if (errno != 0) {
+		fprintf(stderr,
+			"ERROR: strtoull '%s' failed: %d\n",
+			str, errno);
+		usage(name);
+		exit(-errno);
+	}
+
+	return value;
+}
+
+int main(int argc, char **argv)
+{
+	int low_cgroup_fd = -1, high_cgroup_fd = -1;
+	uint64_t threshold = 0, high_cgroup_id;
+	unsigned int over_high_ms = 0;
+	bool use_below_low = false, use_below_min = false;
+	__u32 opts_flags = 0;
+	const char *low_path = NULL;
+	const char *high_path = NULL;
+	const char *bpf_obj_file = "memcg.bpf.o";
+	struct bpf_object *obj = NULL;
+	struct bpf_program *prog = NULL;
+	struct bpf_link *link = NULL, *link_low = NULL, *link_high = NULL;
+	struct bpf_map *map;
+	struct memcg__bss *bss_data;
+	DECLARE_LIBBPF_OPTS(bpf_struct_ops_opts, opts);
+	int err = -EINVAL;
+	int map_fd;
+	int opt;
+	int option_index = 0;
+
+	static struct option long_options[] = {
+		{"low_path", required_argument, 0, 'l'},
+		{"high_path", required_argument, 0, 'g'},
+		{"threshold", required_argument, 0, 't'},
+		{"over_high_ms", required_argument, 0, 'o'},
+		{"use_below_low", no_argument, 0, 'L'},
+		{"use_below_min", no_argument, 0, 'M'},
+		{"allow_override", no_argument, 0, 'O'},
+		{"help", no_argument, 0, 'h'},
+		{0, 0, 0, 0 }
+	};
+
+	while ((opt = getopt_long(argc, argv, "l:g:t:o:LMOh",
+				  long_options, &option_index)) != -1) {
+		switch (opt) {
+		case 'l':
+			low_path = optarg;
+			break;
+		case 'g':
+			high_path = optarg;
+			break;
+		case 't':
+			threshold = parse_u64(optarg, argv[0]);
+			break;
+		case 'o':
+			over_high_ms = (unsigned int)parse_u64(optarg, argv[0]);
+			break;
+		case 'L':
+			use_below_low = true;
+			break;
+		case 'M':
+			use_below_min = true;
+			break;
+		case 'O':
+			opts_flags = BPF_F_ALLOW_OVERRIDE;
+			break;
+		case 'h':
+			usage(argv[0]);
+			return 0;
+		default:
+			usage(argv[0]);
+			return -EINVAL;
+		}
+	}
+
+	if (!low_path || !high_path || !threshold) {
+		fprintf(stderr,
+			"ERROR: Missing required arguments\n\n");
+		usage(argv[0]);
+		goto out;
+	}
+
+	low_cgroup_fd = open(low_path, O_RDONLY);
+	if (low_cgroup_fd < 0) {
+		fprintf(stderr,
+			"ERROR: open low cgroup '%s' failed: %d\n",
+			low_path, errno);
+		err = -errno;
+		goto out;
+	}
+
+	high_cgroup_id = get_cgroup_id(high_path);
+	if (!high_cgroup_id)
+		goto out;
+	high_cgroup_fd = open(high_path, O_RDONLY);
+	if (high_cgroup_fd < 0) {
+		fprintf(stderr,
+			"ERROR: open high cgroup '%s' failed: %d\n",
+			high_path, errno);
+		err = -errno;
+		goto out;
+	}
+
+	obj = bpf_object__open_file(bpf_obj_file, NULL);
+	err = libbpf_get_error(obj);
+	if (err) {
+		fprintf(stderr,
+			"ERROR: opening BPF object file '%s' failed: %d\n",
+			bpf_obj_file, err);
+		goto out;
+	}
+
+	map = bpf_object__find_map_by_name(obj, ".bss");
+	if (!map) {
+		fprintf(stderr, "ERROR: Failed to find .bss map\n");
+		err = -ESRCH;
+		goto out;
+	}
+
+	err = bpf_object__load(obj);
+	if (err) {
+		fprintf(stderr,
+			"ERROR: loading BPF object file failed: %d\n",
+			err);
+		goto out;
+	}
+
+	map_fd = bpf_map__fd(map);
+	bss_data = calloc(1, bpf_map__value_size(map));
+	if (bss_data) {
+		__u32 key = 0;
+
+		bss_data->local_config.high_cgroup_id = high_cgroup_id;
+		bss_data->local_config.threshold = threshold;
+		bss_data->local_config.over_high_ms = over_high_ms;
+		bss_data->local_config.use_below_low = use_below_low;
+		bss_data->local_config.use_below_min = use_below_min;
+
+		err = bpf_map_update_elem(map_fd, &key, bss_data, BPF_EXIST);
+		free(bss_data);
+		if (err) {
+			fprintf(stderr,
+				"ERROR: update config failed: %d\n",
+				err);
+			goto out;
+		}
+	} else {
+		fprintf(stderr,
+			"ERROR: allocate memory failed\n");
+		err = -ENOMEM;
+		goto out;
+	}
+
+	prog = bpf_object__find_program_by_name(obj,
+						"handle_count_memcg_events");
+	if (!prog) {
+		fprintf(stderr, "ERROR: finding a prog in BPF object file failed\n");
+		err = -ENOENT;
+		goto out;
+	}
+
+	link = bpf_program__attach(prog);
+	err = libbpf_get_error(link);
+	if (err) {
+		fprintf(stderr,
+			"ERROR: bpf_program__attach failed: %d\n",
+			err);
+		goto out;
+	}
+
+	if (over_high_ms) {
+		map = bpf_object__find_map_by_name(obj, "low_mcg_ops");
+		if (!map) {
+			fprintf(stderr,
+				"ERROR: Failed to find low_mcg_ops map\n");
+			err = -ESRCH;
+			goto out;
+		}
+		LIBBPF_OPTS_RESET(opts,
+			.flags = opts_flags,
+			.relative_fd = low_cgroup_fd,
+		);
+		link_low = bpf_map__attach_struct_ops_opts(map, &opts);
+		err = libbpf_get_error(link_low);
+		if (err) {
+			fprintf(stderr,
+				"Failed to attach struct ops low_mcg_ops: %d\n",
+				err);
+			goto out;
+		}
+	}
+
+	if (use_below_low || use_below_min) {
+		map = bpf_object__find_map_by_name(obj, "high_mcg_ops");
+		if (!map) {
+			fprintf(stderr,
+				"ERROR: Failed to find high_mcg_ops map\n");
+			err = -ESRCH;
+			goto out;
+		}
+		LIBBPF_OPTS_RESET(opts,
+			.flags = opts_flags,
+			.relative_fd = high_cgroup_fd,
+		);
+		link_high = bpf_map__attach_struct_ops_opts(map, &opts);
+		err = libbpf_get_error(link_high);
+		if (err) {
+			fprintf(stderr,
+				"Failed to attach struct ops high_mcg_ops: %d\n",
+				err);
+			goto out;
+		}
+	}
+
+	printf("Successfully attached!\n");
+
+	signal(SIGINT, sig_handler);
+	signal(SIGTERM, sig_handler);
+
+	while (!exiting)
+		pause();
+
+	printf("Exiting...\n");
+
+out:
+	bpf_link__destroy(link);
+	bpf_link__destroy(link_low);
+	bpf_link__destroy(link_high);
+	bpf_object__close(obj);
+	close(low_cgroup_fd);
+	close(high_cgroup_fd);
+	return err;
+}
-- 
2.43.0