From nobody Sun Feb 8 08:27:30 2026
From: Roman Gushchin
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton, Alexei Starovoitov, Johannes Weiner, Michal Hocko, Shakeel Butt, Suren Baghdasaryan, David Rientjes, Josh Don, Chuyi Zhou, cgroups@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, Roman Gushchin
Subject: [PATCH rfc 01/12] mm: introduce a bpf hook for OOM handling
Date: Mon, 28 Apr 2025 03:36:06 +0000
Message-ID: <20250428033617.3797686-2-roman.gushchin@linux.dev>
In-Reply-To: <20250428033617.3797686-1-roman.gushchin@linux.dev>
References: <20250428033617.3797686-1-roman.gushchin@linux.dev>

Introduce a bpf hook for implementing custom OOM handling policies.

The hook is the int bpf_handle_out_of_memory(struct oom_control *oc)
function, which is expected to return 1 if it was able to free some
memory and 0 otherwise. In the latter case, the in-kernel OOM killer is
guaranteed to be invoked. Otherwise, the kernel also checks the
bpf_memory_freed field of the oom_control structure, which is expected
to be set by kfuncs that release memory. This is a safety mechanism
that prevents a bpf program from claiming forward progress without
actually releasing memory.

The hook program is sleepable to enable using iterators, e.g. cgroup
iterators.

The hook is executed just before the kernel's victim task selection
algorithm, so all heuristics and sysctls, such as sysctl_panic_on_oom
and sysctl_oom_kill_allocating_task, are respected.

Signed-off-by: Roman Gushchin
---
 include/linux/oom.h |  5 ++++
 mm/oom_kill.c       | 68 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 1e0fc6931ce9..cc14aac9742c 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -51,6 +51,11 @@ struct oom_control {
 
 	/* Used to print the constraint info. */
 	enum oom_constraint constraint;
+
+#ifdef CONFIG_BPF_SYSCALL
+	/* Used by the bpf oom implementation to mark the forward progress */
+	bool bpf_memory_freed;
+#endif
 };
 
 extern struct mutex oom_lock;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 25923cfec9c6..d00776b63c0a 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -45,6 +45,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include "internal.h"
@@ -1100,6 +1101,30 @@ int unregister_oom_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(unregister_oom_notifier);
 
+#ifdef CONFIG_BPF_SYSCALL
+int bpf_handle_out_of_memory(struct oom_control *oc);
+
+/*
+ * Returns true if the bpf oom program returns 1 and some memory was
+ * freed.
+ */
+static bool bpf_handle_oom(struct oom_control *oc)
+{
+	if (WARN_ON_ONCE(oc->chosen))
+		oc->chosen = NULL;
+
+	oc->bpf_memory_freed = false;
+
+	return bpf_handle_out_of_memory(oc) && oc->bpf_memory_freed;
+}
+
+#else
+static inline bool bpf_handle_oom(struct oom_control *oc)
+{
+	return 0;
+}
+#endif
+
 /**
  * out_of_memory - kill the "best" process when we run out of memory
  * @oc: pointer to struct oom_control
@@ -1161,6 +1186,13 @@ bool out_of_memory(struct oom_control *oc)
 		return true;
 	}
 
+	/*
+	 * Let bpf handle the OOM first. If it was able to free up some memory,
+	 * bail out. Otherwise fall back to the kernel OOM killer.
+	 */
+	if (bpf_handle_oom(oc))
+		return true;
+
 	select_bad_process(oc);
 	/* Found nothing?!?! */
 	if (!oc->chosen) {
@@ -1264,3 +1296,39 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
 	return -ENOSYS;
 #endif /* CONFIG_MMU */
 }
+
+#ifdef CONFIG_BPF_SYSCALL
+
+__bpf_hook_start();
+
+/*
+ * Bpf hook to customize the oom handling policy.
+ */
+__weak noinline int bpf_handle_out_of_memory(struct oom_control *oc)
+{
+	return 0;
+}
+
+__bpf_hook_end();
+
+BTF_KFUNCS_START(bpf_oom_hooks)
+BTF_ID_FLAGS(func, bpf_handle_out_of_memory, KF_SLEEPABLE)
+BTF_KFUNCS_END(bpf_oom_hooks)
+
+static const struct btf_kfunc_id_set bpf_oom_hook_set = {
+	.owner = THIS_MODULE,
+	.set = &bpf_oom_hooks,
+};
+static int __init bpf_oom_init(void)
+{
+	int err;
+
+	err = register_btf_fmodret_id_set(&bpf_oom_hook_set);
+	if (err)
+		pr_warn("error while registering bpf oom hooks: %d", err);
+
+	return err;
+}
+late_initcall(bpf_oom_init);
+
+#endif
-- 
2.49.0.901.g37484f566f-goog
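The forward-progress rule in patch 01/12 can be illustrated with a small userspace model. Every type and handler name below is a hypothetical stand-in, not a kernel or libbpf API. Only the expression `handler(oc) && oc->bpf_memory_freed` mirrors bpf_handle_oom() from the patch, and the WARN_ON_ONCE() on a stale `chosen` pointer is reduced to a plain reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's types. */
struct task_model {
	int pid;
};

struct oom_control_model {
	struct task_model *chosen;
	bool bpf_memory_freed;
};

/* Hypothetical signature standing in for the fmod_ret bpf program. */
typedef int (*oom_handler_model_t)(struct oom_control_model *oc);

/* Mirrors bpf_handle_oom(): the return value AND the flag must agree. */
static bool bpf_handle_oom_model(struct oom_control_model *oc,
				 oom_handler_model_t handler)
{
	oc->chosen = NULL;
	oc->bpf_memory_freed = false;
	return handler(oc) && oc->bpf_memory_freed;
}

/* Claims success but never calls a memory-releasing kfunc. */
static int lying_handler(struct oom_control_model *oc)
{
	(void)oc;
	return 1;
}

/* Sets the flag, as a memory-releasing kfunc would, then reports success. */
static int honest_handler(struct oom_control_model *oc)
{
	oc->bpf_memory_freed = true;
	return 1;
}

/* Declines, like the default __weak stub that returns 0. */
static int declining_handler(struct oom_control_model *oc)
{
	(void)oc;
	return 0;
}

/* True if out_of_memory() would fall through to select_bad_process(). */
static bool kernel_oom_killer_runs(oom_handler_model_t handler)
{
	struct oom_control_model oc = { NULL, false };

	return !bpf_handle_oom_model(&oc, handler);
}
```

A handler that returns 1 without anything setting bpf_memory_freed still ends up in the in-kernel OOM killer. That is the safety property the commit message describes.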

From: Roman Gushchin
Subject: [PATCH rfc 02/12] bpf: mark struct oom_control's memcg field as TRUSTED_OR_NULL
Date: Mon, 28 Apr 2025 03:36:07 +0000
Message-ID: <20250428033617.3797686-3-roman.gushchin@linux.dev>

Struct oom_control is used to describe the OOM context. Its memcg field
defines the scope of the OOM: it's NULL for global OOMs and a valid
memcg pointer for memcg-scoped OOMs. Teach the bpf verifier to recognize
it as a trusted-or-NULL pointer. This provides the bpf OOM handler with
a trusted memcg pointer, which is required, for example, for iterating
over the memcg's subtree.

Signed-off-by: Roman Gushchin
---
 kernel/bpf/verifier.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 54c6953a8b84..d2d9f9b87065 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7047,6 +7047,10 @@ BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct socket) {
 	struct sock *sk;
 };
 
+BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct oom_control) {
+	struct mem_cgroup *memcg;
+};
+
 static bool type_is_rcu(struct bpf_verifier_env *env,
 			struct bpf_reg_state *reg,
 			const char *field_name, u32 btf_id)
@@ -7087,6 +7091,7 @@ static bool type_is_trusted_or_null(struct bpf_verifier_env *env,
 				    const char *field_name, u32 btf_id)
 {
 	BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct socket));
+	BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct oom_control));
 
 	return btf_nested_type_is_trusted(&env->log, reg, field_name, btf_id,
 					  "__safe_trusted_or_null");
-- 
2.49.0.901.g37484f566f-goog
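For a handler, trusted-or-NULL means the pointer can be used directly once it is known to be non-NULL, but the NULL (global OOM) case has to be handled first. In a real bpf program the verifier rejects use of a maybe-NULL pointer without that check. The userspace sketch below is only an assumption-laden illustration: the types are hypothetical, and falling back to a hypothetical root object for the global case is one possible policy, not something this patch prescribes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mem_cgroup and struct oom_control. */
struct mem_cgroup_model {
	const char *name;
};

struct oom_control_model {
	/* NULL for a global OOM, a valid memcg for a memcg-scoped OOM. */
	struct mem_cgroup_model *memcg;
};

/* Hypothetical root object used as the scope of a global OOM. */
static struct mem_cgroup_model root_memcg_model = { "root" };

/*
 * Pick the subtree a handler would walk. Because the field is
 * trusted-or-NULL, the NULL check must come before any use.
 */
static struct mem_cgroup_model *oom_scope(const struct oom_control_model *oc)
{
	if (!oc->memcg)
		return &root_memcg_model;	/* global OOM: whole tree */
	return oc->memcg;			/* memcg OOM: its subtree */
}
```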

From: Roman Gushchin
Subject: [PATCH rfc 03/12] bpf: treat fmodret tracing program's arguments as trusted
Date: Mon, 28 Apr 2025 03:36:08 +0000
Message-ID: <20250428033617.3797686-4-roman.gushchin@linux.dev>

*** DO NOT MERGE! ***

This is a temporary workaround, which will be fixed/replaced in the
next version.

--

The bpf OOM handler hook has to:
  1) have a trusted pointer to the oom_control structure,
  2) return a value,
  3) be sleepable to use cgroup iterator functions.

fmodret tracing programs fulfill 2) and 3). This patch enables 1);
however, this change contradicts commit c6b0337f0120 ("bpf: Don't mark
arguments to fentry/fexit programs as trusted.").

Signed-off-by: Roman Gushchin
---
 kernel/bpf/btf.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index a91822bae043..aa86c4eabfa0 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6424,7 +6424,14 @@ static bool prog_args_trusted(const struct bpf_prog *prog)
 
 	switch (prog->type) {
 	case BPF_PROG_TYPE_TRACING:
-		return atype == BPF_TRACE_RAW_TP || atype == BPF_TRACE_ITER;
+		switch (atype) {
+		case BPF_TRACE_RAW_TP:
+		case BPF_TRACE_ITER:
+		case BPF_MODIFY_RETURN:
+			return true;
+		default:
+			return false;
+		}
 	case BPF_PROG_TYPE_LSM:
 		return bpf_lsm_is_trusted(prog);
 	case BPF_PROG_TYPE_STRUCT_OPS:
-- 
2.49.0.901.g37484f566f-goog
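The effect of this change on BPF_PROG_TYPE_TRACING programs can be checked with a small userspace mirror of the new switch. The enum below is a hypothetical subset standing in for the kernel's enum bpf_attach_type. It shows that fmod_ret programs now get trusted arguments, while fentry/fexit programs stay untrusted, consistent with commit c6b0337f0120.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical mirror of the attach types relevant here; the real values
 * are BPF_TRACE_RAW_TP, BPF_TRACE_ITER, BPF_MODIFY_RETURN,
 * BPF_TRACE_FENTRY and BPF_TRACE_FEXIT in enum bpf_attach_type.
 */
enum attach_type_model {
	MODEL_TRACE_RAW_TP,
	MODEL_TRACE_ITER,
	MODEL_MODIFY_RETURN,
	MODEL_TRACE_FENTRY,
	MODEL_TRACE_FEXIT,
};

/* Same decision as the BPF_PROG_TYPE_TRACING branch after the patch. */
static bool tracing_args_trusted(enum attach_type_model atype)
{
	switch (atype) {
	case MODEL_TRACE_RAW_TP:
	case MODEL_TRACE_ITER:
	case MODEL_MODIFY_RETURN:
		return true;
	default:
		return false;
	}
}
```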

From: Roman Gushchin
Subject: [PATCH rfc 04/12] mm: introduce bpf_oom_kill_process() bpf kfunc
Date: Mon, 28 Apr 2025 03:36:09 +0000
Message-ID: <20250428033617.3797686-5-roman.gushchin@linux.dev>

Introduce the bpf_oom_kill_process() bpf kfunc, which is intended to be
used by bpf OOM programs.

It allows killing a process in exactly the same way the OOM killer
does: using the OOM reaper, bumping the corresponding memcg and global
statistics, respecting memory.oom.group, etc.

On success, it sets oom_control's bpf_memory_freed field to true,
enabling the bpf program to bypass the kernel OOM killer.

Signed-off-by: Roman Gushchin
---
 mm/oom_kill.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index d00776b63c0a..2e922e75a9df 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1299,6 +1299,42 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
 
 #ifdef CONFIG_BPF_SYSCALL
 
+__bpf_kfunc_start_defs();
+/*
+ * Kill a process in a way similar to the kernel OOM killer.
+ * This means dump the necessary information to dmesg, adjust memcg
+ * statistics, leverage the oom reaper, respect memory.oom.group etc.
+ *
+ * bpf_oom_kill_process() marks the forward progress by setting
+ * oc->bpf_memory_freed. If the progress was made, the bpf program
+ * is free to decide if the kernel oom killer should be invoked.
+ * Otherwise it's enforced, so that a bad bpf program can't
+ * deadlock the machine on memory.
+ */
+__bpf_kfunc int bpf_oom_kill_process(struct oom_control *oc,
+				     struct task_struct *task,
+				     const char *message__str)
+{
+	if (oom_unkillable_task(task))
+		return -EPERM;
+
+	/* paired with put_task_struct() in oom_kill_process() */
+	task = tryget_task_struct(task);
+	if (!task)
+		return -EINVAL;
+
+	oc->chosen = task;
+
+	oom_kill_process(oc, message__str);
+
+	oc->chosen = NULL;
+	oc->bpf_memory_freed = true;
+
+	return 0;
+}
+
+__bpf_kfunc_end_defs();
+
 __bpf_hook_start();
 
 /*
@@ -1319,6 +1355,16 @@ static const struct btf_kfunc_id_set bpf_oom_hook_set = {
 	.owner = THIS_MODULE,
 	.set = &bpf_oom_hooks,
 };
+
+BTF_KFUNCS_START(bpf_oom_kfuncs)
+BTF_ID_FLAGS(func, bpf_oom_kill_process, KF_SLEEPABLE | KF_TRUSTED_ARGS)
+BTF_KFUNCS_END(bpf_oom_kfuncs)
+
+static const struct btf_kfunc_id_set bpf_oom_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set = &bpf_oom_kfuncs,
+};
+
 static int __init bpf_oom_init(void)
 {
 	int err;
@@ -1326,6 +1372,10 @@ static int __init bpf_oom_init(void)
 	err = register_btf_fmodret_id_set(&bpf_oom_hook_set);
 	if (err)
 		pr_warn("error while registering bpf oom hooks: %d", err);
+	err = register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
+					&bpf_oom_kfunc_set);
+	if (err)
+		pr_warn("error while registering bpf oom kfuncs: %d", err);
 
 	return err;
 }
-- 
2.49.0.901.g37484f566f-goog
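The control flow of bpf_oom_kill_process() can be modeled in userspace, including the reference it hands to oom_kill_process(). Everything below is a hypothetical stand-in. `task_model.refs` plays the role of the task_struct usage count, and oom_kill_process_model() only marks the task and drops the reference that, per the comment in the patch, the real oom_kill_process() releases with put_task_struct().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for task_struct and oom_control. */
struct task_model {
	int refs;		/* 0 means the task is already being freed */
	bool unkillable;	/* e.g. a kernel thread or the global init */
	bool killed;
};

struct oom_control_model {
	struct task_model *chosen;
	bool bpf_memory_freed;
};

/* Like tryget_task_struct(): fails once the last reference is gone. */
static struct task_model *tryget_task_model(struct task_model *t)
{
	if (t->refs == 0)
		return NULL;
	t->refs++;
	return t;
}

/* Stand-in for oom_kill_process(), which consumes the chosen reference. */
static void oom_kill_process_model(struct oom_control_model *oc)
{
	oc->chosen->killed = true;
	oc->chosen->refs--;
}

/* Same control flow as bpf_oom_kill_process() in the patch. */
static int bpf_oom_kill_process_model(struct oom_control_model *oc,
				      struct task_model *task)
{
	if (task->unkillable)
		return -EPERM;

	task = tryget_task_model(task);
	if (!task)
		return -EINVAL;

	oc->chosen = task;
	oom_kill_process_model(oc);
	oc->chosen = NULL;
	oc->bpf_memory_freed = true;	/* lets the hook bypass the kernel killer */
	return 0;
}
```

The flag is set only on the success path, so a rejected or failed kill leaves the kernel OOM killer in charge.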

From: Roman Gushchin
Subject: [PATCH rfc 05/12] mm: introduce bpf kfuncs to deal with memcg pointers
Date: Mon, 28 Apr 2025 03:36:10 +0000
Message-ID: <20250428033617.3797686-6-roman.gushchin@linux.dev>

To operate effectively on memory cgroups in bpf, css pointers need to
be converted to memcg pointers. The simple container_of() cast used in
kernel code can't be used in bpf, because from the verifier's point of
view it's an out-of-bounds memory access.

Introduce helper get/put kfuncs which can be used to get a refcounted
memcg pointer from a css pointer:
  - bpf_get_mem_cgroup,
  - bpf_put_mem_cgroup.

bpf_get_mem_cgroup() can take either a memcg's css or the corresponding
cgroup's "self" css. This allows it to be used with the existing cgroup
iterator, which iterates over the cgroup tree, not the memcg tree.

Signed-off-by: Roman Gushchin
---
 include/linux/memcontrol.h |   2 +
 mm/Makefile                |   3 ++
 mm/bpf_memcontrol.c        | 101 +++++++++++++++++++++++++++++++++++++
 3 files changed, 106 insertions(+)
 create mode 100644 mm/bpf_memcontrol.c

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 53364526d877..a2ecd9caacfb 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -932,6 +932,8 @@ static inline void mod_memcg_page_state(struct page *page,
 	rcu_read_unlock();
 }
 
+unsigned long memcg_events(struct mem_cgroup *memcg, int event);
+unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap);
 unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx);
 unsigned long lruvec_page_state(struct lruvec *lruvec, enum node_stat_item idx);
 unsigned long lruvec_page_state_local(struct lruvec *lruvec,
diff --git a/mm/Makefile b/mm/Makefile
index e7f6bbf8ae5f..3eedba68e8cb 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -105,6 +105,9 @@ obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o
 ifdef CONFIG_SWAP
 obj-$(CONFIG_MEMCG) += swap_cgroup.o
 endif
+ifdef CONFIG_BPF_SYSCALL
+obj-$(CONFIG_MEMCG) += bpf_memcontrol.o
+endif
 obj-$(CONFIG_CGROUP_HUGETLB) += hugetlb_cgroup.o
 obj-$(CONFIG_GUP_TEST) += gup_test.o
 obj-$(CONFIG_DMAPOOL_TEST) += dmapool_test.o
diff --git a/mm/bpf_memcontrol.c b/mm/bpf_memcontrol.c
new file mode 100644
index 000000000000..dacdf53735e5
--- /dev/null
+++ b/mm/bpf_memcontrol.c
@@ -0,0 +1,101 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Memory Controller-related BPF kfuncs and auxiliary code
+ *
+ * Author: Roman Gushchin
+ */
+
+#include
+#include
+
+__bpf_kfunc_start_defs();
+
+__bpf_kfunc struct mem_cgroup *
+bpf_get_mem_cgroup(struct cgroup_subsys_state *css)
+{
+	struct mem_cgroup *memcg = NULL;
+	bool rcu_unlock = false;
+
+	if (!root_mem_cgroup)
+		return NULL;
+
+	if (root_mem_cgroup->css.ss != css->ss) {
+		struct cgroup *cgroup = css->cgroup;
+		int ssid = root_mem_cgroup->css.ss->id;
+
+		rcu_read_lock();
+		rcu_unlock = true;
+		css = rcu_dereference_raw(cgroup->subsys[ssid]);
+	}
+
+	if (css && css_tryget(css))
+		memcg = container_of(css, struct mem_cgroup, css);
+
+	if (rcu_unlock)
+		rcu_read_unlock();
+
+	return memcg;
+}
+
+__bpf_kfunc void bpf_put_mem_cgroup(struct mem_cgroup *memcg)
+{
+	css_put(&memcg->css);
+}
+
+__bpf_kfunc unsigned long bpf_mem_cgroup_events(struct mem_cgroup *memcg, int event)
+{
+
+	if (event < 0 || event >= NR_VM_EVENT_ITEMS)
+		return (unsigned long)-1;
+
+	return memcg_events(memcg, event);
+}
+
+__bpf_kfunc unsigned long bpf_mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
+{
+	return mem_cgroup_usage(memcg, swap);
+}
+
+__bpf_kfunc unsigned long bpf_mem_cgroup_page_state(struct mem_cgroup *memcg, int idx)
+{
+	if (idx < 0 || idx >= MEMCG_NR_STAT)
+		return (unsigned long)-1;
+
+	return memcg_page_state(memcg, idx);
+}
+
+__bpf_kfunc void bpf_mem_cgroup_flush_stats(struct mem_cgroup *memcg)
+{
+	mem_cgroup_flush_stats(memcg);
+}
+
+__bpf_kfunc_end_defs();
+
+BTF_KFUNCS_START(bpf_memcontrol_kfuncs)
+BTF_ID_FLAGS(func, bpf_get_mem_cgroup, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_put_mem_cgroup, KF_RELEASE)
+
+BTF_ID_FLAGS(func, bpf_mem_cgroup_events, KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_mem_cgroup_usage, KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_mem_cgroup_page_state, KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_mem_cgroup_flush_stats, KF_TRUSTED_ARGS)
+
+BTF_KFUNCS_END(bpf_memcontrol_kfuncs)
+
+static const struct btf_kfunc_id_set bpf_memcontrol_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set = &bpf_memcontrol_kfuncs,
+};
+
+static int __init bpf_memcontrol_init(void)
+{
+	int err;
+
+	err = register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
+					&bpf_memcontrol_kfunc_set);
+	if (err)
+		pr_warn("error while registering bpf memcontrol kfuncs: %d", err);
+
+	return err;
+}
+late_initcall(bpf_memcontrol_init);
-- 
2.49.0.901.g37484f566f-goog
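bpf_mem_cgroup_events() and bpf_mem_cgroup_page_state() report an out-of-range index by returning (unsigned long)-1, which equals ULONG_MAX, rather than a negative error code. A caller therefore has to compare the result against that sentinel. The userspace sketch below models the same range check. It assumes a hypothetical four-entry stat array in place of MEMCG_NR_STAT and is not the kernel code.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical bound standing in for MEMCG_NR_STAT / NR_VM_EVENT_ITEMS. */
#define MODEL_NR_STAT 4

/* The error sentinel used by the kfuncs is simply the largest unsigned long. */
_Static_assert((unsigned long)-1 == ULONG_MAX, "sentinel is ULONG_MAX");

/* Hypothetical per-memcg counters. */
struct memcg_stats_model {
	unsigned long stat[MODEL_NR_STAT];
};

/* Same range check as bpf_mem_cgroup_page_state(): bad index -> ULONG_MAX. */
static unsigned long page_state_model(const struct memcg_stats_model *m,
				      int idx)
{
	if (idx < 0 || idx >= MODEL_NR_STAT)
		return (unsigned long)-1;
	return m->stat[idx];
}
```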

From: Roman Gushchin
Subject: [PATCH rfc 06/12] mm: introduce bpf_get_root_mem_cgroup() bpf kfunc
Date: Mon, 28 Apr 2025 03:36:11 +0000
Message-ID: <20250428033617.3797686-7-roman.gushchin@linux.dev>

Introduce a bpf kfunc to get a trusted pointer to the root memory
cgroup. It's very handy for traversing the full memcg tree, e.g. for
handling a system-wide OOM.

It's possible to obtain this pointer by traversing the memcg tree up
from any known memcg, but that's sub-optimal and makes bpf programs
more complex and less efficient.

bpf_get_root_mem_cgroup() has KF_ACQUIRE | KF_RET_NULL semantics;
however, in reality it's not necessary to bump the corresponding
reference counter: the root memory cgroup is immortal, and reference
counting is skipped for it (see css_get()). Once set, root_mem_cgroup
is always a valid memcg pointer.

It's safe to call bpf_put_mem_cgroup() on a pointer obtained with
bpf_get_root_mem_cgroup(); it's effectively a no-op.

Signed-off-by: Roman Gushchin
---
 mm/bpf_memcontrol.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/bpf_memcontrol.c b/mm/bpf_memcontrol.c
index dacdf53735e5..94bc6c17d80b 100644
--- a/mm/bpf_memcontrol.c
+++ b/mm/bpf_memcontrol.c
@@ -10,6 +10,12 @@
 
 __bpf_kfunc_start_defs();
 
+__bpf_kfunc struct mem_cgroup *bpf_get_root_mem_cgroup(void)
+{
+	/* css_get() is not needed */
+	return root_mem_cgroup;
+}
+
 __bpf_kfunc struct mem_cgroup *
 bpf_get_mem_cgroup(struct cgroup_subsys_state *css)
 {
@@ -72,6 +78,7 @@ __bpf_kfunc void bpf_mem_cgroup_flush_stats(struct mem_cgroup *memcg)
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(bpf_memcontrol_kfuncs)
+BTF_ID_FLAGS(func, bpf_get_root_mem_cgroup, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_get_mem_cgroup, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_put_mem_cgroup, KF_RELEASE)
 
-- 
2.49.0.901.g37484f566f-goog

From nobody Sun Feb 8 08:27:30 2026
From: Roman Gushchin
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton, Alexei Starovoitov, Johannes Weiner, Michal Hocko,
 Shakeel Butt, Suren Baghdasaryan, David Rientjes, Josh Don, Chuyi Zhou,
 cgroups@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
 Roman Gushchin
Subject: [PATCH rfc 07/12] bpf: selftests: introduce read_cgroup_file() helper
Date: Mon, 28 Apr 2025 03:36:12 +0000
Message-ID: <20250428033617.3797686-8-roman.gushchin@linux.dev>
In-Reply-To: <20250428033617.3797686-1-roman.gushchin@linux.dev>
References: <20250428033617.3797686-1-roman.gushchin@linux.dev>

Implement a read_cgroup_file() helper to read from cgroup control
files, e.g. statistics.

Signed-off-by: Roman Gushchin
---
 tools/testing/selftests/bpf/cgroup_helpers.c | 39 ++++++++++++++++++++
 tools/testing/selftests/bpf/cgroup_helpers.h | 2 +
 2 files changed, 41 insertions(+)

diff --git a/tools/testing/selftests/bpf/cgroup_helpers.c b/tools/testing/selftests/bpf/cgroup_helpers.c
index e4535451322e..3ffd4b764f91 100644
--- a/tools/testing/selftests/bpf/cgroup_helpers.c
+++ b/tools/testing/selftests/bpf/cgroup_helpers.c
@@ -125,6 +125,45 @@ int enable_controllers(const char *relative_path, const char *controllers)
 	return __enable_controllers(cgroup_path, controllers);
 }
 
+static size_t __read_cgroup_file(const char *cgroup_path, const char *file,
+				 char *buf, size_t size)
+{
+	char file_path[PATH_MAX + 1];
+	size_t ret;
+	int fd;
+
+	snprintf(file_path, sizeof(file_path), "%s/%s", cgroup_path, file);
+	fd = open(file_path, O_RDONLY);
+	if (fd < 0) {
+		log_err("Opening %s", file_path);
+		return -1;
+	}
+
+	ret = read(fd, buf, size);
+	close(fd);
+	return ret;
+}
+
+/**
+ * read_cgroup_file() - Read from a cgroup file
+ * @relative_path: The cgroup path, relative to the workdir
+ * @file: The name of the file in cgroupfs to read from
+ * @buf: Buffer to read the file contents into
+ * @size: Size of the buffer
+ *
+ * Read from a file in the given cgroup's directory.
+ *
+ * If successful, the number of bytes read is returned.
+ */
+size_t read_cgroup_file(const char *relative_path, const char *file,
+			char *buf, size_t size)
+{
+	char cgroup_path[PATH_MAX - 24];
+
+	format_cgroup_path(cgroup_path, relative_path);
+	return __read_cgroup_file(cgroup_path, file, buf, size);
+}
+
 static int __write_cgroup_file(const char *cgroup_path, const char *file,
 			       const char *buf)
 {
diff --git a/tools/testing/selftests/bpf/cgroup_helpers.h b/tools/testing/selftests/bpf/cgroup_helpers.h
index 502845160d88..821cb76db1f7 100644
--- a/tools/testing/selftests/bpf/cgroup_helpers.h
+++ b/tools/testing/selftests/bpf/cgroup_helpers.h
@@ -11,6 +11,8 @@
 
 /* cgroupv2 related */
 int enable_controllers(const char *relative_path, const char *controllers);
+size_t read_cgroup_file(const char *relative_path, const char *file,
+			char *buf, size_t size);
 int write_cgroup_file(const char *relative_path, const char *file,
 		      const char *buf);
 int write_cgroup_file_parent(const char *relative_path, const char *file,
-- 
2.49.0.901.g37484f566f-goog

From nobody Sun Feb 8 08:27:30 2026
From: Roman Gushchin
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton, Alexei Starovoitov, Johannes Weiner, Michal Hocko,
 Shakeel Butt, Suren Baghdasaryan, David Rientjes, Josh Don, Chuyi Zhou,
 cgroups@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
 Roman Gushchin
Subject: [PATCH rfc 08/12] bpf: selftests: bpf OOM handler test
Date: Mon, 28 Apr 2025 03:36:13 +0000
Message-ID: <20250428033617.3797686-9-roman.gushchin@linux.dev>
In-Reply-To: <20250428033617.3797686-1-roman.gushchin@linux.dev>
References: <20250428033617.3797686-1-roman.gushchin@linux.dev>

Implement a pseudo-realistic test for the OOM handling functionality.

The OOM handling policy implemented in bpf kills all tasks belonging
to the biggest leaf cgroup that doesn't contain unkillable tasks
(tasks with oom_score_adj set to -1000). Pagecache size is excluded
from the accounting.

The test creates a hierarchy of memory cgroups, causes an OOM at the
top level, checks that the expected process is killed and checks
memcg's oom statistics.

Signed-off-by: Roman Gushchin
---
 tools/testing/selftests/bpf/prog_tests/oom.c | 227 +++++++++++++++++++
 tools/testing/selftests/bpf/progs/test_oom.c | 103 +++++++++
 2 files changed, 330 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/oom.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_oom.c

diff --git a/tools/testing/selftests/bpf/prog_tests/oom.c b/tools/testing/selftests/bpf/prog_tests/oom.c
new file mode 100644
index 000000000000..224c25334385
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/oom.c
@@ -0,0 +1,227 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#define _GNU_SOURCE
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "cgroup_helpers.h"
+#include "test_oom.skel.h"
+
+struct cgroup_desc {
+	const char *path;
+	int fd;
+	unsigned long long id;
+	int pid;
+	size_t target;
+	size_t max;
+	int oom_score_adj;
+	bool victim;
+};
+
+#define MB (1024 * 1024)
+#define OOM_SCORE_ADJ_MIN (-1000)
+#define OOM_SCORE_ADJ_MAX 1000
+
+static struct cgroup_desc cgroups[] = {
+	{ .path = "/oom_test", .max = 80 * MB},
+	{ .path = "/oom_test/cg1", .target = 10 * MB,
+	  .oom_score_adj = OOM_SCORE_ADJ_MAX },
+	{ .path = "/oom_test/cg2", .target = 40 * MB,
+	  .oom_score_adj = OOM_SCORE_ADJ_MIN },
+	{ .path = "/oom_test/cg3" },
+	{ .path = "/oom_test/cg3/cg4", .target = 30 * MB,
+	  .victim = true },
+	{ .path = "/oom_test/cg3/cg5", .target = 20 * MB },
+};
+
+static int spawn_task(struct cgroup_desc *desc)
+{
+	char *ptr;
+	int pid;
+
+	pid = fork();
+	if (pid < 0)
+		return pid;
+
+	if (pid > 0) {
+		/* parent */
+		desc->pid = pid;
+		return 0;
+	}
+
+	/* child */
+	if (desc->oom_score_adj) {
+		char buf[64];
+		int fd = open("/proc/self/oom_score_adj", O_WRONLY);
+
+		if (fd < 0)
+			return -1;
+
+		snprintf(buf, sizeof(buf), "%d", desc->oom_score_adj);
+		write(fd, buf, sizeof(buf));
+		close(fd);
+	}
+
+	ptr = (char *)malloc(desc->target);
+	if (!ptr)
+		return -ENOMEM;
+
+	memset(ptr, 'a', desc->target);
+
+	while (1)
+		sleep(1000);
+
+	return 0;
+}
+
+static void setup_environment(void)
+{
+	int i, err;
+
+	err = setup_cgroup_environment();
+	if (!ASSERT_OK(err, "setup_cgroup_environment"))
+		goto cleanup;
+
+	for (i = 0; i < ARRAY_SIZE(cgroups); i++) {
+		cgroups[i].fd = create_and_get_cgroup(cgroups[i].path);
+		if (!ASSERT_GE(cgroups[i].fd, 0, "create_and_get_cgroup"))
+			goto cleanup;
+
+		cgroups[i].id = get_cgroup_id(cgroups[i].path);
+		if (!ASSERT_GT(cgroups[i].id, 0, "get_cgroup_id"))
+			goto cleanup;
+
+		if (i == 0) {
+			/* Freeze the top-level cgroup */
+			err = write_cgroup_file(cgroups[i].path, "cgroup.freeze", "1");
+			if (!ASSERT_OK(err, "freeze cgroup"))
+				goto cleanup;
+		}
+
+		if (!cgroups[i].target) {
+			/* Recursively enable the memory controller */
+			err = write_cgroup_file(cgroups[i].path, "cgroup.subtree_control",
+						"+memory");
+			if (!ASSERT_OK(err, "enable memory controller"))
+				goto cleanup;
+		}
+
+		if (cgroups[i].max) {
+			char buf[256];
+
+			snprintf(buf, sizeof(buf), "%lu", cgroups[i].max);
+			err = write_cgroup_file(cgroups[i].path, "memory.max", buf);
+			if (!ASSERT_OK(err, "set memory.max"))
+				goto cleanup;
+
+			snprintf(buf, sizeof(buf), "0");
+			write_cgroup_file(cgroups[i].path, "memory.swap.max", buf);
+
+		}
+
+		if (cgroups[i].target) {
+			char buf[256];
+
+			err = spawn_task(&cgroups[i]);
+			if (!ASSERT_OK(err, "spawn task"))
+				goto cleanup;
+
+			snprintf(buf, sizeof(buf), "%d", cgroups[i].pid);
+			err = write_cgroup_file(cgroups[i].path, "cgroup.procs", buf);
+			if (!ASSERT_OK(err, "put child into a cgroup"))
+				goto cleanup;
+		}
+	}
+
+	return;
+
+cleanup:
+	cleanup_cgroup_environment();
+}
+
+static int run_and_wait_for_oom(void)
+{
+	int ret = -1;
+	bool first = true;
+	char buf[4096] = {};
+	size_t size;
+
+	ret = write_cgroup_file(cgroups[0].path, "cgroup.freeze", "0");
+	if (!ASSERT_OK(ret, "freeze cgroup"))
+		return -1;
+
+	for (;;) {
+		int i, status;
+		pid_t pid = wait(&status);
+
+		if (pid == -1) {
+			if (errno == EINTR)
+				continue;
+			/* ECHILD */
+			break;
+		}
+
+		if (!first)
+			continue;
+
+		first = false;
+
+		for (i = 0; i < ARRAY_SIZE(cgroups); i++) {
+			if (!ASSERT_OK(cgroups[i].victim !=
+				       (pid == cgroups[i].pid),
+				       "correct process was killed")) {
+				ret = -1;
+				break;
+			}
+
+			if (!cgroups[i].victim)
+				continue;
+
+			size = read_cgroup_file(cgroups[i].path,
+						"memory.events",
+						buf, sizeof(buf));
+			if (!ASSERT_OK(size <= 0, "read memory.events")) {
+				ret = -1;
+				break;
+			}
+
+			if (!ASSERT_OK(strstr(buf, "oom_kill 1") == NULL,
+				       "oom_kill count check")) {
+				ret = -1;
+				break;
+			}
+		}
+
+		for (i = 0; i < ARRAY_SIZE(cgroups); i++)
+			if (cgroups[i].pid && cgroups[i].pid != pid)
+				kill(cgroups[i].pid, SIGKILL);
+	}
+
+	return ret;
+}
+
+void test_oom(void)
+{
+	struct test_oom *skel;
+	int err;
+
+	skel = test_oom__open_and_load();
+	err = test_oom__attach(skel);
+	if (!ASSERT_OK(err, "test_oom__attach"))
+		goto cleanup;
+
+	setup_environment();
+
+	run_and_wait_for_oom();
+
+	cleanup_cgroup_environment();
+cleanup:
+	test_oom__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_oom.c b/tools/testing/selftests/bpf/progs/test_oom.c
new file mode 100644
index 000000000000..a8224d7c3fed
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_oom.c
@@ -0,0 +1,103 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include "vmlinux.h"
+#include
+#include
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+
+char _license[] SEC("license") = "GPL";
+
+#define OOM_SCORE_ADJ_MIN (-1000)
+
+void bpf_rcu_read_lock(void) __ksym;
+void bpf_rcu_read_unlock(void) __ksym;
+struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym;
+void bpf_task_release(struct task_struct *p) __ksym;
+struct mem_cgroup *bpf_get_root_mem_cgroup(void) __ksym;
+struct mem_cgroup *bpf_get_mem_cgroup(struct cgroup_subsys_state *css) __ksym;
+void bpf_put_mem_cgroup(struct mem_cgroup *memcg) __ksym;
+int bpf_oom_kill_process(struct oom_control *oc, struct task_struct *task,
+			 const char *message__str) __ksym;
+
+bool mem_cgroup_killable(struct mem_cgroup *memcg)
+{
+	struct task_struct *task;
+	bool ret = true;
+
+	bpf_for_each(css_task, task, &memcg->css, CSS_TASK_ITER_PROCS)
+		if (task->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
+			return false;
+
+	return ret;
+}
+
+/*
+ * Find the largest leaf cgroup (ignoring page cache) without unkillable tasks
+ * and kill all belonging tasks.
+ */
+SEC("fmod_ret.s/bpf_handle_out_of_memory")
+int BPF_PROG(test_bpf_out_of_memory, struct oom_control *oc)
+{
+	struct task_struct *task;
+	struct mem_cgroup *root_memcg = oc->memcg;
+	struct mem_cgroup *memcg, *victim = NULL;
+	struct cgroup_subsys_state *css_pos;
+	unsigned long usage, max_usage = 0;
+	unsigned long pagecache = 0;
+	int ret = 0;
+
+	if (root_memcg)
+		root_memcg = bpf_get_mem_cgroup(&root_memcg->css);
+	else
+		root_memcg = bpf_get_root_mem_cgroup();
+
+	if (!root_memcg)
+		return 0;
+
+	bpf_rcu_read_lock();
+	bpf_for_each(css, css_pos, &root_memcg->css, BPF_CGROUP_ITER_DESCENDANTS_POST) {
+		if (css_pos->cgroup->nr_descendants + css_pos->cgroup->nr_dying_descendants)
+			continue;
+
+		memcg = bpf_get_mem_cgroup(css_pos);
+		if (!memcg)
+			continue;
+
+		usage = bpf_mem_cgroup_usage(memcg, false);
+		pagecache = bpf_mem_cgroup_page_state(memcg, NR_FILE_PAGES);
+
+		if (usage > pagecache)
+			usage -= pagecache;
+		else
+			usage = 0;
+
+		if ((usage > max_usage) && mem_cgroup_killable(memcg)) {
+			max_usage = usage;
+			if (victim)
+				bpf_put_mem_cgroup(victim);
+			victim = bpf_get_mem_cgroup(&memcg->css);
+		}
+
+		bpf_put_mem_cgroup(memcg);
+	}
+	bpf_rcu_read_unlock();
+
+	if (!victim)
+		goto exit;
+
+	bpf_for_each(css_task, task, &victim->css, CSS_TASK_ITER_PROCS) {
+		struct task_struct *t = bpf_task_acquire(task);
+
+		if (t) {
+			bpf_oom_kill_process(oc, task, "bpf oom test");
+			bpf_task_release(t);
+			ret = 1;
+		}
+	}
+
+	bpf_put_mem_cgroup(victim);
+exit:
+	bpf_put_mem_cgroup(root_memcg);
+
+	return ret;
+}
-- 
2.49.0.901.g37484f566f-goog

From nobody Sun Feb 8 08:27:30 2026
From: Roman Gushchin
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton, Alexei Starovoitov, Johannes Weiner, Michal Hocko,
 Shakeel Butt, Suren Baghdasaryan, David Rientjes, Josh Don, Chuyi Zhou,
 cgroups@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
 Roman Gushchin
Subject: [PATCH rfc 09/12] sched: psi: bpf hook to handle psi events
Date: Mon, 28 Apr 2025 03:36:14 +0000
Message-ID: <20250428033617.3797686-10-roman.gushchin@linux.dev>
In-Reply-To: <20250428033617.3797686-1-roman.gushchin@linux.dev>
References: <20250428033617.3797686-1-roman.gushchin@linux.dev>

Introduce a bpf hook to handle psi events. The primary intended purpose
of this hook is to declare OOM events based on reaching a certain
memory pressure level, similar to what systemd-oomd and oomd do in
userspace.
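
A simplified userspace model of the trigger path may help here. Once a
trigger's stall time crosses its threshold, the event flag is flipped
from 0 to 1 with a compare-and-swap, mirroring cmpxchg(&t->event, 0, 1)
in update_triggers(), so only the winning caller runs the handler. All
names are hypothetical; the real psi code tracks stall growth over a
time window and rate-limits events per window, which this sketch
reduces to a resettable counter.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct model_trigger {
	uint64_t threshold_us;	/* stall time that fires the trigger */
	uint64_t win_stall_us;	/* stall time accumulated in the window */
	atomic_int event;	/* 0: idle, 1: event pending */
};

/* Stand-in for a bpf_handle_psi_event() program: count declared OOMs. */
static int model_declared_ooms;

static int model_handle_psi_event(struct model_trigger *t)
{
	(void)t;
	model_declared_ooms++;	/* a real handler might declare an OOM here */
	return 0;
}

/* Returns true if this call generated a new event. */
static bool model_update_trigger(struct model_trigger *t, uint64_t stall_us)
{
	int expected = 0;

	t->win_stall_us += stall_us;
	if (t->win_stall_us < t->threshold_us)
		return false;

	/* Only the 0 -> 1 transition generates an event. */
	if (!atomic_compare_exchange_strong(&t->event, &expected, 1))
		return false;	/* an event is already pending */

	model_handle_psi_event(t);
	return true;
}

/* Consumer side: acknowledge the event and start a new window. */
static void model_ack_event(struct model_trigger *t)
{
	atomic_store(&t->event, 0);
	t->win_stall_us = 0;
}
```

Repeated threshold crossings while an event is pending do not call the
handler again; only after the event is acknowledged can a new one fire.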

Signed-off-by: Roman Gushchin
---
 kernel/sched/psi.c | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 1396674fa722..4c4eb4ead8f6 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -176,6 +176,32 @@ static void psi_avgs_work(struct work_struct *work);
 
 static void poll_timer_fn(struct timer_list *t);
 
+#ifdef CONFIG_BPF_SYSCALL
+__bpf_hook_start();
+
+__weak noinline int bpf_handle_psi_event(struct psi_trigger *t)
+{
+	return 0;
+}
+
+__bpf_hook_end();
+
+BTF_KFUNCS_START(bpf_psi_hooks)
+BTF_ID_FLAGS(func, bpf_handle_psi_event, KF_SLEEPABLE)
+BTF_KFUNCS_END(bpf_psi_hooks)
+
+static const struct btf_kfunc_id_set bpf_psi_hook_set = {
+	.owner = THIS_MODULE,
+	.set = &bpf_psi_hooks,
+};
+
+#else
+static inline int bpf_handle_psi_event(struct psi_trigger *t)
+{
+	return 0;
+}
+#endif
+
 static void group_init(struct psi_group *group)
 {
 	int cpu;
@@ -489,6 +515,7 @@ static void update_triggers(struct psi_group *group, u64 now,
 
 		/* Generate an event */
 		if (cmpxchg(&t->event, 0, 1) == 0) {
+			bpf_handle_psi_event(t);
 			if (t->of)
 				kernfs_notify(t->of->kn);
 			else
@@ -1655,6 +1682,8 @@ static const struct proc_ops psi_irq_proc_ops = {
 
 static int __init psi_proc_init(void)
 {
+	int err = 0;
+
 	if (psi_enable) {
 		proc_mkdir("pressure", NULL);
 		proc_create("pressure/io", 0666, NULL, &psi_io_proc_ops);
@@ -1662,9 +1691,14 @@ static int __init psi_proc_init(void)
 		proc_create("pressure/cpu", 0666, NULL, &psi_cpu_proc_ops);
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
 		proc_create("pressure/irq", 0666, NULL, &psi_irq_proc_ops);
+#endif
+#ifdef CONFIG_BPF_SYSCALL
+		err = register_btf_fmodret_id_set(&bpf_psi_hook_set);
+		if (err)
+			pr_err("error while registering bpf psi hooks: %d", err);
 #endif
 	}
-	return 0;
+	return err;
 }
 module_init(psi_proc_init);
 
-- 
2.49.0.901.g37484f566f-goog

From nobody Sun Feb 8 08:27:30 2026
From: Roman Gushchin
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton, Alexei Starovoitov, Johannes Weiner, Michal Hocko,
 Shakeel Butt, Suren Baghdasaryan, David Rientjes, Josh Don, Chuyi Zhou,
 cgroups@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
 Roman Gushchin
Subject: [PATCH rfc 10/12] mm: introduce bpf_out_of_memory() bpf kfunc
Date: Mon, 28 Apr 2025 03:36:15 +0000
Message-ID: <20250428033617.3797686-11-roman.gushchin@linux.dev>
In-Reply-To: <20250428033617.3797686-1-roman.gushchin@linux.dev>
References: <20250428033617.3797686-1-roman.gushchin@linux.dev>

Introduce the bpf_out_of_memory() bpf kfunc, which allows declaring an
out-of-memory event and triggering the corresponding kernel OOM
handling mechanism.

It takes a trusted memcg pointer (or NULL for system-wide OOMs) as an
argument, as well as the page order.

Only one OOM can be declared and handled in the system at a time, so
if the function is called concurrently with another OOM handling, it
bails out with -EBUSY.

The function is declared as sleepable, which guarantees that it won't
be called from an atomic context. This is required by the OOM handling
code, which is not guaranteed to work in a non-blocking context.
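
The argument checks and the single-OOM rule can be modelled in
userspace as follows. This is a sketch under stated assumptions, not
the kernel implementation: an atomic_flag stands in for
mutex_trylock() on oom_lock, model_out_of_memory() is a hypothetical
stub for out_of_memory(), and MODEL_MAX_PAGE_ORDER assumes the common
default MAX_PAGE_ORDER of 10.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>	/* NULL memcg means a system-wide OOM */

/* Assumption: mirrors the common default value of MAX_PAGE_ORDER. */
#define MODEL_MAX_PAGE_ORDER 10

/* Stand-in for oom_lock; test-and-set acts as a trylock. */
static atomic_flag model_oom_lock = ATOMIC_FLAG_INIT;

/* Hypothetical stub for out_of_memory(): pretend a victim was found. */
static bool model_out_of_memory(void *memcg, int order)
{
	(void)memcg;
	(void)order;
	return true;
}

static int model_bpf_out_of_memory(void *memcg, int order)
{
	int ret;

	/* Reject invalid orders before touching the lock. */
	if (order < 0 || order > MODEL_MAX_PAGE_ORDER)
		return -EINVAL;

	/* Only one OOM may be handled at a time: fail fast, don't wait. */
	if (atomic_flag_test_and_set(&model_oom_lock))
		return -EBUSY;

	ret = model_out_of_memory(memcg, order);
	atomic_flag_clear(&model_oom_lock);
	return ret;
}
```

The trylock is what turns a concurrent call into -EBUSY rather than a
wait, which matters for a caller that may itself be part of OOM
handling.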

Handling of a memcg OOM almost always requires taking the css_set_lock
spinlock. The fact that bpf_out_of_memory() is sleepable also
guarantees that it can't be called with css_set_lock held, so the
kernel can't deadlock on it.

Signed-off-by: Roman Gushchin
---
 mm/oom_kill.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 2e922e75a9df..246510572e34 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1333,6 +1333,27 @@ __bpf_kfunc int bpf_oom_kill_process(struct oom_control *oc,
 	return 0;
 }
 
+__bpf_kfunc int bpf_out_of_memory(struct mem_cgroup *memcg__nullable, int order)
+{
+	struct oom_control oc = {
+		.memcg = memcg__nullable,
+		.order = order,
+	};
+	int ret = -EINVAL;
+
+	if (oc.order < 0 || oc.order > MAX_PAGE_ORDER)
+		goto out;
+
+	ret = -EBUSY;
+	if (mutex_trylock(&oom_lock)) {
+		ret = out_of_memory(&oc);
+		mutex_unlock(&oom_lock);
+	}
+
+out:
+	return ret;
+}
+
 __bpf_kfunc_end_defs();
 
 __bpf_hook_start();
@@ -1358,6 +1379,7 @@ static const struct btf_kfunc_id_set bpf_oom_hook_set = {
 
 BTF_KFUNCS_START(bpf_oom_kfuncs)
 BTF_ID_FLAGS(func, bpf_oom_kill_process, KF_SLEEPABLE | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_out_of_memory, KF_SLEEPABLE | KF_TRUSTED_ARGS)
 BTF_KFUNCS_END(bpf_oom_kfuncs)
 
 static const struct btf_kfunc_id_set bpf_oom_kfunc_set = {
-- 
2.49.0.901.g37484f566f-goog

From nobody Sun Feb 8 08:27:30 2026
From: Roman Gushchin
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton, Alexei Starovoitov, Johannes Weiner, Michal Hocko,
 Shakeel Butt, Suren Baghdasaryan, David Rientjes, Josh Don, Chuyi Zhou,
 cgroups@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
 Roman Gushchin
Subject: [PATCH rfc 11/12] bpf: selftests: introduce open_cgroup_file() helper
Date: Mon, 28 Apr 2025 03:36:16 +0000
Message-ID: <20250428033617.3797686-12-roman.gushchin@linux.dev>
In-Reply-To: <20250428033617.3797686-1-roman.gushchin@linux.dev>
References: <20250428033617.3797686-1-roman.gushchin@linux.dev>

Implement the open_cgroup_file() helper, which opens a cgroup control
file with the given flags and returns a file descriptor.

It's useful when a test needs to do something more sophisticated than
a plain read or write, e.g. listen for poll events or keep the file
descriptor open.
Signed-off-by: Roman Gushchin
---
 tools/testing/selftests/bpf/cgroup_helpers.c | 28 ++++++++++++++++++++
 tools/testing/selftests/bpf/cgroup_helpers.h |  1 +
 2 files changed, 29 insertions(+)

diff --git a/tools/testing/selftests/bpf/cgroup_helpers.c b/tools/testing/selftests/bpf/cgroup_helpers.c
index 3ffd4b764f91..50dbe4f45cb1 100644
--- a/tools/testing/selftests/bpf/cgroup_helpers.c
+++ b/tools/testing/selftests/bpf/cgroup_helpers.c
@@ -125,6 +125,34 @@ int enable_controllers(const char *relative_path, const char *controllers)
 	return __enable_controllers(cgroup_path, controllers);
 }
 
+static int __open_cgroup_file(const char *cgroup_path, const char *file,
+			      int flags)
+{
+	char file_path[PATH_MAX + 1];
+
+	snprintf(file_path, sizeof(file_path), "%s/%s", cgroup_path, file);
+	return open(file_path, flags);
+}
+
+/**
+ * open_cgroup_file() - Open a cgroup file
+ * @relative_path: The cgroup path, relative to the workdir
+ * @file: The name of the file in cgroupfs to open to
+ * @flags: Flags
+ *
+ * Open a file in the given cgroup's directory.
+ *
+ * If successful, fd is returned.
+ */
+int open_cgroup_file(const char *relative_path, const char *file,
+		     int flags)
+{
+	char cgroup_path[PATH_MAX - 24];
+
+	format_cgroup_path(cgroup_path, relative_path);
+	return __open_cgroup_file(cgroup_path, file, flags);
+}
+
 static size_t __read_cgroup_file(const char *cgroup_path, const char *file,
 				 char *buf, size_t size)
 {
diff --git a/tools/testing/selftests/bpf/cgroup_helpers.h b/tools/testing/selftests/bpf/cgroup_helpers.h
index 821cb76db1f7..f45007d5fea5 100644
--- a/tools/testing/selftests/bpf/cgroup_helpers.h
+++ b/tools/testing/selftests/bpf/cgroup_helpers.h
@@ -11,6 +11,7 @@
 
 /* cgroupv2 related */
 int enable_controllers(const char *relative_path, const char *controllers);
+int open_cgroup_file(const char *relative_path, const char *file, int flags);
 size_t read_cgroup_file(const char *relative_path, const char *file,
 			char *buf, size_t size);
 int write_cgroup_file(const char *relative_path, const char *file,
-- 
2.49.0.901.g37484f566f-goog

From nobody Sun Feb 8 08:27:30 2026
From: Roman Gushchin
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton, Alexei Starovoitov, Johannes Weiner, Michal Hocko,
 Shakeel Butt, Suren Baghdasaryan, David Rientjes, Josh Don, Chuyi Zhou,
 cgroups@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
 Roman Gushchin
Subject: [PATCH rfc 12/12] bpf: selftests: psi handler test
Date: Mon, 28 Apr 2025 03:36:17 +0000
Message-ID: <20250428033617.3797686-13-roman.gushchin@linux.dev>
In-Reply-To: <20250428033617.3797686-1-roman.gushchin@linux.dev>
References: <20250428033617.3797686-1-roman.gushchin@linux.dev>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Add a psi handler test. The test creates a cgroup with two child
sub-cgroups, sets up memory.high for one of them and puts
memory-hungry processes in each of them. Then it sets up a psi trigger
for one of the cgroups and waits until the process in this cgroup is
killed by the OOM killer. To make sure there was indeed an OOM event,
it checks the corresponding memcg statistics.

Signed-off-by: Roman Gushchin
---
 tools/testing/selftests/bpf/prog_tests/psi.c | 234 +++++++++++++++++++
 tools/testing/selftests/bpf/progs/test_psi.c |  43 ++++
 2 files changed, 277 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/psi.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_psi.c

diff --git a/tools/testing/selftests/bpf/prog_tests/psi.c b/tools/testing/selftests/bpf/prog_tests/psi.c
new file mode 100644
index 000000000000..99d68bc20eee
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/psi.c
@@ -0,0 +1,234 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#define _GNU_SOURCE
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "cgroup_helpers.h"
+#include "test_psi.skel.h"
+
+struct cgroup_desc {
+	const char *path;
+	int fd;
+	unsigned long long id;
+	int pid;
+	size_t target;
+	size_t high;
+	bool victim;
+	bool psi;
+};
+
+#define MB (1024 * 1024)
+
+static struct cgroup_desc cgroups[] = {
+	{ .path = "/oom_test" },
+	{ .path = "/oom_test/cg1", .target = 100 * MB },
+	{ .path = "/oom_test/cg2", .target = 500 * MB,
+	  .high = 40 * MB, .psi = true, .victim = true },
+};
+
+static int spawn_task(struct cgroup_desc *desc)
+{
+	char *ptr;
+	int pid;
+
+	pid = fork();
+	if (pid < 0)
+		return pid;
+
+	if (pid > 0) {
+		/* parent */
+		desc->pid = pid;
+		return 0;
+	}
+
+	/* child */
+	ptr = (char *)malloc(desc->target);
+	if (!ptr)
+		return -ENOMEM;
+
+	memset(ptr, 'a', desc->target);
+
+	while (1)
+		sleep(1000);
+
+	return 0;
+}
+
+static int setup_psi_alert(struct cgroup_desc *desc)
+{
+	const char *trig = "some 100000 1000000";
+	int fd;
+
+	fd = open_cgroup_file(desc->path, "memory.pressure", O_RDWR);
+	if (fd < 0) {
+		printf("memory.pressure open error: %s\n", strerror(errno));
+		return 1;
+	}
+
+	if (write(fd, trig, strlen(trig) + 1) < 0) {
+		printf("memory.pressure write error: %s\n", strerror(errno));
+		return 1;
+	}
+
+	/* keep fd open, otherwise the psi trigger will be deleted */
+	return 0;
+}
+
+static void setup_environment(void)
+{
+	int i, err;
+
+	err = setup_cgroup_environment();
+	if (!ASSERT_OK(err, "setup_cgroup_environment"))
+		goto cleanup;
+
+	for (i = 0; i < ARRAY_SIZE(cgroups); i++) {
+		cgroups[i].fd = create_and_get_cgroup(cgroups[i].path);
+		if (!ASSERT_GE(cgroups[i].fd, 0, "create_and_get_cgroup"))
+			goto cleanup;
+
+		cgroups[i].id = get_cgroup_id(cgroups[i].path);
+		if (!ASSERT_GT(cgroups[i].id, 0, "get_cgroup_id"))
+			goto cleanup;
+
+		if (i == 0) {
+			/* Freeze the top-level cgroup */
+			err = write_cgroup_file(cgroups[i].path, "cgroup.freeze", "1");
+			if (!ASSERT_OK(err, "freeze cgroup"))
+				goto cleanup;
+		}
+
+		if (!cgroups[i].target) {
+			/* Recursively enable the memory controller */
+			err = write_cgroup_file(cgroups[i].path, "cgroup.subtree_control",
+						"+memory");
+			if (!ASSERT_OK(err, "enable memory controller"))
+				goto cleanup;
+		}
+
+		if (cgroups[i].high) {
+			char buf[256];
+
+			snprintf(buf, sizeof(buf), "%lu", cgroups[i].high);
+			err = write_cgroup_file(cgroups[i].path, "memory.high", buf);
+			if (!ASSERT_OK(err, "set memory.high"))
+				goto cleanup;
+
+			snprintf(buf, sizeof(buf), "0");
+			write_cgroup_file(cgroups[i].path, "memory.swap.max", buf);
+		}
+
+		if (cgroups[i].target) {
+			char buf[256];
+
+			err = spawn_task(&cgroups[i]);
+			if (!ASSERT_OK(err, "spawn task"))
+				goto cleanup;
+
+			snprintf(buf, sizeof(buf), "%d", cgroups[i].pid);
+			err = write_cgroup_file(cgroups[i].path, "cgroup.procs", buf);
+			if (!ASSERT_OK(err, "put child into a cgroup"))
+				goto cleanup;
+		}
+
+		if (cgroups[i].psi) {
+			err = setup_psi_alert(&cgroups[i]);
+			if (!ASSERT_OK(err, "create psi trigger"))
+				goto cleanup;
+		}
+	}
+
+	return;
+
+cleanup:
+	cleanup_cgroup_environment();
+}
+
+static int run_and_wait_for_oom(void)
+{
+	int ret = -1;
+	bool first = true;
+	char buf[4096] = {};
+	size_t size;
+
+	ret = write_cgroup_file(cgroups[0].path, "cgroup.freeze", "0");
+	if (!ASSERT_OK(ret, "freeze cgroup"))
+		return -1;
+
+	for (;;) {
+		int i, status;
+		pid_t pid = wait(&status);
+
+		if (pid == -1) {
+			if (errno == EINTR)
+				continue;
+			/* ECHILD */
+			break;
+		}
+
+		if (!first)
+			continue;
+
+		first = false;
+
+		for (i = 0; i < ARRAY_SIZE(cgroups); i++) {
+			if (!ASSERT_OK(cgroups[i].victim !=
+				       (pid == cgroups[i].pid),
+				       "correct process was killed")) {
+				ret = -1;
+				break;
+			}
+
+			if (!cgroups[i].victim)
+				continue;
+
+			size = read_cgroup_file(cgroups[i].path, "memory.events",
+						buf, sizeof(buf));
+			if (!ASSERT_OK(size <= 0, "read memory.events")) {
+				ret = -1;
+				break;
+			}
+
+			if (!ASSERT_OK(strstr(buf, "oom_kill 1") == NULL,
+				       "oom_kill count check")) {
+				ret = -1;
+				break;
+			}
+		}
+
+		for (i = 0; i < ARRAY_SIZE(cgroups); i++)
+			if (cgroups[i].pid && cgroups[i].pid != pid)
+				kill(cgroups[i].pid, SIGKILL);
+	}
+
+	return ret;
+}
+
+void test_psi(void)
+{
+	struct test_psi *skel;
+	int err;
+
+	skel = test_psi__open_and_load();
+	err = test_psi__attach(skel);
+	if (!ASSERT_OK(err, "test_psi__attach"))
+		goto cleanup;
+
+	setup_environment();
+
+	run_and_wait_for_oom();
+
+	cleanup_cgroup_environment();
+cleanup:
+	test_psi__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_psi.c b/tools/testing/selftests/bpf/progs/test_psi.c
new file mode 100644
index 000000000000..8cbc1e0a5b24
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_psi.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include "vmlinux.h"
+#include
+#include
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct mem_cgroup *bpf_get_mem_cgroup(struct cgroup_subsys_state *css) __ksym;
+void bpf_put_mem_cgroup(struct mem_cgroup *memcg) __ksym;
+int bpf_out_of_memory(struct mem_cgroup *memcg, int order) __ksym;
+
+SEC("fmod_ret.s/bpf_handle_psi_event")
+int BPF_PROG(test_psi_event, struct psi_trigger *t)
+{
+	struct cgroup *cgroup = NULL;
+	struct mem_cgroup *memcg;
+	u64 cgroup_id;
+
+	if (!t->of || !t->of->kn) {
+		bpf_out_of_memory(NULL, 0);
+		return 1;
+	}
+
+	cgroup_id = t->of->kn->__parent->id;
+	cgroup = bpf_cgroup_from_id(cgroup_id);
+	if (!cgroup)
+		return 0;
+
+	memcg = bpf_get_mem_cgroup(&cgroup->self);
+	if (!memcg) {
+		bpf_cgroup_release(cgroup);
+		return 0;
+	}
+
+	bpf_out_of_memory(memcg, 0);
+
+	bpf_put_mem_cgroup(memcg);
+	bpf_cgroup_release(cgroup);
+
+	return 1;
+}
-- 
2.49.0.901.g37484f566f-goog
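A note on the oom_kill check in run_and_wait_for_oom() in patch 12/12: strstr(buf, "oom_kill 1") also matches counters such as "oom_kill 12", so it confirms that the value starts with 1 rather than that it equals 1. A stricter alternative, sketched below, parses the counter for an exact key. It assumes memory.events keeps its flat "key value" per-line format; memory_events_get() is a hypothetical helper, not an existing cgroup_helpers function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: find @key in the flat-keyed text of a cgroup v2
 * memory.events file ("key value" per line) and store its value in *@val.
 * Returns true only for an exact key match followed by a decimal value.
 */
static bool memory_events_get(const char *buf, const char *key,
			      unsigned long long *val)
{
	size_t klen;
	const char *line = buf;

	assert(buf && key && val);
	klen = strlen(key);
	while (*line) {
		/* Compare the whole key and require a following space, so
		 * that "oom" does not match the "oom_kill" line. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ') {
			const char *num = line + klen + 1;
			char *end;

			*val = strtoull(num, &end, 10);
			return end != num;
		}
		line = strchr(line, '\n');
		if (!line)
			break;
		line++;
	}
	return false;
}
```

With such a helper, the check would read roughly memory_events_get(buf, "oom_kill", &cnt) && cnt == 1, which fails on a count of 12 where the substring match would pass.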