DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1;
	t=1745811396;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=Yq8ss53Ru7Jhn38FBmHilN70R4el49iAxHid+VDVZdc=;
	b=Za8NqbMHlD/frdFKXQJ5cvdJMNbcCS/DRf5Jb1yVC4ngasstbn9VZgMIgtWblQFur+Y235
	 RN3ru9mjapI3Yt9K1ImzzZaqvTSrJu9HtMPrd8jZPA3hcXgk9TZ+vffZlihzedD7x2HEhK
	 G7bxkMHe5bocYEvWA6a+Ku/DHw1wj0Q=
From: Roman Gushchin
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton, Alexei Starovoitov, Johannes Weiner, Michal Hocko,
	Shakeel Butt, Suren Baghdasaryan, David Rientjes, Josh Don,
	Chuyi Zhou, cgroups@vger.kernel.org, linux-mm@kvack.org,
	bpf@vger.kernel.org, Roman Gushchin
Subject: [PATCH rfc 01/12] mm: introduce a bpf hook for OOM handling
Date: Mon, 28 Apr 2025 03:36:06 +0000
Message-ID: <20250428033617.3797686-2-roman.gushchin@linux.dev>
In-Reply-To: <20250428033617.3797686-1-roman.gushchin@linux.dev>
References: <20250428033617.3797686-1-roman.gushchin@linux.dev>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-Migadu-Flow: FLOW_OUT
Content-Type: text/plain; charset="utf-8"

Introduce a bpf hook for implementing custom OOM handling policies.

The hook is the int bpf_handle_out_of_memory(struct oom_control *oc)
function, which is expected to return 1 if it was able to free some
memory and 0 otherwise. In the latter case it's guaranteed that
the in-kernel OOM killer will be invoked. Otherwise the kernel
also checks the bpf_memory_freed field of the oom_control structure,
which is expected to be set by kfuncs suitable for releasing memory.
It's a safety mechanism which prevents a bpf program from claiming
forward progress without actually releasing memory.

The hook program is sleepable to enable using iterators, e.g.
cgroup iterators.

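For illustration only, the forward-progress check described above can be
modeled in plain userspace C. This is a sketch, not kernel code: struct
oom_control_model, model_bpf_handle_oom(), model_release_memory_kfunc() and
the hook_* functions are hypothetical names invented for this example, and
WARN_ON_ONCE() is reduced to a plain reset of the chosen field.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace stand-in for struct oom_control. Only the two
 * fields that bpf_handle_oom() touches in this patch are modeled.
 */
struct oom_control_model {
	void *chosen;		/* victim selected by the in-kernel killer */
	bool bpf_memory_freed;	/* expected to be set by memory-releasing kfuncs */
};

/* Signature of a modeled hook; stands in for bpf_handle_out_of_memory(). */
typedef int (*oom_hook_fn)(struct oom_control_model *oc);

/*
 * Model of bpf_handle_oom(): the hook's return value alone is not trusted,
 * the bpf_memory_freed flag must have been set as well.
 */
bool model_bpf_handle_oom(struct oom_control_model *oc, oom_hook_fn hook)
{
	if (oc->chosen)
		oc->chosen = NULL;

	oc->bpf_memory_freed = false;

	return hook(oc) && oc->bpf_memory_freed;
}

/* Hypothetical stand-in for a kfunc that actually releases memory. */
void model_release_memory_kfunc(struct oom_control_model *oc)
{
	oc->bpf_memory_freed = true;
}

/* Behaves like the default __weak stub: nothing freed, returns 0. */
int hook_noop(struct oom_control_model *oc)
{
	(void)oc;
	return 0;
}

/* Claims forward progress without releasing anything. */
int hook_false_claim(struct oom_control_model *oc)
{
	(void)oc;
	return 1;
}

/* Releases memory through the modeled kfunc and reports success. */
int hook_frees(struct oom_control_model *oc)
{
	model_release_memory_kfunc(oc);
	return 1;
}
```

In this model only hook_frees makes model_bpf_handle_oom() return true.
hook_noop and hook_false_claim both return false, which corresponds to
falling back to the in-kernel OOM killer.
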
The hook is executed just before the kernel victim task selection
algorithm, so all heuristics and sysctls like panic on oom and
sysctl_oom_kill_allocating_task are respected.

Signed-off-by: Roman Gushchin
---
 include/linux/oom.h |  5 ++++
 mm/oom_kill.c       | 68 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 1e0fc6931ce9..cc14aac9742c 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -51,6 +51,11 @@ struct oom_control {
 
 	/* Used to print the constraint info. */
 	enum oom_constraint constraint;
+
+#ifdef CONFIG_BPF_SYSCALL
+	/* Used by the bpf oom implementation to mark the forward progress */
+	bool bpf_memory_freed;
+#endif
 };
 
 extern struct mutex oom_lock;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 25923cfec9c6..d00776b63c0a 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -45,6 +45,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include "internal.h"
@@ -1100,6 +1101,30 @@ int unregister_oom_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(unregister_oom_notifier);
 
+#ifdef CONFIG_BPF_SYSCALL
+int bpf_handle_out_of_memory(struct oom_control *oc);
+
+/*
+ * Returns true if the bpf oom program returns 1 and some memory was
+ * freed.
+ */
+static bool bpf_handle_oom(struct oom_control *oc)
+{
+	if (WARN_ON_ONCE(oc->chosen))
+		oc->chosen = NULL;
+
+	oc->bpf_memory_freed = false;
+
+	return bpf_handle_out_of_memory(oc) && oc->bpf_memory_freed;
+}
+
+#else
+static inline bool bpf_handle_oom(struct oom_control *oc)
+{
+	return 0;
+}
+#endif
+
 /**
  * out_of_memory - kill the "best" process when we run out of memory
  * @oc: pointer to struct oom_control
@@ -1161,6 +1186,13 @@ bool out_of_memory(struct oom_control *oc)
 		return true;
 	}
 
+	/*
+	 * Let bpf handle the OOM first. If it was able to free up some memory,
+	 * bail out. Otherwise fall back to the kernel OOM killer.
+	 */
+	if (bpf_handle_oom(oc))
+		return true;
+
 	select_bad_process(oc);
 	/* Found nothing?!?! */
 	if (!oc->chosen) {
@@ -1264,3 +1296,39 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
 	return -ENOSYS;
 #endif /* CONFIG_MMU */
 }
+
+#ifdef CONFIG_BPF_SYSCALL
+
+__bpf_hook_start();
+
+/*
+ * Bpf hook to customize the oom handling policy.
+ */
+__weak noinline int bpf_handle_out_of_memory(struct oom_control *oc)
+{
+	return 0;
+}
+
+__bpf_hook_end();
+
+BTF_KFUNCS_START(bpf_oom_hooks)
+BTF_ID_FLAGS(func, bpf_handle_out_of_memory, KF_SLEEPABLE)
+BTF_KFUNCS_END(bpf_oom_hooks)
+
+static const struct btf_kfunc_id_set bpf_oom_hook_set = {
+	.owner = THIS_MODULE,
+	.set = &bpf_oom_hooks,
+};
+static int __init bpf_oom_init(void)
+{
+	int err;
+
+	err = register_btf_fmodret_id_set(&bpf_oom_hook_set);
+	if (err)
+		pr_warn("error while registering bpf oom hooks: %d", err);
+
+	return err;
+}
+late_initcall(bpf_oom_init);
+
+#endif
-- 
2.49.0.901.g37484f566f-goog