From: Roman Gushchin
To: linux-mm@kvack.org, bpf@vger.kernel.org
Cc: Suren Baghdasaryan, Johannes Weiner, Michal Hocko, David Rientjes,
 Matt Bobrowski, Song Liu, Kumar Kartikeya Dwivedi, Alexei Starovoitov,
 Andrew Morton, linux-kernel@vger.kernel.org, Roman Gushchin
Subject: [PATCH v1 11/14] sched: psi: refactor psi_trigger_create()
Date: Mon, 18 Aug 2025 10:01:33 -0700
Message-ID: <20250818170136.209169-12-roman.gushchin@linux.dev>
In-Reply-To: <20250818170136.209169-1-roman.gushchin@linux.dev>
References: <20250818170136.209169-1-roman.gushchin@linux.dev>

Currently psi_trigger_create() does a lot of things:
parses the user text input, allocates and initializes
the psi_trigger structure and turns on the trigger.
It does it slightly differently for two existing types
of psi_triggers: system-wide and cgroup-wide.

In order to support a new type of psi triggers, which
will be owned by a bpf program and won't have a user's
text description, let's refactor psi_trigger_create().

1. Introduce psi_trigger_type enum:
   currently PSI_SYSTEM and PSI_CGROUP are valid values.
2. Introduce psi_trigger_params structure to avoid passing
   a large number of parameters to psi_trigger_create().
3. Move out the user's input parsing into the new
   psi_trigger_parse() helper.
4. Move out the capabilities check into the new
   psi_file_privileged() helper.
5. Stop relying on t->of for detecting trigger type.

Signed-off-by: Roman Gushchin
---
 include/linux/psi.h       | 15 +++++--
 include/linux/psi_types.h | 33 ++++++++++++++-
 kernel/cgroup/cgroup.c    | 14 ++++++-
 kernel/sched/psi.c        | 87 +++++++++++++++++++++++++--------------
 4 files changed, 112 insertions(+), 37 deletions(-)

diff --git a/include/linux/psi.h b/include/linux/psi.h
index e0745873e3f2..8178e998d94b 100644
--- a/include/linux/psi.h
+++ b/include/linux/psi.h
@@ -23,14 +23,23 @@ void psi_memstall_enter(unsigned long *flags);
 void psi_memstall_leave(unsigned long *flags);
 
 int psi_show(struct seq_file *s, struct psi_group *group, enum psi_res res);
-struct psi_trigger *psi_trigger_create(struct psi_group *group, char *buf,
-				       enum psi_res res, struct file *file,
-				       struct kernfs_open_file *of);
+int psi_trigger_parse(struct psi_trigger_params *params, const char *buf);
+struct psi_trigger *psi_trigger_create(struct psi_group *group,
+				const struct psi_trigger_params *param);
 void psi_trigger_destroy(struct psi_trigger *t);
 
 __poll_t psi_trigger_poll(void **trigger_ptr, struct file *file,
 			poll_table *wait);
 
+static inline bool psi_file_privileged(struct file *file)
+{
+	/*
+	 * Checking the privilege here on file->f_cred implies that a privileged user
+	 * could open the file and delegate the write to an unprivileged one.
+	 */
+	return cap_raised(file->f_cred->cap_effective, CAP_SYS_RESOURCE);
+}
+
 #ifdef CONFIG_CGROUPS
 static inline struct psi_group *cgroup_psi(struct cgroup *cgrp)
 {
diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index f1fd3a8044e0..cea54121d9b9 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -121,7 +121,38 @@ struct psi_window {
 	u64 prev_growth;
 };
 
+enum psi_trigger_type {
+	PSI_SYSTEM,
+	PSI_CGROUP,
+};
+
+struct psi_trigger_params {
+	/* Trigger type */
+	enum psi_trigger_type type;
+
+	/* Resources that workloads could be stalled on */
+	enum psi_res res;
+
+	/* True if all threads should be stalled to trigger */
+	bool full;
+
+	/* Threshold in us */
+	u32 threshold_us;
+
+	/* Window in us */
+	u32 window_us;
+
+	/* Privileged triggers are treated differently */
+	bool privileged;
+
+	/* Link to kernfs open file, only for PSI_CGROUP */
+	struct kernfs_open_file *of;
+};
+
 struct psi_trigger {
+	/* Trigger type */
+	enum psi_trigger_type type;
+
 	/* PSI state being monitored by the trigger */
 	enum psi_states state;
 
@@ -137,7 +168,7 @@ struct psi_trigger {
 	/* Wait queue for polling */
 	wait_queue_head_t event_wait;
 
-	/* Kernfs file for cgroup triggers */
+	/* Kernfs file for PSI_CGROUP triggers */
 	struct kernfs_open_file *of;
 
 	/* Pending event flag */
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index a723b7dc6e4e..9cd3c3a52c21 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3872,6 +3872,12 @@ static ssize_t pressure_write(struct kernfs_open_file *of, char *buf,
 	struct psi_trigger *new;
 	struct cgroup *cgrp;
 	struct psi_group *psi;
+	struct psi_trigger_params params;
+	int err;
+
+	err = psi_trigger_parse(&params, buf);
+	if (err)
+		return err;
 
 	cgrp = cgroup_kn_lock_live(of->kn, false);
 	if (!cgrp)
@@ -3887,7 +3893,13 @@ static ssize_t pressure_write(struct kernfs_open_file *of, char *buf,
 	}
 
 	psi = cgroup_psi(cgrp);
-	new = psi_trigger_create(psi, buf, res, of->file, of);
+
+	params.type = PSI_CGROUP;
+	params.res = res;
+	params.privileged = psi_file_privileged(of->file);
+	params.of = of;
+
+	new = psi_trigger_create(psi, &params);
 	if (IS_ERR(new)) {
 		cgroup_put(cgrp);
 		return PTR_ERR(new);
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index ad04a5c3162a..e1d8eaeeff17 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -489,7 +489,7 @@ static void update_triggers(struct psi_group *group, u64 now,
 
 		/* Generate an event */
 		if (cmpxchg(&t->event, 0, 1) == 0) {
-			if (t->of)
+			if (t->type == PSI_CGROUP)
 				kernfs_notify(t->of->kn);
 			else
 				wake_up_interruptible(&t->event_wait);
@@ -1281,74 +1281,87 @@ int psi_show(struct seq_file *m, struct psi_group *group, enum psi_res res)
 	return 0;
 }
 
-struct psi_trigger *psi_trigger_create(struct psi_group *group, char *buf,
-				       enum psi_res res, struct file *file,
-				       struct kernfs_open_file *of)
+int psi_trigger_parse(struct psi_trigger_params *params, const char *buf)
 {
-	struct psi_trigger *t;
-	enum psi_states state;
-	u32 threshold_us;
-	bool privileged;
-	u32 window_us;
+	u32 threshold_us, window_us;
 
 	if (static_branch_likely(&psi_disabled))
-		return ERR_PTR(-EOPNOTSUPP);
-
-	/*
-	 * Checking the privilege here on file->f_cred implies that a privileged user
-	 * could open the file and delegate the write to an unprivileged one.
-	 */
-	privileged = cap_raised(file->f_cred->cap_effective, CAP_SYS_RESOURCE);
+		return -EOPNOTSUPP;
 
 	if (sscanf(buf, "some %u %u", &threshold_us, &window_us) == 2)
-		state = PSI_IO_SOME + res * 2;
+		params->full = false;
 	else if (sscanf(buf, "full %u %u", &threshold_us, &window_us) == 2)
-		state = PSI_IO_FULL + res * 2;
+		params->full = true;
 	else
-		return ERR_PTR(-EINVAL);
+		return -EINVAL;
+
+	params->threshold_us = threshold_us;
+	params->window_us = window_us;
+	return 0;
+}
+
+struct psi_trigger *psi_trigger_create(struct psi_group *group,
+				const struct psi_trigger_params *params)
+{
+	struct psi_trigger *t;
+	enum psi_states state;
+
+	if (static_branch_likely(&psi_disabled))
+		return ERR_PTR(-EOPNOTSUPP);
+
+	state = params->full ? PSI_IO_FULL : PSI_IO_SOME;
+	state += params->res * 2;
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
-	if (res == PSI_IRQ && --state != PSI_IRQ_FULL)
+	if (params->res == PSI_IRQ && --state != PSI_IRQ_FULL)
 		return ERR_PTR(-EINVAL);
 #endif
 
 	if (state >= PSI_NONIDLE)
 		return ERR_PTR(-EINVAL);
 
-	if (window_us == 0 || window_us > WINDOW_MAX_US)
+	if (params->window_us == 0 || params->window_us > WINDOW_MAX_US)
 		return ERR_PTR(-EINVAL);
 
 	/*
 	 * Unprivileged users can only use 2s windows so that averages aggregation
 	 * work is used, and no RT threads need to be spawned.
 	 */
-	if (!privileged && window_us % 2000000)
+	if (!params->privileged && params->window_us % 2000000)
 		return ERR_PTR(-EINVAL);
 
 	/* Check threshold */
-	if (threshold_us == 0 || threshold_us > window_us)
+	if (params->threshold_us == 0 || params->threshold_us > params->window_us)
 		return ERR_PTR(-EINVAL);
 
 	t = kmalloc(sizeof(*t), GFP_KERNEL);
 	if (!t)
 		return ERR_PTR(-ENOMEM);
 
+	t->type = params->type;
 	t->group = group;
 	t->state = state;
-	t->threshold = threshold_us * NSEC_PER_USEC;
-	t->win.size = window_us * NSEC_PER_USEC;
+	t->threshold = params->threshold_us * NSEC_PER_USEC;
+	t->win.size = params->window_us * NSEC_PER_USEC;
 	window_reset(&t->win, sched_clock(),
 			group->total[PSI_POLL][t->state], 0);
 
 	t->event = 0;
 	t->last_event_time = 0;
-	t->of = of;
-	if (!of)
+
+	switch (params->type) {
+	case PSI_SYSTEM:
 		init_waitqueue_head(&t->event_wait);
+		break;
+	case PSI_CGROUP:
+		t->of = params->of;
+		break;
+	}
+
 	t->pending_event = false;
-	t->aggregator = privileged ? PSI_POLL : PSI_AVGS;
+	t->aggregator = params->privileged ? PSI_POLL : PSI_AVGS;
 
-	if (privileged) {
+	if (params->privileged) {
 		mutex_lock(&group->rtpoll_trigger_lock);
 
 		if (!rcu_access_pointer(group->rtpoll_task)) {
@@ -1401,7 +1414,7 @@ void psi_trigger_destroy(struct psi_trigger *t)
 	 * being accessed later. Can happen if cgroup is deleted from under a
 	 * polling process.
 	 */
-	if (t->of)
+	if (t->type == PSI_CGROUP)
 		kernfs_notify(t->of->kn);
 	else
 		wake_up_interruptible(&t->event_wait);
@@ -1481,7 +1494,7 @@ __poll_t psi_trigger_poll(void **trigger_ptr,
 	if (!t)
 		return DEFAULT_POLLMASK | EPOLLERR | EPOLLPRI;
 
-	if (t->of)
+	if (t->type == PSI_CGROUP)
 		kernfs_generic_poll(t->of, wait);
 	else
 		poll_wait(file, &t->event_wait, wait);
@@ -1530,6 +1543,8 @@ static ssize_t psi_write(struct file *file, const char __user *user_buf,
 	size_t buf_size;
 	struct seq_file *seq;
 	struct psi_trigger *new;
+	struct psi_trigger_params params;
+	int err;
 
 	if (static_branch_likely(&psi_disabled))
 		return -EOPNOTSUPP;
@@ -1543,6 +1558,10 @@ static ssize_t psi_write(struct file *file, const char __user *user_buf,
 
 	buf[buf_size - 1] = '\0';
 
+	err = psi_trigger_parse(&params, buf);
+	if (err)
+		return err;
+
 	seq = file->private_data;
 
 	/* Take seq->lock to protect seq->private from concurrent writes */
@@ -1554,7 +1573,11 @@ static ssize_t psi_write(struct file *file, const char __user *user_buf,
 		return -EBUSY;
 	}
 
-	new = psi_trigger_create(&psi_system, buf, res, file, NULL);
+	params.type = PSI_SYSTEM;
+	params.res = res;
+	params.privileged = psi_file_privileged(file);
+
+	new = psi_trigger_create(&psi_system, &params);
 	if (IS_ERR(new)) {
 		mutex_unlock(&seq->lock);
 		return PTR_ERR(new);
-- 
2.50.1
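
For readers who want to try the parse/validate split outside the kernel, here is a minimal user-space sketch in C. It is not the kernel code: demo_params, demo_parse(), demo_validate(), demo_self_check() and DEMO_WINDOW_MAX_US are hypothetical names invented for illustration, the 10 s limit is an assumed stand-in for WINDOW_MAX_US, and the psi_disabled check, the state/resource checks and the trigger allocation are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: stand-in for the kernel's WINDOW_MAX_US, taken to be 10 s. */
#define DEMO_WINDOW_MAX_US 10000000u

/* Hypothetical cut-down analogue of struct psi_trigger_params. */
struct demo_params {
	bool full;		/* "full" vs "some" */
	bool privileged;	/* set by the caller, never by the parser */
	uint32_t threshold_us;
	uint32_t window_us;
};

/* Step 1: text -> params. Mirrors the two sscanf() attempts in the patch. */
int demo_parse(struct demo_params *p, const char *buf)
{
	unsigned int threshold_us, window_us;

	if (sscanf(buf, "some %u %u", &threshold_us, &window_us) == 2)
		p->full = false;
	else if (sscanf(buf, "full %u %u", &threshold_us, &window_us) == 2)
		p->full = true;
	else
		return -EINVAL;

	p->threshold_us = threshold_us;
	p->window_us = window_us;
	return 0;
}

/* Step 2: range checks only; needs no text input, so any caller can use it. */
int demo_validate(const struct demo_params *p)
{
	if (p->window_us == 0 || p->window_us > DEMO_WINDOW_MAX_US)
		return -EINVAL;

	/* Unprivileged callers may only use windows that are multiples of 2 s. */
	if (!p->privileged && p->window_us % 2000000)
		return -EINVAL;

	if (p->threshold_us == 0 || p->threshold_us > p->window_us)
		return -EINVAL;

	return 0;
}

/* Small usage example: returns 0 if every expectation holds. */
int demo_self_check(void)
{
	struct demo_params p = { .privileged = false };

	assert(demo_parse(&p, "some 150000 2000000") == 0);
	assert(demo_validate(&p) == 0);

	assert(demo_parse(&p, "full 50000 1000000") == 0);
	assert(demo_validate(&p) == -EINVAL);	/* 1 s window, unprivileged */

	p.privileged = true;
	assert(demo_validate(&p) == 0);
	return 0;
}
```

Because demo_validate() takes only the params structure, a caller with no text description can fill in the fields directly and skip demo_parse(), which is the property the refactoring is after.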