From nobody Wed Dec 17 15:55:38 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2DDD919B3CE; Mon, 24 Jun 2024 15:27:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719242861; cv=none; b=OmMqdo+vhJzuaNVESZW1/+3NEDjAQJJfN++aG/VjywQuxzz/1enrQHYukwnyIPXfKN/1z3RRW+hy6d/zMDVe6BT42llhx8c9x1OwCEgej5PpqvpqX3s3S8Y49Se/LUulVnRbv1uXgp4NDDIiXQb1fv+1t2tuZ30t6BobTTaoJsE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719242861; c=relaxed/simple; bh=kC5TCeCXVnotUYnZW4koAmgq/beJ1ja5YqoIVqHEq4o=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Cp8MYxtytY0YhH29c5AxjHq+TNjJSMBjwHxbCc3ZRMR4Up+gcFPKomdqZdvGF04Y6uLswY733heQExRR4/49AHM1KRjNRNpbSCffZy81eLiYee9CEfUjTxRnh1gvs++GB1dCANf4ozvMSUiszt+C6O+5sf2qb8VgUSXIFGAqOIU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=l2UD3NRX; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=odwh5+KR; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="l2UD3NRX"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="odwh5+KR" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; 
s=2020; t=1719242857; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=31fHSUjnkfqCuDeXZGlbP04Jz3JzNZ4WYRhzRxDpIFo=; b=l2UD3NRXBxB0yYObGI8FrUHvymKtHRWQgany4CSk+ETMF+inUBXnvuSBAtuonwfrK8Scbp UZYEB9+qeW+OFAJpwEdHQEhWNvHDB1pu9w3/FLJZf7gTs/7IXCc36rpQeAv4IAR5/Vw6PB YuKmu2gJ65Xb6mBYCSgeIFuM3ue0ep8jqOLmREOIuiJ2/7/JI/yqNAfSumBaCisZUdNGlk N8eZtDLtTu5CYsglp7s3oFn3TSAb4ccOfMJJnnwziP7uTl8LYPPMQ9uK4aG3z2Dc/9Hwu+ ryIslLHfVgkUkZW0bVtDtyLGOiC3/4tYEC4hGntwrDTuoEx0zd8knv8AyEYnRg== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1719242857; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=31fHSUjnkfqCuDeXZGlbP04Jz3JzNZ4WYRhzRxDpIFo=; b=odwh5+KR0F4tcN1Xm+KDepmvbkFaZ3WBZ5fggnyRuKUwnFCZbvixg/FRsjSXMKKPFWHrca Bk9mp1zQoUnilkBw== To: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Adrian Hunter , Alexander Shishkin , Arnaldo Carvalho de Melo , Daniel Bristot de Oliveira , Frederic Weisbecker , Ian Rogers , Ingo Molnar , Jiri Olsa , Kan Liang , Marco Elver , Mark Rutland , Namhyung Kim , Peter Zijlstra , Thomas Gleixner , Sebastian Andrzej Siewior , Arnaldo Carvalho de Melo Subject: [PATCH v4 1/6] perf: Move irq_work_queue() where the event is prepared. 
Date: Mon, 24 Jun 2024 17:15:14 +0200 Message-ID: <20240624152732.1231678-2-bigeasy@linutronix.de> In-Reply-To: <20240624152732.1231678-1-bigeasy@linutronix.de> References: <20240624152732.1231678-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Only if perf_event::pending_sigtrap is zero is the irq_work accounted, by incrementing perf_event::nr_pending. The member perf_event::pending_addr might be overwritten by a subsequent event if the signal was not yet delivered but is still expected. The irq_work will not be enqueued again because it has a check to only be enqueued once. Move irq_work_queue() to where the counter is incremented and perf_event::pending_sigtrap is set, to make it more obvious that the irq_work is scheduled once. Tested-by: Marco Elver Tested-by: Arnaldo Carvalho de Melo Reported-by: Arnaldo Carvalho de Melo Signed-off-by: Sebastian Andrzej Siewior --- kernel/events/core.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index 586d4f3676240..647abeeaeeb02 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -9738,6 +9738,11 @@ static int __perf_event_overflow(struct perf_event *= event, if (!event->pending_sigtrap) { event->pending_sigtrap =3D pending_id; local_inc(&event->ctx->nr_pending); + + event->pending_addr =3D 0; + if (valid_sample && (data->sample_flags & PERF_SAMPLE_ADDR)) + event->pending_addr =3D data->addr; + irq_work_queue(&event->pending_irq); } else if (event->attr.exclude_kernel && valid_sample) { /* * Should not be able to return to user space without @@ -9753,11 +9758,6 @@ static int __perf_event_overflow(struct perf_event *= event, */ WARN_ON_ONCE(event->pending_sigtrap !=3D pending_id); } - - event->pending_addr =3D 0; - if (valid_sample && (data->sample_flags & 
PERF_SAMPLE_ADDR)) - event->pending_addr =3D data->addr; - irq_work_queue(&event->pending_irq); } =20 READ_ONCE(event->overflow_handler)(event, data, regs); --=20 2.45.2 From nobody Wed Dec 17 15:55:38 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 89B5D19B5A5; Mon, 24 Jun 2024 15:27:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719242861; cv=none; b=dKNoO0fNg6BqtgZDN+80OsmNupPbIvYh/Dzig4tgN5wynM0DxpdsIuvUjRcxaFgo06jBpa8HxgWOI5jBUH9RR2K2tC6wkActHT/+BJVLuiq05xmin7w21x4Ysl7TCJnso5Vm+rCBBBaWLcFFhdkxVlas01055OG/oY3oHeLHIKs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719242861; c=relaxed/simple; bh=wfrZOAkXJz9f9Lfg42t9i9udnKgxHzF8ML/6Tpd7HJA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=tTO2MUQaHt6kuekqAZrJlsrEcIFBX/jq/YuTYKbcm8OBuKE++AeOqJbiuFS1dGd1p/tF7hcHXrx7UYIU9Xsu/ZeLRxFgBvW/aDWOhxjB/j4HmDg/HO/ieUgd3rFxvyt44pHJR68kw5R4DvnxecR97UQTneQ0OOb42pSC5sEQaKw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=ipLTj0pf; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=j3IKRUKL; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="ipLTj0pf"; dkim=permerror (0-bit key) 
header.d=linutronix.de header.i=@linutronix.de header.b="j3IKRUKL" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1719242857; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=lxZaiy3N5qW+TR11v/Lgl5tgIwAatTer0ZvBOD6wEk4=; b=ipLTj0pfs6Tgsf2A0QftwMz8aZxQ1P7tbKzrit+YCNUZEZnjthMgbvonRy7nMNr4SBOgFh FRvhNRkSTZSmJAP3p92gqylbmxVQQ21fv5RjdV+uEpvzMR9LtGiAZV7N+0vnB1ZHqTowXM +lEd2qI/BJSki3xgpnNxnx7/7qFXygYysqnjoh1F/j1MhFkGnRDDLU7iNMJIvSOkYnTBas 2oz18E1TOwL3xyOEViFKEat57+eBLpd6kbeLO8dQhPDel1g6HSYDIDoodrKXd7Jr7XbbmG dwuFOw8GAj58rXUY+mnTiMFvUzJRxFVmHM7e/xireMi/pTQxa63gg6Hyp7vKkg== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1719242857; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=lxZaiy3N5qW+TR11v/Lgl5tgIwAatTer0ZvBOD6wEk4=; b=j3IKRUKLgfoVmBj3IodqxBCgMnHcdoqA9pWr9PjFQuomqiQfc/lX0s55ASwZp/42Y4XhwG 7rDarw9OjQUUlTBg== To: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Adrian Hunter , Alexander Shishkin , Arnaldo Carvalho de Melo , Daniel Bristot de Oliveira , Frederic Weisbecker , Ian Rogers , Ingo Molnar , Jiri Olsa , Kan Liang , Marco Elver , Mark Rutland , Namhyung Kim , Peter Zijlstra , Thomas Gleixner , Sebastian Andrzej Siewior , Arnaldo Carvalho de Melo Subject: [PATCH v4 2/6] perf: Enqueue SIGTRAP always via task_work. 
Date: Mon, 24 Jun 2024 17:15:15 +0200 Message-ID: <20240624152732.1231678-3-bigeasy@linutronix.de> In-Reply-To: <20240624152732.1231678-1-bigeasy@linutronix.de> References: <20240624152732.1231678-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" A signal is delivered by raising an irq_work, which works from any context including NMI. The irq_work can be delayed if the architecture does not provide an interrupt vector. In order not to lose a signal, the signal is injected via task_work during event_sched_out(). Instead of going via irq_work, the signal could be added directly via task_work. The signal is sent to current and can be enqueued on its return path to userland instead of triggering irq_work. A dummy IRQ is required in the NMI case to ensure the task_work is handled before returning to userland. For this, irq_work is used. An alternative would be to just raise an interrupt, like arch_send_call_function_single_ipi(). During testing with `remove_on_exec' it became visible that the event can be enqueued via NMI during execve(). The task_work must not be kept because free_event() will complain later. Also the new task will not have a signal handler installed. Queue the signal via task_work. Remove perf_event::pending_sigtrap and use perf_event::pending_work instead. Raise irq_work in the NMI case for a dummy interrupt. Remove the task_work if the event is freed. 
Tested-by: Marco Elver Tested-by: Arnaldo Carvalho de Melo Reported-by: Arnaldo Carvalho de Melo Signed-off-by: Sebastian Andrzej Siewior --- include/linux/perf_event.h | 3 +-- kernel/events/core.c | 36 +++++++++++++++--------------------- 2 files changed, 16 insertions(+), 23 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 393fb13733b02..ea0d82418d854 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -781,7 +781,6 @@ struct perf_event { unsigned int pending_wakeup; unsigned int pending_kill; unsigned int pending_disable; - unsigned int pending_sigtrap; unsigned long pending_addr; /* SIGTRAP */ struct irq_work pending_irq; struct callback_head pending_task; @@ -963,7 +962,7 @@ struct perf_event_context { struct rcu_head rcu_head; =20 /* - * Sum (event->pending_sigtrap + event->pending_work) + * Sum (event->pending_work + event->pending_work) * * The SIGTRAP is targeted at ctx->task, as such it won't do changing * that until the signal is delivered. diff --git a/kernel/events/core.c b/kernel/events/core.c index 647abeeaeeb02..6256a9593c3da 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2283,17 +2283,6 @@ event_sched_out(struct perf_event *event, struct per= f_event_context *ctx) state =3D PERF_EVENT_STATE_OFF; } =20 - if (event->pending_sigtrap) { - event->pending_sigtrap =3D 0; - if (state !=3D PERF_EVENT_STATE_OFF && - !event->pending_work && - !task_work_add(current, &event->pending_task, TWA_RESUME)) { - event->pending_work =3D 1; - } else { - local_dec(&event->ctx->nr_pending); - } - } - perf_event_set_state(event, state); =20 if (!is_software_event(event)) @@ -6787,11 +6776,6 @@ static void __perf_pending_irq(struct perf_event *ev= ent) * Yay, we hit home and are in the context of the event. 
*/ if (cpu =3D=3D smp_processor_id()) { - if (event->pending_sigtrap) { - event->pending_sigtrap =3D 0; - perf_sigtrap(event); - local_dec(&event->ctx->nr_pending); - } if (event->pending_disable) { event->pending_disable =3D 0; perf_event_disable_local(event); @@ -9735,18 +9719,28 @@ static int __perf_event_overflow(struct perf_event = *event, =20 if (regs) pending_id =3D hash32_ptr((void *)instruction_pointer(regs)) ?: 1; - if (!event->pending_sigtrap) { - event->pending_sigtrap =3D pending_id; + + if (!event->pending_work && + !task_work_add(current, &event->pending_task, TWA_RESUME)) { + event->pending_work =3D pending_id; local_inc(&event->ctx->nr_pending); =20 event->pending_addr =3D 0; if (valid_sample && (data->sample_flags & PERF_SAMPLE_ADDR)) event->pending_addr =3D data->addr; - irq_work_queue(&event->pending_irq); + /* + * The NMI path returns directly to userland. The + * irq_work is raised as a dummy interrupt to ensure + * regular return path to user is taken and task_work + * is processed. + */ + if (in_nmi()) + irq_work_queue(&event->pending_irq); + } else if (event->attr.exclude_kernel && valid_sample) { /* * Should not be able to return to user space without - * consuming pending_sigtrap; with exceptions: + * consuming pending_work; with exceptions: * * 1. Where !exclude_kernel, events can overflow again * in the kernel without returning to user space. @@ -9756,7 +9750,7 @@ static int __perf_event_overflow(struct perf_event *e= vent, * To approximate progress (with false negatives), * check 32-bit hash of the current IP. 
*/ - WARN_ON_ONCE(event->pending_sigtrap !=3D pending_id); + WARN_ON_ONCE(event->pending_work !=3D pending_id); } } =20 --=20 2.45.2 From nobody Wed Dec 17 15:55:38 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BFECE19B5A7; Mon, 24 Jun 2024 15:27:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719242861; cv=none; b=aFAdgdtVAZL4eZpExKGmhT0epUVOLKBzq5S4l8+E9Vj8uw9sVvCti1dL2A549O3tqjpUs+hsbmN4ZcdaGx4+/wTeKJ3vbCiuDuj15KGqpxwjr1bDUzvJmTs/1UAdIks3BpaSOlbxyfr65SqXrKdclZr7BAHQD3bXknjdXC51U6U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719242861; c=relaxed/simple; bh=3JEi2mm9pBJD+RvTJk9tdhJuESSkGlA9UW7AIj5AYEw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=TPouE2VoRIpuvRj67vWf+MyGQPoLuNRtXR+sgJqu6mn3PD9diStCOEGnAhEqVyhO4PbmNfYX/5+XJlF7ry3zhMWS7hDVgX7TCndE8RDvNZ4ZaMQRy84IWr6fdWq2bYtLnfHRdlChkBsazyVE/nIs4Gpq+l+C4s6QYabQj5myYtY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=afqkAZcT; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=tzrEguQF; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="afqkAZcT"; dkim=permerror (0-bit key) header.d=linutronix.de 
header.i=@linutronix.de header.b="tzrEguQF" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1719242858; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bi83GkckFWEiz6/C9dwTxNe7sCPp8ns2flKgLfGxAk8=; b=afqkAZcTx6EFALmeS6d2tGXgkFryA/WJFi3Sa3LZludr/pd9tiSbzUbl390EdeNWr2XMcS VSyQypsRFaycQ2/XuGRjt7+9Jcm4vfKLRRJtuF9QZHaPdBqBzjDjdhSQ9HRwGJj8/b9jjj D2MZBdTBWgNd/oAjNpIB0VYkGwqLGMShA0R+Ws+nNf0e53bkyVZrGaLPjJgfb3H0CMzuU3 Bv8jD3NREr6YMtYmv40KIKH4wjyMugoQ1h5ajeu22GML6TMiy91eQCCOyNV9adYSh1oSJw ymrrVkyv8/RbAH1Rzh7cKqSWi8ihp84Ts5XQcqdO2fDW4xLhe4atWbqXABut8w== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1719242858; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bi83GkckFWEiz6/C9dwTxNe7sCPp8ns2flKgLfGxAk8=; b=tzrEguQFAZ2jrCuKLsYv9ZE1dkVSa+bIkCS0alM6TIr6IujyWgpjBdmUC24XF+pRj+6WA8 nL3niHoPU1KMVACg== To: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Adrian Hunter , Alexander Shishkin , Arnaldo Carvalho de Melo , Daniel Bristot de Oliveira , Frederic Weisbecker , Ian Rogers , Ingo Molnar , Jiri Olsa , Kan Liang , Marco Elver , Mark Rutland , Namhyung Kim , Peter Zijlstra , Thomas Gleixner , Sebastian Andrzej Siewior Subject: [PATCH v4 3/6] perf: Shrink the size of the recursion counter. 
Date: Mon, 24 Jun 2024 17:15:16 +0200 Message-ID: <20240624152732.1231678-4-bigeasy@linutronix.de> In-Reply-To: <20240624152732.1231678-1-bigeasy@linutronix.de> References: <20240624152732.1231678-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" There are four recursion counters, one for each context. The type of each counter is `int' but the counter is used as `bool' since it is only incremented if zero. Reduce the type of the recursion counter to an unsigned char, keeping the increment/decrement operation. Signed-off-by: Sebastian Andrzej Siewior Tested-by: Marco Elver --- kernel/events/callchain.c | 2 +- kernel/events/core.c | 2 +- kernel/events/internal.h | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c index 1273be84392cf..ad57944b6c40e 100644 --- a/kernel/events/callchain.c +++ b/kernel/events/callchain.c @@ -29,7 +29,7 @@ static inline size_t perf_callchain_entry__sizeof(void) sysctl_perf_event_max_contexts_per_stack)); } =20 -static DEFINE_PER_CPU(int, callchain_recursion[PERF_NR_CONTEXTS]); +static DEFINE_PER_CPU(u8, callchain_recursion[PERF_NR_CONTEXTS]); static atomic_t nr_callchain_events; static DEFINE_MUTEX(callchain_mutex); static struct callchain_cpus_entries *callchain_cpus_entries; diff --git a/kernel/events/core.c b/kernel/events/core.c index 6256a9593c3da..f48ce05907042 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -9781,7 +9781,7 @@ struct swevent_htable { int hlist_refcount; =20 /* Recursion avoidance in each contexts */ - int recursion[PERF_NR_CONTEXTS]; + u8 recursion[PERF_NR_CONTEXTS]; }; =20 static DEFINE_PER_CPU(struct swevent_htable, swevent_htable); diff --git a/kernel/events/internal.h b/kernel/events/internal.h index 5150d5f84c033..f9a3244206b20 100644 --- 
a/kernel/events/internal.h +++ b/kernel/events/internal.h @@ -208,7 +208,7 @@ arch_perf_out_copy_user(void *dst, const void *src, uns= igned long n) =20 DEFINE_OUTPUT_COPY(__output_copy_user, arch_perf_out_copy_user) =20 -static inline int get_recursion_context(int *recursion) +static inline int get_recursion_context(u8 *recursion) { unsigned char rctx =3D interrupt_context_level(); =20 @@ -221,7 +221,7 @@ static inline int get_recursion_context(int *recursion) return rctx; } =20 -static inline void put_recursion_context(int *recursion, int rctx) +static inline void put_recursion_context(u8 *recursion, int rctx) { barrier(); recursion[rctx]--; --=20 2.45.2 From nobody Wed Dec 17 15:55:38 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3253419B5AA; Mon, 24 Jun 2024 15:27:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719242861; cv=none; b=RCpSFL9iFWMyigi9SDbbjbrdbkYNQzCupukNsCHLCvkl9dLTTumhdAVQFwf4W4L3jPt0kQRCGAR8ezLRbxtd5pLUc7gtMf9uhh3rB4b82lDhK6lb4dX8UjqD2L+WTAnh2cMdvKSncQQCWIUGNOoveHTlrV2OUJhO4RAQkHggy50= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719242861; c=relaxed/simple; bh=W8z1M30YTNFFf1unU9Yf1sBCRXmAExyUCe/qYLJPfm0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=bE1WuhkJJYPF+7F1wLgasUkBfPr9V+Mken7TBDDwCd85vtYp0ItIiJz3OJVX673TsUqKa0uQitEexKxJq43lu2b0KuN1c/HM7MTPnpBL/ePkGgfr0v4FDG5agKR6E0FS5glcdwrVIEi3+DRCqSKv0BEus8Hx+LdmSgKEAVZs+P8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=q1YoTxMD; 
dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=qB8oFL22; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="q1YoTxMD"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="qB8oFL22" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1719242858; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=dl6TCnLAxo924LJNzlZpsS2eoCMgyhzA3J0zNzcmQnI=; b=q1YoTxMDqWv7BxRziKVe9+hFV2CvOjNiKEG4PAW6ylNudnymdcqTl2KfRe5Bfhj9XZPjAu 53A5Dc2v+/dVVCioPmmHriIeIpg8z3/DSQO6VbzlRhhuTNI0YT2xFfg3ovrG4WwWdF1mnw LniFylIJMdY05oG1fBHxMUQKqhpOrsAytp1qpg9lji6T4I7VqBV8emhIHOczNf5bzT+tuG Z+YtUj4/eJ3p8VXbB20o++5FIM4T5ZFDrISR7EoWk0bMATllcyCqlrmRAWw1FHll2z8m6J F07HaXKmXP4u5hDOOy+Bp1mZmv7jCdt6mAzEVc++eRCIV97tE74QUO/66UzROg== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1719242858; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=dl6TCnLAxo924LJNzlZpsS2eoCMgyhzA3J0zNzcmQnI=; b=qB8oFL221LfRSsPlrxob8jJF4cuP7g0nNQw85RApItPhJy495NeFiY+WQa0dP04mQewUIq Ap5HQGBMLK11x/Cw== To: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Adrian Hunter , Alexander Shishkin , Arnaldo Carvalho de Melo , Daniel Bristot de Oliveira , Frederic Weisbecker , Ian Rogers , Ingo Molnar , Jiri Olsa , Kan Liang , Marco Elver , Mark 
Rutland , Namhyung Kim , Peter Zijlstra , Thomas Gleixner , Sebastian Andrzej Siewior Subject: [PATCH v4 4/6] perf: Move swevent_htable::recursion into task_struct. Date: Mon, 24 Jun 2024 17:15:17 +0200 Message-ID: <20240624152732.1231678-5-bigeasy@linutronix.de> In-Reply-To: <20240624152732.1231678-1-bigeasy@linutronix.de> References: <20240624152732.1231678-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The swevent_htable::recursion counter is used to avoid creating a swevent while an event is processed, i.e. to avoid recursion. The counter is per-CPU and preemption must be disabled to have a stable counter. perf_pending_task() disables preemption to access the counter and then to send the signal. This is problematic on PREEMPT_RT because sending a signal uses a spinlock_t, which must not be acquired in atomic context on PREEMPT_RT because it becomes a sleeping lock. The atomic context can be avoided by moving the counter into the task_struct. There is a 4 byte hole between futex_state (usually always enabled) and the following perf pointer (perf_event_ctxp). After the recursion counter lost some weight, it fits perfectly. Move swevent_htable::recursion into task_struct. Signed-off-by: Sebastian Andrzej Siewior Tested-by: Marco Elver --- include/linux/perf_event.h | 6 ------ include/linux/sched.h | 7 +++++++ kernel/events/core.c | 13 +++---------- kernel/events/internal.h | 2 +- 4 files changed, 11 insertions(+), 17 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index ea0d82418d854..99a7ea1d29ed5 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -970,12 +970,6 @@ struct perf_event_context { local_t nr_pending; }; =20 -/* - * Number of contexts where an event can trigger: - * task, softirq, hardirq, nmi. 
- */ -#define PERF_NR_CONTEXTS 4 - struct perf_cpu_pmu_context { struct perf_event_pmu_context epc; struct perf_event_pmu_context *task_epc; diff --git a/include/linux/sched.h b/include/linux/sched.h index 61591ac6eab6d..afb1087f5831b 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -734,6 +734,12 @@ enum perf_event_task_context { perf_nr_task_contexts, }; =20 +/* + * Number of contexts where an event can trigger: + * task, softirq, hardirq, nmi. + */ +#define PERF_NR_CONTEXTS 4 + struct wake_q_node { struct wake_q_node *next; }; @@ -1256,6 +1262,7 @@ struct task_struct { unsigned int futex_state; #endif #ifdef CONFIG_PERF_EVENTS + u8 perf_recursion[PERF_NR_CONTEXTS]; struct perf_event_context *perf_event_ctxp; struct mutex perf_event_mutex; struct list_head perf_event_list; diff --git a/kernel/events/core.c b/kernel/events/core.c index f48ce05907042..fc9a78e1fb4aa 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -9779,11 +9779,7 @@ struct swevent_htable { struct swevent_hlist *swevent_hlist; struct mutex hlist_mutex; int hlist_refcount; - - /* Recursion avoidance in each contexts */ - u8 recursion[PERF_NR_CONTEXTS]; }; - static DEFINE_PER_CPU(struct swevent_htable, swevent_htable); =20 /* @@ -9981,17 +9977,13 @@ DEFINE_PER_CPU(struct pt_regs, __perf_regs[4]); =20 int perf_swevent_get_recursion_context(void) { - struct swevent_htable *swhash =3D this_cpu_ptr(&swevent_htable); - - return get_recursion_context(swhash->recursion); + return get_recursion_context(current->perf_recursion); } EXPORT_SYMBOL_GPL(perf_swevent_get_recursion_context); =20 void perf_swevent_put_recursion_context(int rctx) { - struct swevent_htable *swhash =3D this_cpu_ptr(&swevent_htable); - - put_recursion_context(swhash->recursion, rctx); + put_recursion_context(current->perf_recursion, rctx); } =20 void ___perf_sw_event(u32 event_id, u64 nr, struct pt_regs *regs, u64 addr) @@ -13658,6 +13650,7 @@ int perf_event_init_task(struct task_struct *child,= u64 
clone_flags) { int ret; =20 + memset(child->perf_recursion, 0, sizeof(child->perf_recursion)); child->perf_event_ctxp =3D NULL; mutex_init(&child->perf_event_mutex); INIT_LIST_HEAD(&child->perf_event_list); diff --git a/kernel/events/internal.h b/kernel/events/internal.h index f9a3244206b20..f0daaa6f2a33b 100644 --- a/kernel/events/internal.h +++ b/kernel/events/internal.h @@ -221,7 +221,7 @@ static inline int get_recursion_context(u8 *recursion) return rctx; } =20 -static inline void put_recursion_context(u8 *recursion, int rctx) +static inline void put_recursion_context(u8 *recursion, unsigned char rctx) { barrier(); recursion[rctx]--; --=20 2.45.2 From nobody Wed Dec 17 15:55:38 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EBE9019D8A6; Mon, 24 Jun 2024 15:27:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719242863; cv=none; b=VF8a1bZOx38T/x4wElYgYdu9FvL+BSJ0cUs8UWdNpmKQMQJo/jCkwrkH0cyCTd3I7/hZy/eSCVw5Az6osZK/LTtx1KTHPbk4SYtz2wYfCJS4M4wpebvyhxyMvXwAcnUo52Z1UtWnd/okblZtslPI+umKn1mp3ESVqZzW0X4xwec= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719242863; c=relaxed/simple; bh=bvo/lpGRvbF3WWnb+rVZuJyzxfj90xjeHxN0eiTEoms=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=qv/gGBM5v9aqhoELvVJaPKTAs+zqTI2kAPAflv6WGIXGte2va4DTcqr3U/ftXMeqUKsFZpxaP5xWa5eIL1wQ70D6LMKjjQLhXnOeosNMGsFgfJQbRMIFhDi3iCyliMfkdjrmtex0HioJF0cpFd/bK1pDSgXCfyKIHmJdrzXOlyo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=e0iBgufD; 
dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=JsaLTd0r; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="e0iBgufD"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="JsaLTd0r" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1719242858; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oHgQQtbqN3zK19tb9I6f8KrfnGMC54dQwVOa6Td0Tyo=; b=e0iBgufDeySo0z0BWC7TPcjQjDYiPR0SZOa/f2+Hdg3sIkhWpgRMDfM/HTWik037Tv59c/ W81QJNmOaMVNqYuJiRNt++rTyCQcOgU7KVplvZEoqzO6q7SN0YgVBnlVOU5DXdlsRcOoki g+II/pWpF2sltgoWB6iXQbMdz1Wa35aX0WAC0LVVoJctEaZuvo6jmvEhjZe4vWfU8CtGVl CogY+UWcOu0J6R2f0jn7tXz8QHOcGx94aI70eEFbG6M+jZrcgOMGqsJL03EcBpkoDipDkE +iO+pUuT3OKoeaEmND6XpQbEY5oEvgTrHvAB//o79tLAeu0eoAwR3rjnerpTxQ== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1719242858; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oHgQQtbqN3zK19tb9I6f8KrfnGMC54dQwVOa6Td0Tyo=; b=JsaLTd0r1Z0+Dy8LfMOIamfn/0Y0jRpz+RJeKfkpBIOkDZCyyEyFx+eKK4t++BzOnnz5T0 WOCSnOYY8IfO0CBg== To: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Adrian Hunter , Alexander Shishkin , Arnaldo Carvalho de Melo , Daniel Bristot de Oliveira , Frederic Weisbecker , Ian Rogers , Ingo Molnar , Jiri Olsa , Kan Liang , Marco Elver , Mark 
Rutland , Namhyung Kim , Peter Zijlstra , Thomas Gleixner , Sebastian Andrzej Siewior Subject: [PATCH v4 5/6] perf: Don't disable preemption in perf_pending_task(). Date: Mon, 24 Jun 2024 17:15:18 +0200 Message-ID: <20240624152732.1231678-6-bigeasy@linutronix.de> In-Reply-To: <20240624152732.1231678-1-bigeasy@linutronix.de> References: <20240624152732.1231678-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" perf_pending_task() is invoked in task context and disables preemption because perf_swevent_get_recursion_context() used to access per-CPU variables. The other reason is to create an RCU read section while accessing the perf_event. The recursion counter is no longer a per-CPU counter, so disabling preemption is no longer required. The RCU section is still needed and must be created explicitly. Replace the preemption-disable section with an explicit RCU read section. Signed-off-by: Sebastian Andrzej Siewior Tested-by: Marco Elver --- kernel/events/core.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index fc9a78e1fb4aa..f75aa9f14c979 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5208,10 +5208,9 @@ static void perf_pending_task_sync(struct perf_event= *event) } =20 /* - * All accesses related to the event are within the same - * non-preemptible section in perf_pending_task(). The RCU - * grace period before the event is freed will make sure all - * those accesses are complete by then. + * All accesses related to the event are within the same RCU section in + * perf_pending_task(). The RCU grace period before the event is freed + * will make sure all those accesses are complete by then. 
 	 */
 	rcuwait_wait_event(&event->pending_work_wait, !event->pending_work, TASK_UNINTERRUPTIBLE);
 }
@@ -6842,7 +6841,7 @@ static void perf_pending_task(struct callback_head *head)
 	 * critical section as the ->pending_work reset. See comment in
 	 * perf_pending_task_sync().
 	 */
-	preempt_disable_notrace();
+	rcu_read_lock();
 	/*
 	 * If we 'fail' here, that's OK, it means recursion is already disabled
 	 * and we won't recurse 'further'.
@@ -6855,10 +6854,10 @@ static void perf_pending_task(struct callback_head *head)
 		local_dec(&event->ctx->nr_pending);
 		rcuwait_wake_up(&event->pending_work_wait);
 	}
+	rcu_read_unlock();
 
 	if (rctx >= 0)
 		perf_swevent_put_recursion_context(rctx);
-	preempt_enable_notrace();
 }
 
 #ifdef CONFIG_GUEST_PERF_EVENTS
-- 
2.45.2

From nobody Wed Dec 17 15:55:38 2025
From: Sebastian Andrzej Siewior
To: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Adrian Hunter, Alexander Shishkin, Arnaldo Carvalho de Melo,
 Daniel Bristot de Oliveira, Frederic Weisbecker, Ian Rogers, Ingo Molnar,
 Jiri Olsa, Kan Liang, Marco Elver, Mark Rutland, Namhyung Kim,
 Peter Zijlstra, Thomas Gleixner, Sebastian Andrzej Siewior,
 Arnaldo Carvalho de Melo
Subject: [PATCH v4 6/6] perf: Split __perf_pending_irq() out of perf_pending_irq()
Date: Mon, 24 Jun 2024 17:15:19 +0200
Message-ID: <20240624152732.1231678-7-bigeasy@linutronix.de>
In-Reply-To: <20240624152732.1231678-1-bigeasy@linutronix.de>
References: <20240624152732.1231678-1-bigeasy@linutronix.de>

perf_pending_irq() invokes perf_event_wakeup() and __perf_pending_irq().
The former is in charge of waking any tasks which wait to be woken up
while the latter disables perf events.

perf_pending_irq() is an irq_work, but on PREEMPT_RT its callback is
invoked in thread context. This is needed because the waking functions
(wake_up_all(), kill_fasync()) acquire sleeping locks, which must not be
taken with disabled interrupts. Disabling events, as done by
__perf_pending_irq(), on the other hand expects a hardirq context and
disabled interrupts. This requirement is not fulfilled on PREEMPT_RT.

Split the functionality based on perf_event::pending_disable out into a
separate irq_work named `pending_disable_irq' and invoke it in hardirq
context on PREEMPT_RT. Name the split-out callback
perf_pending_disable().
Tested-by: Marco Elver
Tested-by: Arnaldo Carvalho de Melo
Reported-by: Arnaldo Carvalho de Melo
Signed-off-by: Sebastian Andrzej Siewior
---
 include/linux/perf_event.h |  1 +
 kernel/events/core.c       | 31 +++++++++++++++++++++++--------
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 99a7ea1d29ed5..65ece0d5b4b6d 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -783,6 +783,7 @@ struct perf_event {
 	unsigned int			pending_disable;
 	unsigned long			pending_addr;	/* SIGTRAP */
 	struct irq_work			pending_irq;
+	struct irq_work			pending_disable_irq;
 	struct callback_head		pending_task;
 	unsigned int			pending_work;
 	struct rcuwait			pending_work_wait;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f75aa9f14c979..8bba63ea9c686 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2451,7 +2451,7 @@ static void __perf_event_disable(struct perf_event *event,
 	 * hold the top-level event's child_mutex, so any descendant that
 	 * goes to exit will block in perf_event_exit_event().
 	 *
-	 * When called from perf_pending_irq it's OK because event->ctx
+	 * When called from perf_pending_disable it's OK because event->ctx
 	 * is the current context on this CPU and preemption is disabled,
 	 * hence we can't get into perf_event_task_sched_out for this context.
 	 */
@@ -2491,7 +2491,7 @@ EXPORT_SYMBOL_GPL(perf_event_disable);
 void perf_event_disable_inatomic(struct perf_event *event)
 {
 	event->pending_disable = 1;
-	irq_work_queue(&event->pending_irq);
+	irq_work_queue(&event->pending_disable_irq);
 }
 
 #define MAX_INTERRUPTS (~0ULL)
@@ -5218,6 +5218,7 @@ static void perf_pending_task_sync(struct perf_event *event)
 static void _free_event(struct perf_event *event)
 {
 	irq_work_sync(&event->pending_irq);
+	irq_work_sync(&event->pending_disable_irq);
 	perf_pending_task_sync(event);
 
 	unaccount_event(event);
@@ -6760,7 +6761,7 @@ static void perf_sigtrap(struct perf_event *event)
 /*
  * Deliver the pending work in-event-context or follow the context.
  */
-static void __perf_pending_irq(struct perf_event *event)
+static void __perf_pending_disable(struct perf_event *event)
 {
 	int cpu = READ_ONCE(event->oncpu);
 
@@ -6798,11 +6799,26 @@ static void __perf_pending_irq(struct perf_event *event)
 	 *                  irq_work_queue(); // FAILS
 	 *
 	 *  irq_work_run()
-	 *    perf_pending_irq()
+	 *    perf_pending_disable()
 	 *
 	 * But the event runs on CPU-B and wants disabling there.
 	 */
-	irq_work_queue_on(&event->pending_irq, cpu);
+	irq_work_queue_on(&event->pending_disable_irq, cpu);
+}
+
+static void perf_pending_disable(struct irq_work *entry)
+{
+	struct perf_event *event = container_of(entry, struct perf_event, pending_disable_irq);
+	int rctx;
+
+	/*
+	 * If we 'fail' here, that's OK, it means recursion is already disabled
+	 * and we won't recurse 'further'.
+	 */
+	rctx = perf_swevent_get_recursion_context();
+	__perf_pending_disable(event);
+	if (rctx >= 0)
+		perf_swevent_put_recursion_context(rctx);
 }
 
 static void perf_pending_irq(struct irq_work *entry)
@@ -6825,8 +6841,6 @@ static void perf_pending_irq(struct irq_work *entry)
 		perf_event_wakeup(event);
 	}
 
-	__perf_pending_irq(event);
-
 	if (rctx >= 0)
 		perf_swevent_put_recursion_context(rctx);
 }
@@ -9734,7 +9748,7 @@ static int __perf_event_overflow(struct perf_event *event,
 		 * is processed.
 		 */
 		if (in_nmi())
-			irq_work_queue(&event->pending_irq);
+			irq_work_queue(&event->pending_disable_irq);
 
 	} else if (event->attr.exclude_kernel && valid_sample) {
 		/*
@@ -11972,6 +11986,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 
 	init_waitqueue_head(&event->waitq);
 	init_irq_work(&event->pending_irq, perf_pending_irq);
+	event->pending_disable_irq = IRQ_WORK_INIT_HARD(perf_pending_disable);
 	init_task_work(&event->pending_task, perf_pending_task);
 	rcuwait_init(&event->pending_work_wait);
 
-- 
2.45.2