From nobody Wed Dec 17 15:56:43 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B03021B0139; Thu, 4 Jul 2024 17:04:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720112672; cv=none; b=qIR1BsfbU6jz44EkTlpGfq9SJ07vrwrzezJdZg+P5yuIpHk0oMrkzabShMolWt14juQG1utv1QVRQZaDqrpPZaFt03Hxh0bHKHzZ3sHz/X3LEDxNgLfvFeY7u7s1Z8pvC/v8SaJ1l/Nyr7S0PIICOIi3noRNBIDI0XafABw3aHU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720112672; c=relaxed/simple; bh=WXw3Qwht7czrmsx9ahtRrb+RTNTA9mu6D57a61bCLb0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=B3vNxD0SJ90fRCmy2saHJGpRBgVJsPFN7FRq1vYGqPldTvcbr5ipyuN/wRyQrHn5tDNvqS/rnaxKP5aaMwEyRZr/F3rqf1oEconA5vnfZprmOrHpzpmm0YO5HGLJbniNkUyzCBooxX0NdxrmJuw+eRCCOlgoMBAYAFsm8ZMpq5g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=XvyoeU2N; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=I3RXEBTD; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="XvyoeU2N"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="I3RXEBTD" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; 
s=2020; t=1720112669; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=NtdLMQUS4gOabICyiywJKeAVbmQAXHIv0kkFMgjqTTg=; b=XvyoeU2NSNXT3ntLU9B6SfomGMoTWODkAIn72dXP6Vm4wGUQEG3u6dhgj4SffYbPQu2Nkj xZs0LaxjMLuUdGF4sZFcrMRRghBShQHbs/3CkSaIAaBDJ/O9JVUdeAVdEJZG9gE9a+6xt9 5PPa/2908qE+cGEd84I/Z/4+BG1WyG6SStFHM3EAJTniuIT+cVfUYClu6zHSpjNR4ZvuFl GHDjqDRZXkcq7/3is93J399pKhaKEiaBJU+jyiHUSXxGR026ig5VgTGkgnOkGEfoJXzyKq DpygTEOsGnHEkUWy4/VKf8Ra1SlUSdqSWjGdBOOj3vQgxwq5+vl7fS/ODZwwnA== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1720112669; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=NtdLMQUS4gOabICyiywJKeAVbmQAXHIv0kkFMgjqTTg=; b=I3RXEBTDuMkqbwi0Fmuhc30oSdyx1JHC8rIa3q6kplhgGvoPWg9SPSrLDcj7zrUZKQnlds UBWGqHoM7YX/UjCg== To: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Adrian Hunter , Alexander Shishkin , Arnaldo Carvalho de Melo , Frederic Weisbecker , Ian Rogers , Ingo Molnar , Jiri Olsa , Kan Liang , Marco Elver , Mark Rutland , Namhyung Kim , Peter Zijlstra , Thomas Gleixner , Sebastian Andrzej Siewior , Arnaldo Carvalho de Melo Subject: [PATCH v5 1/7] perf: Move irq_work_queue() where the event is prepared. 
Date: Thu, 4 Jul 2024 19:03:35 +0200 Message-ID: <20240704170424.1466941-2-bigeasy@linutronix.de> In-Reply-To: <20240704170424.1466941-1-bigeasy@linutronix.de> References: <20240704170424.1466941-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The irq_work is accounted by incrementing perf_event_context::nr_pending only if perf_event::pending_sigtrap is zero. The member perf_event::pending_addr might be overwritten by a subsequent event if the signal was not yet delivered, which is expected. The irq_work will not be enqueued again because irq_work_queue() checks that it is enqueued only once. Move irq_work_queue() to where the counter is incremented and perf_event::pending_sigtrap is set to make it more obvious that the irq_work is scheduled once. Tested-by: Marco Elver Tested-by: Arnaldo Carvalho de Melo Signed-off-by: Sebastian Andrzej Siewior --- kernel/events/core.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index 586d4f3676240..647abeeaeeb02 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -9738,6 +9738,11 @@ static int __perf_event_overflow(struct perf_event *= event, if (!event->pending_sigtrap) { event->pending_sigtrap =3D pending_id; local_inc(&event->ctx->nr_pending); + + event->pending_addr =3D 0; + if (valid_sample && (data->sample_flags & PERF_SAMPLE_ADDR)) + event->pending_addr =3D data->addr; + irq_work_queue(&event->pending_irq); } else if (event->attr.exclude_kernel && valid_sample) { /* * Should not be able to return to user space without @@ -9753,11 +9758,6 @@ static int __perf_event_overflow(struct perf_event *= event, */ WARN_ON_ONCE(event->pending_sigtrap !=3D pending_id); } - - event->pending_addr =3D 0; - if (valid_sample && (data->sample_flags & PERF_SAMPLE_ADDR)) - event->pending_addr =3D 
data->addr; - irq_work_queue(&event->pending_irq); } =20 READ_ONCE(event->overflow_handler)(event, data, regs); --=20 2.45.2
From nobody Wed Dec 17 15:56:43 2025 From: Sebastian Andrzej Siewior To: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Adrian Hunter , Alexander Shishkin , Arnaldo Carvalho de Melo , Frederic Weisbecker , Ian Rogers , Ingo Molnar , Jiri Olsa , Kan Liang , Marco Elver , Mark Rutland , Namhyung Kim , Peter Zijlstra , Thomas Gleixner , Sebastian Andrzej Siewior Subject: [PATCH v5 2/7] task_work: Add TWA_NMI_CURRENT as an additional notify mode. 
Date: Thu, 4 Jul 2024 19:03:36 +0200 Message-ID: <20240704170424.1466941-3-bigeasy@linutronix.de> In-Reply-To: <20240704170424.1466941-1-bigeasy@linutronix.de> References: <20240704170424.1466941-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Adding task_work from NMI context requires the following: - The kasan_record_aux_stack() is not NMI safe and must be avoided. - Using TWA_RESUME is NMI safe, but if the NMI occurs while the CPU is in userland then the task will continue in userland and not invoke the `work' callback. Add TWA_NMI_CURRENT as an additional notify mode. In this mode skip kasan and use irq_work in hardirq mode to raise the needed interrupt. Set TIF_NOTIFY_RESUME within the irq_work callback due to k[ac]san instrumentation in test_and_set_bit() which does not look NMI safe in case of a report. Suggested-by: Peter Zijlstra Signed-off-by: Sebastian Andrzej Siewior --- include/linux/task_work.h | 1 + kernel/task_work.c | 25 ++++++++++++++++++++++--- 2 files changed, 23 insertions(+), 3 deletions(-) diff --git a/include/linux/task_work.h b/include/linux/task_work.h index 26b8a47f41fca..cf5e7e891a776 100644 --- a/include/linux/task_work.h +++ b/include/linux/task_work.h @@ -18,6 +18,7 @@ enum task_work_notify_mode { TWA_RESUME, TWA_SIGNAL, TWA_SIGNAL_NO_IPI, + TWA_NMI_CURRENT, }; =20 static inline bool task_work_pending(struct task_struct *task) diff --git a/kernel/task_work.c b/kernel/task_work.c index 2134ac8057a94..05fb41fe09f5d 100644 --- a/kernel/task_work.c +++ b/kernel/task_work.c @@ -1,10 +1,19 @@ // SPDX-License-Identifier: GPL-2.0 +#include #include #include #include +#include =20 static struct callback_head work_exited; /* all we need is ->next =3D=3D N= ULL */ =20 +static void task_work_set_notify_irq(struct irq_work *entry) +{ + test_and_set_tsk_thread_flag(current, 
TIF_NOTIFY_RESUME); +} +static DEFINE_PER_CPU(struct irq_work, irq_work_NMI_resume) =3D + IRQ_WORK_INIT_HARD(task_work_set_notify_irq); + /** * task_work_add - ask the @task to execute @work->func() * @task: the task which should run the callback @@ -12,7 +21,7 @@ static struct callback_head work_exited; /* all we need i= s ->next =3D=3D NULL */ * @notify: how to notify the targeted task * * Queue @work for task_work_run() below and notify the @task if @notify - * is @TWA_RESUME, @TWA_SIGNAL, or @TWA_SIGNAL_NO_IPI. + * is @TWA_RESUME, @TWA_SIGNAL, @TWA_SIGNAL_NO_IPI or @TWA_NMI_CURRENT. * * @TWA_SIGNAL works like signals, in that the it will interrupt the targe= ted * task and run the task_work, regardless of whether the task is currently @@ -24,6 +33,8 @@ static struct callback_head work_exited; /* all we need i= s ->next =3D=3D NULL */ * kernel anyway. * @TWA_RESUME work is run only when the task exits the kernel and returns= to * user mode, or before entering guest mode. + * @TWA_NMI_CURRENT works like @TWA_RESUME, except it can only be used for= the + * current @task and if the current context is NMI. * * Fails if the @task is exiting/exited and thus it can't process this @wo= rk. 
* Otherwise @work->func() will be called when the @task goes through one = of @@ -44,8 +55,13 @@ int task_work_add(struct task_struct *task, struct callb= ack_head *work, { struct callback_head *head; =20 - /* record the work call stack in order to print it in KASAN reports */ - kasan_record_aux_stack(work); + if (notify =3D=3D TWA_NMI_CURRENT) { + if (WARN_ON_ONCE(task !=3D current)) + return -EINVAL; + } else { + /* record the work call stack in order to print it in KASAN reports */ + kasan_record_aux_stack(work); + } =20 head =3D READ_ONCE(task->task_works); do { @@ -66,6 +82,9 @@ int task_work_add(struct task_struct *task, struct callba= ck_head *work, case TWA_SIGNAL_NO_IPI: __set_notify_signal(task); break; + case TWA_NMI_CURRENT: + irq_work_queue(this_cpu_ptr(&irq_work_NMI_resume)); + break; default: WARN_ON_ONCE(1); break; --=20 2.45.2
From nobody Wed Dec 17 15:56:43 2025 From: Sebastian Andrzej Siewior To: linux-perf-users@vger.kernel.org, 
linux-kernel@vger.kernel.org Cc: Adrian Hunter , Alexander Shishkin , Arnaldo Carvalho de Melo , Frederic Weisbecker , Ian Rogers , Ingo Molnar , Jiri Olsa , Kan Liang , Marco Elver , Mark Rutland , Namhyung Kim , Peter Zijlstra , Thomas Gleixner , Sebastian Andrzej Siewior , Arnaldo Carvalho de Melo Subject: [PATCH v5 3/7] perf: Enqueue SIGTRAP always via task_work. Date: Thu, 4 Jul 2024 19:03:37 +0200 Message-ID: <20240704170424.1466941-4-bigeasy@linutronix.de> In-Reply-To: <20240704170424.1466941-1-bigeasy@linutronix.de> References: <20240704170424.1466941-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" A signal is delivered by raising irq_work() which works from any context including NMI. irq_work() can be delayed if the architecture does not provide an interrupt vector. In order not to lose a signal, the signal is injected via task_work during event_sched_out(). Instead of going via irq_work, the signal could be added directly via task_work. The signal is sent to current and can be enqueued on its return path to userland. Queue the signal via task_work and consider possible NMI context. Remove perf_event::pending_sigtrap and use perf_event::pending_work instead. 
Tested-by: Marco Elver Tested-by: Arnaldo Carvalho de Melo Reported-by: Arnaldo Carvalho de Melo Link: https://lore.kernel.org/all/ZMAtZ2t43GXoF6tM@kernel.org/ Signed-off-by: Sebastian Andrzej Siewior --- include/linux/perf_event.h | 3 +-- kernel/events/core.c | 31 ++++++++++--------------------- 2 files changed, 11 insertions(+), 23 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 393fb13733b02..ea0d82418d854 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -781,7 +781,6 @@ struct perf_event { unsigned int pending_wakeup; unsigned int pending_kill; unsigned int pending_disable; - unsigned int pending_sigtrap; unsigned long pending_addr; /* SIGTRAP */ struct irq_work pending_irq; struct callback_head pending_task; @@ -963,7 +962,7 @@ struct perf_event_context { struct rcu_head rcu_head; =20 /* - * Sum (event->pending_sigtrap + event->pending_work) + * Sum (event->pending_work + event->pending_work) * * The SIGTRAP is targeted at ctx->task, as such it won't do changing * that until the signal is delivered. diff --git a/kernel/events/core.c b/kernel/events/core.c index 647abeeaeeb02..c278aefa94e76 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2283,17 +2283,6 @@ event_sched_out(struct perf_event *event, struct per= f_event_context *ctx) state =3D PERF_EVENT_STATE_OFF; } =20 - if (event->pending_sigtrap) { - event->pending_sigtrap =3D 0; - if (state !=3D PERF_EVENT_STATE_OFF && - !event->pending_work && - !task_work_add(current, &event->pending_task, TWA_RESUME)) { - event->pending_work =3D 1; - } else { - local_dec(&event->ctx->nr_pending); - } - } - perf_event_set_state(event, state); =20 if (!is_software_event(event)) @@ -6787,11 +6776,6 @@ static void __perf_pending_irq(struct perf_event *ev= ent) * Yay, we hit home and are in the context of the event. 
*/ if (cpu =3D=3D smp_processor_id()) { - if (event->pending_sigtrap) { - event->pending_sigtrap =3D 0; - perf_sigtrap(event); - local_dec(&event->ctx->nr_pending); - } if (event->pending_disable) { event->pending_disable =3D 0; perf_event_disable_local(event); @@ -9732,21 +9716,26 @@ static int __perf_event_overflow(struct perf_event = *event, */ bool valid_sample =3D sample_is_allowed(event, regs); unsigned int pending_id =3D 1; + enum task_work_notify_mode notify_mode; =20 if (regs) pending_id =3D hash32_ptr((void *)instruction_pointer(regs)) ?: 1; - if (!event->pending_sigtrap) { - event->pending_sigtrap =3D pending_id; + + notify_mode =3D in_nmi() ? TWA_NMI_CURRENT : TWA_RESUME; + + if (!event->pending_work && + !task_work_add(current, &event->pending_task, notify_mode)) { + event->pending_work =3D pending_id; local_inc(&event->ctx->nr_pending); =20 event->pending_addr =3D 0; if (valid_sample && (data->sample_flags & PERF_SAMPLE_ADDR)) event->pending_addr =3D data->addr; - irq_work_queue(&event->pending_irq); + } else if (event->attr.exclude_kernel && valid_sample) { /* * Should not be able to return to user space without - * consuming pending_sigtrap; with exceptions: + * consuming pending_work; with exceptions: * * 1. Where !exclude_kernel, events can overflow again * in the kernel without returning to user space. @@ -9756,7 +9745,7 @@ static int __perf_event_overflow(struct perf_event *e= vent, * To approximate progress (with false negatives), * check 32-bit hash of the current IP. 
*/ - WARN_ON_ONCE(event->pending_sigtrap !=3D pending_id); + WARN_ON_ONCE(event->pending_work !=3D pending_id); } } =20 --=20 2.45.2
From nobody Wed Dec 17 15:56:43 2025 From: Sebastian Andrzej Siewior To: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Adrian Hunter , Alexander Shishkin , Arnaldo Carvalho de Melo , Frederic Weisbecker , Ian Rogers , Ingo Molnar , Jiri Olsa , Kan Liang , Marco Elver , Mark Rutland , Namhyung Kim , Peter Zijlstra , Thomas Gleixner , Sebastian Andrzej Siewior Subject: [PATCH v5 4/7] perf: Shrink the size of the recursion counter. 
Date: Thu, 4 Jul 2024 19:03:38 +0200 Message-ID: <20240704170424.1466941-5-bigeasy@linutronix.de> In-Reply-To: <20240704170424.1466941-1-bigeasy@linutronix.de> References: <20240704170424.1466941-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" There are four recursion counters, one for each context. The type of the counter is `int' but the counter is used as `bool' since it is only incremented if zero. The main goal here is to shrink all four counters into a single 32-bit word which can later be added to task_struct in an existing hole. Reduce the type of the recursion counter to an unsigned char, keep the increment/decrement operations. Tested-by: Marco Elver Signed-off-by: Sebastian Andrzej Siewior --- kernel/events/callchain.c | 2 +- kernel/events/core.c | 2 +- kernel/events/internal.h | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c index 1273be84392cf..ad57944b6c40e 100644 --- a/kernel/events/callchain.c +++ b/kernel/events/callchain.c @@ -29,7 +29,7 @@ static inline size_t perf_callchain_entry__sizeof(void) sysctl_perf_event_max_contexts_per_stack)); } =20 -static DEFINE_PER_CPU(int, callchain_recursion[PERF_NR_CONTEXTS]); +static DEFINE_PER_CPU(u8, callchain_recursion[PERF_NR_CONTEXTS]); static atomic_t nr_callchain_events; static DEFINE_MUTEX(callchain_mutex); static struct callchain_cpus_entries *callchain_cpus_entries; diff --git a/kernel/events/core.c b/kernel/events/core.c index c278aefa94e76..bd4b81bf63b6d 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -9776,7 +9776,7 @@ struct swevent_htable { int hlist_refcount; =20 /* Recursion avoidance in each contexts */ - int recursion[PERF_NR_CONTEXTS]; + u8 recursion[PERF_NR_CONTEXTS]; }; =20 static DEFINE_PER_CPU(struct swevent_htable, 
swevent_htable); diff --git a/kernel/events/internal.h b/kernel/events/internal.h index 5150d5f84c033..f9a3244206b20 100644 --- a/kernel/events/internal.h +++ b/kernel/events/internal.h @@ -208,7 +208,7 @@ arch_perf_out_copy_user(void *dst, const void *src, uns= igned long n) =20 DEFINE_OUTPUT_COPY(__output_copy_user, arch_perf_out_copy_user) =20 -static inline int get_recursion_context(int *recursion) +static inline int get_recursion_context(u8 *recursion) { unsigned char rctx =3D interrupt_context_level(); =20 @@ -221,7 +221,7 @@ static inline int get_recursion_context(int *recursion) return rctx; } =20 -static inline void put_recursion_context(int *recursion, int rctx) +static inline void put_recursion_context(u8 *recursion, int rctx) { barrier(); recursion[rctx]--; --=20 2.45.2
From nobody Wed Dec 17 15:56:43 2025 From: Sebastian Andrzej Siewior To: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Adrian Hunter , Alexander Shishkin , Arnaldo Carvalho de 
Melo , Frederic Weisbecker , Ian Rogers , Ingo Molnar , Jiri Olsa , Kan Liang , Marco Elver , Mark Rutland , Namhyung Kim , Peter Zijlstra , Thomas Gleixner , Sebastian Andrzej Siewior Subject: [PATCH v5 5/7] perf: Move swevent_htable::recursion into task_struct. Date: Thu, 4 Jul 2024 19:03:39 +0200 Message-ID: <20240704170424.1466941-6-bigeasy@linutronix.de> In-Reply-To: <20240704170424.1466941-1-bigeasy@linutronix.de> References: <20240704170424.1466941-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The swevent_htable::recursion counter is used to avoid recursion by not creating a swevent while an event is being processed. The counter is per-CPU and preemption must be disabled to have a stable counter. perf_pending_task() disables preemption to access the counter and then sends the signal. This is problematic on PREEMPT_RT because sending a signal uses a spinlock_t which must not be acquired in atomic context on PREEMPT_RT because it becomes a sleeping lock. The atomic context can be avoided by moving the counter into the task_struct. There is a 4 byte hole between futex_state (usually enabled) and the following perf pointer (perf_event_ctxp). Now that the recursion counter has been shrunk, it fits perfectly. Move swevent_htable::recursion into task_struct. 
Tested-by: Marco Elver Signed-off-by: Sebastian Andrzej Siewior --- include/linux/perf_event.h | 6 ------ include/linux/sched.h | 7 +++++++ kernel/events/core.c | 13 +++---------- kernel/events/internal.h | 2 +- 4 files changed, 11 insertions(+), 17 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index ea0d82418d854..99a7ea1d29ed5 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -970,12 +970,6 @@ struct perf_event_context { local_t nr_pending; }; =20 -/* - * Number of contexts where an event can trigger: - * task, softirq, hardirq, nmi. - */ -#define PERF_NR_CONTEXTS 4 - struct perf_cpu_pmu_context { struct perf_event_pmu_context epc; struct perf_event_pmu_context *task_epc; diff --git a/include/linux/sched.h b/include/linux/sched.h index 61591ac6eab6d..afb1087f5831b 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -734,6 +734,12 @@ enum perf_event_task_context { perf_nr_task_contexts, }; =20 +/* + * Number of contexts where an event can trigger: + * task, softirq, hardirq, nmi. 
+ */ +#define PERF_NR_CONTEXTS 4 + struct wake_q_node { struct wake_q_node *next; }; @@ -1256,6 +1262,7 @@ struct task_struct { unsigned int futex_state; #endif #ifdef CONFIG_PERF_EVENTS + u8 perf_recursion[PERF_NR_CONTEXTS]; struct perf_event_context *perf_event_ctxp; struct mutex perf_event_mutex; struct list_head perf_event_list; diff --git a/kernel/events/core.c b/kernel/events/core.c index bd4b81bf63b6d..1a26a9c33306a 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -9774,11 +9774,7 @@ struct swevent_htable { struct swevent_hlist *swevent_hlist; struct mutex hlist_mutex; int hlist_refcount; - - /* Recursion avoidance in each contexts */ - u8 recursion[PERF_NR_CONTEXTS]; }; - static DEFINE_PER_CPU(struct swevent_htable, swevent_htable); =20 /* @@ -9976,17 +9972,13 @@ DEFINE_PER_CPU(struct pt_regs, __perf_regs[4]); =20 int perf_swevent_get_recursion_context(void) { - struct swevent_htable *swhash =3D this_cpu_ptr(&swevent_htable); - - return get_recursion_context(swhash->recursion); + return get_recursion_context(current->perf_recursion); } EXPORT_SYMBOL_GPL(perf_swevent_get_recursion_context); =20 void perf_swevent_put_recursion_context(int rctx) { - struct swevent_htable *swhash =3D this_cpu_ptr(&swevent_htable); - - put_recursion_context(swhash->recursion, rctx); + put_recursion_context(current->perf_recursion, rctx); } =20 void ___perf_sw_event(u32 event_id, u64 nr, struct pt_regs *regs, u64 addr) @@ -13653,6 +13645,7 @@ int perf_event_init_task(struct task_struct *child,= u64 clone_flags) { int ret; =20 + memset(child->perf_recursion, 0, sizeof(child->perf_recursion)); child->perf_event_ctxp =3D NULL; mutex_init(&child->perf_event_mutex); INIT_LIST_HEAD(&child->perf_event_list); diff --git a/kernel/events/internal.h b/kernel/events/internal.h index f9a3244206b20..f0daaa6f2a33b 100644 --- a/kernel/events/internal.h +++ b/kernel/events/internal.h @@ -221,7 +221,7 @@ static inline int get_recursion_context(u8 *recursion) return rctx; } =20 
-static inline void put_recursion_context(u8 *recursion, int rctx) +static inline void put_recursion_context(u8 *recursion, unsigned char rctx) { barrier(); recursion[rctx]--; --=20 2.45.2 From nobody Wed Dec 17 15:56:43 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A1C3A1B3F23; Thu, 4 Jul 2024 17:04:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720112675; cv=none; b=XRvQ8kPZwKj2vU+QR8/QaI7vYiBbc7JectsyjVR5jLjfG3uTnKwSvBpw/gfjtUHKKl+ky1GMW1xzuizIRqiCaTuPCczjsgpjRmMmS+h4Uqy24ukihTCjSOlSqmOXbvYfFNFRsmfZuGs7dIT0LEHMy9IqUnq3cCv46EsuNTXyhJs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720112675; c=relaxed/simple; bh=S8OVgI+z3/1TkjlEzzhcY/BKekKcyxSOhl+MLCQeg/Q=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=QqoRLBcBDRHBmOswiSOMjaR/kl5tF9NYs8UVIHIaqNFk8wcIQuYJEOq08vKDU3q1cQ77sNsvpR6foc1kAZCCBB6LJ2SMJb658TZKHGlGhBwlY3Y7wB9+v6on4yNtfVj3QvHsC8NZ55WfvVCOxNoBcPr48HB+6LgehgujD3simhI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=2Ff73+0i; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=IwnVUIFC; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="2Ff73+0i"; dkim=permerror 
(0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="IwnVUIFC" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1720112671; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QGqrJgSvPRRSLcKLNJWunSQ8rRcOGjf/kJDwlFzATiA=; b=2Ff73+0iE5zoEOsuJ5v51gM5a/HAbvf/m1LM7H1Hydlqx3CMfN4FHiYXorP67T0LVNA3pz rAzvLITwlhNBHvOc8e3uqEmIchgtihOF98HBGXHj80pc54GLSUG/ECv8lESIHgszNUntLZ tFrnGMeElA+YO8CE2vAVHhDrD5jPY+6QokQJjlyDtaJvRiL3/sV25IAxcRHSJX3pldh6Kj 6PYPV1NUdhl6ArsobA8/RPJ3+YlqQ7TE3vE2/d0TZFULlEn4uH8iZg7aMcdBgx5d+27v83 HCS5WEMmr83XQ30XZ0tBz7djv5Y4ceMha0iGk3pW+IW+na9Vzi3YlbgAkNSnPg== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1720112671; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QGqrJgSvPRRSLcKLNJWunSQ8rRcOGjf/kJDwlFzATiA=; b=IwnVUIFCuSkY5wwfIZMyb7+D9CCI+cUZ0cpkJBStoGKFJls6kj63n7D7d89tJKCOM0Hujp Xd16soKsvp6+jGCQ== To: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Adrian Hunter , Alexander Shishkin , Arnaldo Carvalho de Melo , Frederic Weisbecker , Ian Rogers , Ingo Molnar , Jiri Olsa , Kan Liang , Marco Elver , Mark Rutland , Namhyung Kim , Peter Zijlstra , Thomas Gleixner , Sebastian Andrzej Siewior Subject: [PATCH v5 6/7] perf: Don't disable preemption in perf_pending_task(). 
Date: Thu, 4 Jul 2024 19:03:40 +0200 Message-ID: <20240704170424.1466941-7-bigeasy@linutronix.de> In-Reply-To: <20240704170424.1466941-1-bigeasy@linutronix.de> References: <20240704170424.1466941-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" perf_pending_task() is invoked in task context and disables preemption because perf_swevent_get_recursion_context() used to access per-CPU variables. The other reason is to create an RCU read section while accessing the perf_event. The recursion counter is no longer a per-CPU counter, so disabling preemption is no longer required. The RCU section is still needed and must be created explicitly. Replace the preemption-disabled section with an explicit RCU read section. Tested-by: Marco Elver Signed-off-by: Sebastian Andrzej Siewior --- kernel/events/core.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index 1a26a9c33306a..67f5aab933c81 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5208,10 +5208,9 @@ static void perf_pending_task_sync(struct perf_event= *event) } =20 /* - * All accesses related to the event are within the same - * non-preemptible section in perf_pending_task(). The RCU - * grace period before the event is freed will make sure all - * those accesses are complete by then. + * All accesses related to the event are within the same RCU section in + * perf_pending_task(). The RCU grace period before the event is freed + * will make sure all those accesses are complete by then. */ rcuwait_wait_event(&event->pending_work_wait, !event->pending_work, TASK_= UNINTERRUPTIBLE); } @@ -6842,7 +6841,7 @@ static void perf_pending_task(struct callback_head *h= ead) * critical section as the ->pending_work reset. See comment in * perf_pending_task_sync(). 
*/ - preempt_disable_notrace(); + rcu_read_lock(); /* * If we 'fail' here, that's OK, it means recursion is already disabled * and we won't recurse 'further'. @@ -6855,10 +6854,10 @@ static void perf_pending_task(struct callback_head = *head) local_dec(&event->ctx->nr_pending); rcuwait_wake_up(&event->pending_work_wait); } + rcu_read_unlock(); =20 if (rctx >=3D 0) perf_swevent_put_recursion_context(rctx); - preempt_enable_notrace(); } =20 #ifdef CONFIG_GUEST_PERF_EVENTS --=20 2.45.2 From nobody Wed Dec 17 15:56:43 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D3B911B3F2A; Thu, 4 Jul 2024 17:04:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720112675; cv=none; b=kmwWAC3aLCjwIX8yiY2iQxLlwo4pN/nFhBs5UzCvhnVhuCLodELqsuDJAEF0uFU+mbk3kFdUsy6m5VFMdFaxIv7qvi9s162ThCgG5yZRbz0L6fyeOOJIX7jhHcQ0w2zt1tQGPWoCXZRtlj0Al1yGOG7otBfO5vXPdNtNdxjLcOA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720112675; c=relaxed/simple; bh=PMXajY0yYWqHBFxHNSd9Xq7MdsGEj5g/AqG2BowAAMQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=UWp/gXS7B2xcR2faZYHvehvsyX5v2mEBosvaKVItKZiIMatOBhXyLmvHIlPX2hagrY0oo8Se30zMiyz/82wLksFa/8OvMgV1NO0NXVDn9Lr3AUymLw6THFZyDnWA8CBkH4w9YS0rVdSCjmYImb4PdkYjIkO78gdA/7bnZb6nkRI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=RdYEBRiK; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=+VcvC5wc; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; 
dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="RdYEBRiK"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="+VcvC5wc" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1720112672; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=4LOnoc+dfeuSsOqavqeqWNmyyI3wODKhBqm5Wu987To=; b=RdYEBRiKB+jJiBSybOyVNnvFMFZFm/LzJNKdRFzJOnvYA8YjncYOY2l/Ab/TE9WqjbrtVL +xS3D2avhqvuygqgHkclko5+Tb2+ALhal+ahTMX8mvWXLt20FwXzPXPrveFE7ISSEiiIv6 +4deqdOnTyJ1bCjPZSiQpRDBZSRXKbIRJNIr6EbX9HlIc4apEpr6DbIjGHizZmsQ6M1qrX GWik52rSL7DOFzJoz9DIbkt7yX1oGWaCeQdvcXvGoHPxkytqRTr5RlEMt2TDvS/3/fCVq3 qw2t18dXk/9jDM91u94+wkz6u3aja3sDPDKtQKGO54x4UklyckzfVHmMjzrAbg== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1720112672; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=4LOnoc+dfeuSsOqavqeqWNmyyI3wODKhBqm5Wu987To=; b=+VcvC5wcVkf/zi8ndISmb19eBQSz49NGSOXupiI9DdOQDPMUGZOzGUhg2GqPvjpNjaLRUs MjcSXvGKhDbTqQCg== To: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Adrian Hunter , Alexander Shishkin , Arnaldo Carvalho de Melo , Frederic Weisbecker , Ian Rogers , Ingo Molnar , Jiri Olsa , Kan Liang , Marco Elver , Mark Rutland , Namhyung Kim , Peter Zijlstra , Thomas Gleixner , Sebastian Andrzej Siewior , Arnaldo Carvalho de Melo Subject: [PATCH v5 7/7] perf: Split __perf_pending_irq() out of perf_pending_irq() Date: Thu, 4 
Jul 2024 19:03:41 +0200 Message-ID: <20240704170424.1466941-8-bigeasy@linutronix.de> In-Reply-To: <20240704170424.1466941-1-bigeasy@linutronix.de> References: <20240704170424.1466941-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" perf_pending_irq() invokes perf_event_wakeup() and __perf_pending_irq(). The former is in charge of waking any tasks that are waiting to be woken up, while the latter disables perf events. Although perf_pending_irq() is an irq_work, its callback is invoked in thread context on PREEMPT_RT. This is needed because all the waking functions (wake_up_all(), kill_fasync()) acquire sleeping locks, which must not be used with interrupts disabled. Disabling events, as done by __perf_pending_irq(), expects a hardirq context with interrupts disabled. This requirement is not fulfilled on PREEMPT_RT. Split the functionality based on perf_event::pending_disable into a separate irq_work named `pending_disable_irq' and invoke it in hardirq context on PREEMPT_RT. Rename the split-out callback to perf_pending_disable(). 
Tested-by: Marco Elver Tested-by: Arnaldo Carvalho de Melo Reported-by: Arnaldo Carvalho de Melo Link: https://lore.kernel.org/all/ZMAtZ2t43GXoF6tM@kernel.org/ Signed-off-by: Sebastian Andrzej Siewior --- include/linux/perf_event.h | 1 + kernel/events/core.c | 29 ++++++++++++++++++++++------- 2 files changed, 23 insertions(+), 7 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 99a7ea1d29ed5..65ece0d5b4b6d 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -783,6 +783,7 @@ struct perf_event { unsigned int pending_disable; unsigned long pending_addr; /* SIGTRAP */ struct irq_work pending_irq; + struct irq_work pending_disable_irq; struct callback_head pending_task; unsigned int pending_work; struct rcuwait pending_work_wait; diff --git a/kernel/events/core.c b/kernel/events/core.c index 67f5aab933c81..0acf6ee4df528 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2451,7 +2451,7 @@ static void __perf_event_disable(struct perf_event *e= vent, * hold the top-level event's child_mutex, so any descendant that * goes to exit will block in perf_event_exit_event(). * - * When called from perf_pending_irq it's OK because event->ctx + * When called from perf_pending_disable it's OK because event->ctx * is the current context on this CPU and preemption is disabled, * hence we can't get into perf_event_task_sched_out for this context. 
*/ @@ -2491,7 +2491,7 @@ EXPORT_SYMBOL_GPL(perf_event_disable); void perf_event_disable_inatomic(struct perf_event *event) { event->pending_disable =3D 1; - irq_work_queue(&event->pending_irq); + irq_work_queue(&event->pending_disable_irq); } =20 #define MAX_INTERRUPTS (~0ULL) @@ -5218,6 +5218,7 @@ static void perf_pending_task_sync(struct perf_event = *event) static void _free_event(struct perf_event *event) { irq_work_sync(&event->pending_irq); + irq_work_sync(&event->pending_disable_irq); perf_pending_task_sync(event); =20 unaccount_event(event); @@ -6760,7 +6761,7 @@ static void perf_sigtrap(struct perf_event *event) /* * Deliver the pending work in-event-context or follow the context. */ -static void __perf_pending_irq(struct perf_event *event) +static void __perf_pending_disable(struct perf_event *event) { int cpu =3D READ_ONCE(event->oncpu); =20 @@ -6798,11 +6799,26 @@ static void __perf_pending_irq(struct perf_event *e= vent) * irq_work_queue(); // FAILS * * irq_work_run() - * perf_pending_irq() + * perf_pending_disable() * * But the event runs on CPU-B and wants disabling there. */ - irq_work_queue_on(&event->pending_irq, cpu); + irq_work_queue_on(&event->pending_disable_irq, cpu); +} + +static void perf_pending_disable(struct irq_work *entry) +{ + struct perf_event *event =3D container_of(entry, struct perf_event, pendi= ng_disable_irq); + int rctx; + + /* + * If we 'fail' here, that's OK, it means recursion is already disabled + * and we won't recurse 'further'. 
+ */ + rctx =3D perf_swevent_get_recursion_context(); + __perf_pending_disable(event); + if (rctx >=3D 0) + perf_swevent_put_recursion_context(rctx); } =20 static void perf_pending_irq(struct irq_work *entry) @@ -6825,8 +6841,6 @@ static void perf_pending_irq(struct irq_work *entry) perf_event_wakeup(event); } =20 - __perf_pending_irq(event); - if (rctx >=3D 0) perf_swevent_put_recursion_context(rctx); } @@ -11967,6 +11981,7 @@ perf_event_alloc(struct perf_event_attr *attr, int = cpu, =20 init_waitqueue_head(&event->waitq); init_irq_work(&event->pending_irq, perf_pending_irq); + event->pending_disable_irq =3D IRQ_WORK_INIT_HARD(perf_pending_disable); init_task_work(&event->pending_task, perf_pending_task); rcuwait_init(&event->pending_work_wait); =20 --=20 2.45.2