Date: Mon, 9 May 2022 11:52:27 -0400
From: Rik van Riel
To: Song Liu
Subject: [RFC] sched,livepatch: call stop_one_cpu in klp_check_and_switch_task
Message-ID: <20220509115227.6075105e@imladris.surriel.com>
In-Reply-To: <20220507174628.2086373-1-song@kernel.org>
List: linux-kernel@vger.kernel.org

After talking with Peter, this seems like a potential approach to fix the issue on kernels both with and without PREEMPT enabled.
If this looks like a reasonable approach to people, we can run experiments with this patch on a few thousand systems, and compare the kernel live patch transition latencies (and number of failures) against a kernel without the patch.

Does this look like an approach that could work?

---8<---
sched,livepatch: call stop_one_cpu in klp_check_and_switch_task

If a running task fails to transition to the new kernel live patch
after the first attempt, use the stopper thread to preempt it during
subsequent attempts at switching to the new kernel live patch.

Signed-off-by: Rik van Riel

diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index 5d03a2ad1066..26e9e5f09822 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -9,6 +9,7 @@
 
 #include <linux/cpu.h>
 #include <linux/stacktrace.h>
+#include <linux/stop_machine.h>
 #include "core.h"
 #include "patch.h"
 #include "transition.h"
@@ -281,6 +282,11 @@ static int klp_check_and_switch_task(struct task_struct *task, void *arg)
 	return 0;
 }
 
+static int kpatch_dummy_fn(void *dummy)
+{
+	return 0;
+}
+
 /*
  * Try to safely switch a task to the target patch state. If it's currently
  * running, or it's sleeping on a to-be-patched or to-be-unpatched function, or
@@ -315,6 +321,9 @@ static bool klp_try_switch_task(struct task_struct *task)
 	case -EBUSY:	/* klp_check_and_switch_task() */
 		pr_debug("%s: %s:%d is running\n", __func__,
 			 task->comm, task->pid);
+		/* Preempt the task from the second KLP switch attempt. */
+		if (klp_signals_cnt)
+			stop_one_cpu(task_cpu(task), kpatch_dummy_fn, NULL);
 		break;
 	case -EINVAL:	/* klp_check_and_switch_task() */
 		pr_debug("%s: %s:%d has an unreliable stack\n",
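
For anyone less familiar with the stopper machinery: stop_one_cpu() queues a work item on the target CPU's per-CPU stopper thread (the highest-priority task on that CPU) and waits for it to complete, so merely running a no-op there is enough to force the busy task off the CPU. A rough sketch of the intended flow, as kernel-context pseudocode (illustrative only, not part of the patch):

```c
/*
 * 1. klp_try_switch_task() asks klp_check_and_switch_task() to switch
 *    the task; a task that is currently on a CPU returns -EBUSY, since
 *    its stack cannot be reliably checked while it runs.
 *
 * 2. On the first pass klp_signals_cnt is still zero, so we only log
 *    the -EBUSY case and rely on the task rescheduling on its own.
 *
 * 3. On later passes, stop_one_cpu(task_cpu(task), kpatch_dummy_fn, NULL)
 *    schedules the no-op kpatch_dummy_fn() on that CPU's stopper thread.
 *    The stopper preempts whatever was running, so by the time the next
 *    transition attempt comes around the task is no longer running and
 *    can be switched to the new patch state.
 */
```

The dummy function itself does nothing; the useful side effect is the preemption of the CPU-bound task that would otherwise block the transition indefinitely.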