From: Liuye <liu.yeC@h3c.com>
To: Daniel Thompson
CC: jason.wessel@windriver.com, dianders@chromium.org,
    gregkh@linuxfoundation.org, jirislaby@kernel.org,
    kgdb-bugreport@lists.sourceforge.net, linux-kernel@vger.kernel.org,
    linux-serial@vger.kernel.org
Subject: Re: Re: Re: Re: Re: [PATCH] kdb: Fix the deadlock issue in KDB debugging.
Date: Thu, 14 Mar 2024 07:06:22 +0000
Message-ID: <56ed54fd241c462189d2d030ad51eac6@h3c.com>
In-Reply-To: <20240313141745.GD202685@aspen.lan>
References: <20240228025602.3087748-1-liu.yeC@h3c.com>
 <20240228120516.GA22898@aspen.lan>
 <8b41d34adaef4ddcacde2dd00d4e3541@h3c.com>
 <20240301105931.GB5795@aspen.lan>
 <2ea381e7407a49aaa0b08fa7d4ff62d3@h3c.com>
 <20240312095756.GB202685@aspen.lan>
 <06cfa3459ed848cf8f228997b983cf53@h3c.com>
 <20240312102419.GC202685@aspen.lan>
 <410a443612e8441cb729c640a0d606c6@h3c.com>
 <20240313141745.GD202685@aspen.lan>
>On Wed, Mar 13, 2024 at 01:22:17AM +0000, Liuye wrote:
>> >On Tue, Mar 12, 2024 at 10:04:54AM +0000, Liuye wrote:
>> >> >On Tue, Mar 12, 2024 at 08:37:11AM +0000, Liuye wrote:
>> >> >> I know that you said schedule_work is not NMI safe, which is the
>> >> >> first issue. Perhaps it can be fixed using irq_work_queue. But
>> >> >> even if irq_work_queue is used to implement it, there will still
>> >> >> be a deadlock problem because slave cpu1 still has not released
>> >> >> the running queue lock of master CPU0.
>> >> >
>> >> >This doesn't sound right to me. Why do you think CPU1 won't
>> >> >release the run queue lock?
>> >>
>> >> In this example, CPU1 is waiting for CPU0 to release
>> >> dbg_slave_lock.
>> >
>> >That shouldn't be a problem. CPU0 will have released that lock by the
>> >time the irq work is dispatched.
>>
>> CPU0 only releases dbg_slave_lock after schedule_work has been
>> handled, and we are back to the previous issue.
>
>Sorry but I still don't understand what problem you think can happen here. What is wrong with calling schedule_work() from the IRQ work handler?
>
>Both irq_work_queue() and schedule_work() are calls to queue deferred work. It does not matter when the work is queued (providing we are lock safe). What matters is when the work is actually executed.
>
>Please can you describe the problem you think exists based on when the work is executed.

CPU0 enters KDB while handling a serial port interrupt and sends an IPI (NMI) to the other CPUs.
Once the system reaches a stable state, CPU0 is in interrupt context and the other CPUs are in NMI context.
Before one of those CPUs entered NMI context, it may already have taken CPU0's run queue lock.
At that point, when CPU0 runs kgdboc_restore_input() and calls schedule_work(), need_more_worker() decides that a worker on system_wq should be woken.
That makes CPU0 try to acquire its own run queue lock, which is still held by the other CPU.
But that CPU is still in NMI context and cannot exit, because it is waiting for CPU0 to release dbg_slave_lock, which only happens after schedule_work() has returned. The result is a deadlock.

After thinking about it, the problem is not whether schedule_work() is NMI safe, but that workers on system_wq must not be woken immediately when schedule_work() is called on this path.
I replaced schedule_work() with schedule_delayed_work(), and this solved my problem.
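To make the cycle easier to follow, here is a rough sketch of the two paths as I understand them. The intermediate workqueue/scheduler function names below are paraphrased from my reading of the code, so please treat them as an approximation rather than an exact trace:

/*
 * CPU0, interrupt context, leaving the debugger:
 *
 *   kgdboc_post_exp_handler()
 *     kgdboc_restore_input()
 *       schedule_work()
 *         queueing the work sees need_more_worker() == true
 *           wake_up_process(worker)
 *             try_to_wake_up() spins on CPU0's run queue lock   <-- stuck
 *
 * CPU1, NMI context, still inside the debugger entry path:
 *
 *   took CPU0's run queue lock just before the NMI was delivered,
 *   and now spins on dbg_slave_lock, which CPU0 only releases after
 *   kgdboc_restore_input()/schedule_work() has returned          <-- stuck
 *
 * Each CPU is waiting on a lock the other one holds, hence the deadlock.
 */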
The new patch is as follows:

Index: drivers/tty/serial/kgdboc.c
===================================================================
--- drivers/tty/serial/kgdboc.c (revision 57862)
+++ drivers/tty/serial/kgdboc.c (working copy)
@@ -92,12 +92,12 @@
         mutex_unlock(&kgdboc_reset_mutex);
 }
 
-static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
+static DECLARE_DELAYED_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
 
 static void kgdboc_restore_input(void)
 {
         if (likely(system_state == SYSTEM_RUNNING))
-                schedule_work(&kgdboc_restore_input_work);
+                schedule_delayed_work(&kgdboc_restore_input_work,2*HZ);
 }
 
 static int kgdboc_register_kbd(char **cptr)
@@ -128,7 +128,7 @@
                         i--;
                 }
         }
-        flush_work(&kgdboc_restore_input_work);
+        flush_delayed_work(&kgdboc_restore_input_work);
 }
 #else /* ! CONFIG_KDB_KEYBOARD */
 #define kgdboc_register_kbd(x) 0
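For what it is worth, my understanding of why the delayed variant avoids the problem (again paraphrasing the workqueue code, so the exact call chain is an assumption on my part):

/*
 * schedule_delayed_work(&kgdboc_restore_input_work, 2*HZ)
 *   queue_delayed_work_on()
 *     with a non-zero delay this only arms dwork->timer; nothing is
 *     put on system_wq yet and no worker is woken, so CPU0 does not
 *     touch its run queue lock while dbg_slave_lock is still held.
 *
 * ...about two seconds later, in ordinary timer context...
 *
 * delayed_work_timer_fn()
 *   __queue_work(), which may now wake a worker
 *     by this time CPU0 has released dbg_slave_lock and the other CPUs
 *     have left NMI context and released the run queue lock, so the
 *     wake-up is harmless.
 */

The 2*HZ delay is simply a value that is comfortably long enough on my system. I expect any non-zero delay would behave the same way, since the actual queueing then happens from timer context after CPU0 has left the debugger, but I kept a generous value to be safe.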