From: Liuye <liu.yeC@h3c.com>
To: Daniel Thompson
CC: jason.wessel@windriver.com, dianders@chromium.org,
    gregkh@linuxfoundation.org, jirislaby@kernel.org,
    kgdb-bugreport@lists.sourceforge.net, linux-kernel@vger.kernel.org,
    linux-serial@vger.kernel.org
Subject: Re: Re: Re: Re: Re: [PATCH] kdb: Fix the deadlock issue in KDB debugging.
Date: Thu, 14 Mar 2024 07:06:22 +0000
Message-ID: <56ed54fd241c462189d2d030ad51eac6@h3c.com>
In-Reply-To: <20240313141745.GD202685@aspen.lan>
References: <20240228025602.3087748-1-liu.yeC@h3c.com>
 <20240228120516.GA22898@aspen.lan>
 <8b41d34adaef4ddcacde2dd00d4e3541@h3c.com>
 <20240301105931.GB5795@aspen.lan>
 <2ea381e7407a49aaa0b08fa7d4ff62d3@h3c.com>
 <20240312095756.GB202685@aspen.lan>
 <06cfa3459ed848cf8f228997b983cf53@h3c.com>
 <20240312102419.GC202685@aspen.lan>
 <410a443612e8441cb729c640a0d606c6@h3c.com>
 <20240313141745.GD202685@aspen.lan>
>On Wed, Mar 13, 2024 at 01:22:17AM +0000, Liuye wrote:
>> >On Tue, Mar 12, 2024 at 10:04:54AM +0000, Liuye wrote:
>> >> >On Tue, Mar 12, 2024 at 08:37:11AM +0000, Liuye wrote:
>> >> >> I know that you said schedule_work is not NMI safe, which is the
>> >> >> first issue. Perhaps it can be fixed using irq_work_queue. But
>> >> >> even if irq_work_queue is used to implement it, there will still
>> >> >> be a deadlock problem because slave cpu1 still has not released
>> >> >> the running queue lock of master CPU0.
>> >> >
>> >> >This doesn't sound right to me. Why do you think CPU1 won't
>> >> >release the run queue lock?
>> >>
>> >> In this example, CPU1 is waiting for CPU0 to release
>> >> dbg_slave_lock.
>> >
>> >That shouldn't be a problem. CPU0 will have released that lock by the
>> >time the irq work is dispatched.
>>
>> CPU0 only releases dbg_slave_lock after schedule_work has been
>> handled, and we are back to the previous issue.
>
>Sorry but I still don't understand what problem you think can happen here. What is wrong with calling schedule_work() from the IRQ work handler?
>
>Both irq_work_queue() and schedule_work() are calls to queue deferred work. It does not matter when the work is queued (providing we are lock safe). What matters is when the work is actually executed.
>
>Please can you describe the problem you think exists based on when the work is executed.

CPU0 enters KDB while handling a serial port interrupt and sends an IPI (NMI) to the other CPUs.
Once the system reaches a stable state, CPU0 is in interrupt context and the other CPUs are in NMI context.
Before one of those CPUs entered NMI context, it may already have taken CPU0's run queue lock.
At that point, when CPU0 runs kgdboc_restore_input() and calls schedule_work(), need_more_worker() decides that a worker on system_wq should be woken.
That makes CPU0 try to acquire its own run queue lock, which is still held by the other CPU.
But that CPU is still in NMI context and cannot exit, because it is waiting for CPU0 to release dbg_slave_lock, which only happens after schedule_work() has returned. The result is a deadlock.

After thinking about it, the problem is not whether schedule_work() is NMI safe, but that workers on system_wq must not be woken immediately when schedule_work() is called on this path.
I replaced schedule_work() with schedule_delayed_work(), and this solved my problem.
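To make the cycle easier to follow, here is a rough sketch of the two paths as I understand them. The intermediate workqueue/scheduler function names below are paraphrased from my reading of the code, so please treat them as an approximation rather than an exact trace:

/*
 * CPU0, interrupt context, leaving the debugger:
 *
 *   kgdboc_post_exp_handler()
 *     kgdboc_restore_input()
 *       schedule_work()
 *         queueing the work sees need_more_worker() == true
 *           wake_up_process(worker)
 *             try_to_wake_up() spins on CPU0's run queue lock   <-- stuck
 *
 * CPU1, NMI context, still inside the debugger entry path:
 *
 *   took CPU0's run queue lock just before the NMI was delivered,
 *   and now spins on dbg_slave_lock, which CPU0 only releases after
 *   kgdboc_restore_input()/schedule_work() has returned          <-- stuck
 *
 * Each CPU is waiting on a lock the other one holds, hence the deadlock.
 */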
The new patch is as follows:

Index: drivers/tty/serial/kgdboc.c
===================================================================
--- drivers/tty/serial/kgdboc.c (revision 57862)
+++ drivers/tty/serial/kgdboc.c (working copy)
@@ -92,12 +92,12 @@
         mutex_unlock(&kgdboc_reset_mutex);
 }
 
-static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
+static DECLARE_DELAYED_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
 
 static void kgdboc_restore_input(void)
 {
         if (likely(system_state == SYSTEM_RUNNING))
-                schedule_work(&kgdboc_restore_input_work);
+                schedule_delayed_work(&kgdboc_restore_input_work,2*HZ);
 }
 
 static int kgdboc_register_kbd(char **cptr)
@@ -128,7 +128,7 @@
                         i--;
                 }
         }
-        flush_work(&kgdboc_restore_input_work);
+        flush_delayed_work(&kgdboc_restore_input_work);
 }
 #else /* ! CONFIG_KDB_KEYBOARD */
 #define kgdboc_register_kbd(x) 0
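For what it is worth, my understanding of why the delayed variant avoids the problem (again paraphrasing the workqueue code, so the exact call chain is an assumption on my part):

/*
 * schedule_delayed_work(&kgdboc_restore_input_work, 2*HZ)
 *   queue_delayed_work_on()
 *     with a non-zero delay this only arms dwork->timer; nothing is
 *     put on system_wq yet and no worker is woken, so CPU0 does not
 *     touch its run queue lock while dbg_slave_lock is still held.
 *
 * ...about two seconds later, in ordinary timer context...
 *
 * delayed_work_timer_fn()
 *   __queue_work(), which may now wake a worker
 *     by this time CPU0 has released dbg_slave_lock and the other CPUs
 *     have left NMI context and released the run queue lock, so the
 *     wake-up is harmless.
 */

The 2*HZ delay is simply a value that is comfortably long enough on my system. I expect any non-zero delay would behave the same way, since the actual queueing then happens from timer context after CPU0 has left the debugger, but I kept a generous value to be safe.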