From nobody Thu Apr 9 17:58:04 2026
From: "Herton R.
Krzesinski"
To: linux-kernel@vger.kernel.org
Cc: herton@redhat.com, frederic@kernel.org, peterz@infradead.org, mingo@kernel.org, paulmck@linux.vnet.ibm.com, tglx@linutronix.de, anna-maria@linutronix.de, kyin@redhat.com, jaeshin@redhat.com
Subject: [RFC] Processing of raised_list can stall if an IPI/interrupt is missed
Date: Tue, 3 Mar 2026 16:07:15 -0300
Message-ID: <20260303190715.935867-1-herton@redhat.com>

Hello,

I recently saw a report where a system went down after it stopped processing
irq work items in raised_list (from kernel/irq_work.c). The system in
question, from the vmcore data I got, is a Linux guest under VMware (on an
x86_64 host). It seems to be a very rare occurrence; as far as I know, only
two different users have reported it so far. While it was reported on an old
RHEL-based kernel (4.18), I believe the issue could still happen in newer
kernels, since the processing of raised_list has in principle not changed.

On an x86_64 system, from my understanding of the code, there are two ways
raised_list can be consumed: either through irq_work_tick() or through the
irq work interrupt/IPI. If the system has a working APIC, raised_list items
are only consumed through the interrupt/IPI, with irq_work_run() being
called from arch/x86/kernel/irq_work.c; in this case irq_work_tick() will
not call irq_work_run_list(raised) because of the
arch_irq_work_has_interrupt() check.

So in this specific case, if the interrupt/IPI is somehow missed, processing
of items in raised_list can stall forever: __irq_work_queue_local() calls
llist_add() and only raises the IPI when the list was empty before the add,
so if the list was not consumed due to a missed interrupt/IPI, it will never
call irq_work_raise() again.
This is what I saw in the vmcore from one of the reports mentioned above,
where the system died after some time. From it we got some pending irq work
items in raised_list on CPU 2:

crash> pd raised_list:all
per_cpu(raised_list, 0) = $1 = {
  first = 0x0
}
per_cpu(raised_list, 1) = $2 = {
  first = 0x0
}
per_cpu(raised_list, 2) = $3 = {
  first = 0xffffbb22d1609020
}
...
crash> list 0xffffbb22d1609020
ffffbb22d1609020
ffffbb233d06b020
ffffbb233901d020
ffffbb2324ec1020
ffffbb232cf59020
ffffbb2328f0d020
ffffbb2320e7d020
ffffbb2334fd1020
ffffbb2330f95020
ffffbb231ce39020
ffffbb2318da5020
ffffbb2314d29020
ffffbb22e45f4020
ffffbb23007c5020
ffffbb2310cdd020
ffffbb230c8b1020
ffffbb2308857020
ffffbb2304821020
ffffbb22fc789020
ffffbb22e05f0020
ffffbb22f8715020
ffffbb22f46db020
ffffbb22f06ad020
ffffbb22e8635020
ffffbb22ec673020
ffff93d3a6a1efe0
ffff93d65151e6d0
crash> list 0xffffbb22d1609020 | wc -l
27

All other CPUs had no items, only CPU 2. These pending items look to have
caused some cascade effects which led to soft lockups and the system dying
(e.g. a work item doesn't run, holds up resources, and several tasks end up
stuck...).

It appears that relying on the IPI alone may be too strict in a case like
this, although I don't know whether a system missing an IPI/interrupt is
something that can be expected. It looks to me like we could have a
virtualization bug/issue in this specific case (since it's running under
VMware), but maybe we should add a fallback for when something like this
happens? For example, making it less strict and allowing irq_work_tick() to
also process the list?
Like below:

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 73f7e1fd4ab4..e47d64b56a38 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -188,9 +188,8 @@ bool irq_work_needs_cpu(void)
 	raised = this_cpu_ptr(&raised_list);
 	lazy = this_cpu_ptr(&lazy_list);
 
-	if (llist_empty(raised) || arch_irq_work_has_interrupt())
-		if (llist_empty(lazy))
-			return false;
+	if (llist_empty(raised) && llist_empty(lazy))
+		return false;
 
 	/* All work should have been flushed before going offline */
 	WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
@@ -270,7 +269,7 @@ void irq_work_tick(void)
 {
 	struct llist_head *raised = this_cpu_ptr(&raised_list);
 
-	if (!llist_empty(raised) && !arch_irq_work_has_interrupt())
+	if (!llist_empty(raised))
 		irq_work_run_list(raised);
 
 	if (!IS_ENABLED(CONFIG_PREEMPT_RT))

However, the above essentially reverts commit
76a33061b9323b7fdb220ae5fa116c10833ec22e ("irq_work: Force raised irq work
to run on irq work interrupt") and could reintroduce the issue it fixed.
Since nohz_full_kick_func() (which is the renamed
nohz_full_kick_work_func()) is empty now, though, maybe it is OK to no
longer be strict about raised_list only running from the irq work
interrupt? Or maybe it's not worth changing this at all, since the problem
is rare and a missed self-IPI should not be expected?