From: Guangbo Cui
To: Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng, Thomas Gleixner, Bjorn Helgaas
Cc: Jonathan Cameron, Waiman Long, linux-rt-devel@lists.linux.dev, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, Guangbo Cui
Subject: [PATCH v2] pci/aer_inject: switching inject_lock to raw_spinlock_t
Date: Thu, 9 Oct 2025 15:06:50 +0000
Message-ID: <20251009150651.93618-1-jckeep.cuiguangbo@gmail.com>

When injecting AER errors under PREEMPT_RT, the kernel may trigger a
lockdep warning about an invalid wait context:

```
[ 1850.950780] [ BUG: Invalid wait context ]
[ 1850.951152] 6.17.0-11316-g7a405dbb0f03-dirty #7 Not tainted
[ 1850.951457] -----------------------------
[ 1850.951680] irq/16-PCIe PME/56 is trying to lock:
[ 1850.952004] ffff800082865238 (inject_lock){+.+.}-{3:3}, at: aer_inj_read_config+0x38/0x1dc
[ 1850.952731] other info that might help us debug this:
[ 1850.952997] context-{5:5}
[ 1850.953192] 5 locks held by irq/16-PCIe PME/56:
[ 1850.953415] #0: ffff800082647390 (local_bh){.+.+}-{1:3}, at: __local_bh_disable_ip+0x30/0x268
[ 1850.953931] #1: ffff8000826c6b38 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
[ 1850.954453] #2: ffff000004bb6c58 (&data->lock){+...}-{3:3}, at: pcie_pme_irq+0x34/0xc4
[ 1850.954949] #3: ffff8000826c6b38 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
[ 1850.955420] #4: ffff800082863d10 (pci_lock){....}-{2:2}, at: pci_bus_read_config_dword+0x5c/0xd8
```

This happens because the AER injection path (`aer_inj_read_config()`) is
called in the context of the PCIe PME interrupt thread, which runs through
`irq_forced_thread_fn()` under PREEMPT_RT. In this context, `pci_lock` (a
`raw_spinlock_t`) is held with interrupts disabled
(`raw_spin_lock_irqsave()`), and `aer_inj_read_config()` then tries to
acquire `inject_lock`, which on PREEMPT_RT is an `rt_spin_lock`.
(Thanks Waiman Long)

An `rt_spin_lock` may sleep, so acquiring it while holding a raw spinlock
with IRQs disabled violates the lock nesting rules. This leads to the
“Invalid wait context” lockdep warning.

In other words, the lock order looks like this:

```
raw_spin_lock_irqsave(&pci_lock);
  ↓
rt_spin_lock(&inject_lock);      <-- not allowed
```

To fix this, convert `inject_lock` from a `spinlock_t` (an `rt_spin_lock`
on PREEMPT_RT) to a `raw_spinlock_t`. A raw spinlock never sleeps, so it is
safe to take in this context and is consistent with the surrounding
locking scheme.

This resolves the lockdep “Invalid wait context” warning observed when
injecting correctable AER errors through `/dev/aer_inject` on PREEMPT_RT.
This was discovered while testing PCIe AER error injection on an arm64
QEMU virtual machine:

```
qemu-system-aarch64 \
  -nographic \
  -machine virt,highmem=off,gic-version=3 \
  -cpu cortex-a72 \
  -kernel arch/arm64/boot/Image \
  -initrd initramfs.cpio.gz \
  -append "console=ttyAMA0 root=/dev/ram rdinit=/linuxrc earlyprintk nokaslr" \
  -m 2G \
  -smp 1 \
  -netdev user,id=net0,hostfwd=tcp::2223-:22 \
  -device virtio-net-pci,netdev=net0 \
  -device pcie-root-port,id=rp0,chassis=1,slot=0x0 \
  -device pci-testdev -s -S
```

Injecting a correctable PCIe error via /dev/aer_inject caused a BUG report
with "Invalid wait context" in the irq/PCIe thread.

```
~ # export HEX="00020000000000000100000000000000000000000000000000000000"
~ # echo -n "$HEX" | xxd -r -p | tee /dev/aer_inject >/dev/null
[ 1850.947170] pcieport 0000:00:02.0: aer_inject: Injecting errors 00000001/00000000 into device 0000:00:02.0
[ 1850.949951]
[ 1850.950479] =============================
[ 1850.950780] [ BUG: Invalid wait context ]
[ 1850.951152] 6.17.0-11316-g7a405dbb0f03-dirty #7 Not tainted
[ 1850.951457] -----------------------------
[ 1850.951680] irq/16-PCIe PME/56 is trying to lock:
[ 1850.952004] ffff800082865238 (inject_lock){+.+.}-{3:3}, at: aer_inj_read_config+0x38/0x1dc
[ 1850.952731] other info that might help us debug this:
[ 1850.952997] context-{5:5}
[ 1850.953192] 5 locks held by irq/16-PCIe PME/56:
[ 1850.953415] #0: ffff800082647390 (local_bh){.+.+}-{1:3}, at: __local_bh_disable_ip+0x30/0x268
[ 1850.953931] #1: ffff8000826c6b38 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
[ 1850.954453] #2: ffff000004bb6c58 (&data->lock){+...}-{3:3}, at: pcie_pme_irq+0x34/0xc4
[ 1850.954949] #3: ffff8000826c6b38 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
[ 1850.955420] #4: ffff800082863d10 (pci_lock){....}-{2:2}, at: pci_bus_read_config_dword+0x5c/0xd8
[ 1850.955932] stack backtrace:
[ 1850.956412] CPU: 0 UID: 0 PID: 56 Comm: irq/16-PCIe PME Not tainted 6.17.0-11316-g7a405dbb0f03-dirty #7 PREEMPT_{RT,(full)}
[ 1850.957039] Hardware name: linux,dummy-virt (DT)
[ 1850.957409] Call trace:
[ 1850.957727] show_stack+0x18/0x24 (C)
[ 1850.958089] dump_stack_lvl+0x40/0xbc
[ 1850.958339] dump_stack+0x18/0x24
[ 1850.958586] __lock_acquire+0xa84/0x3008
[ 1850.958907] lock_acquire+0x128/0x2a8
[ 1850.959171] rt_spin_lock+0x50/0x1b8
[ 1850.959476] aer_inj_read_config+0x38/0x1dc
[ 1850.959821] pci_bus_read_config_dword+0x80/0xd8
[ 1850.960079] pcie_capability_read_dword+0xac/0xd8
[ 1850.960454] pcie_pme_irq+0x44/0xc4
[ 1850.960728] irq_forced_thread_fn+0x30/0x94
[ 1850.960984] irq_thread+0x1ac/0x3a4
[ 1850.961308] kthread+0x1b4/0x208
[ 1850.961557] ret_from_fork+0x10/0x20
[ 1850.963088] pcieport 0000:00:02.0: AER: Correctable error message received from 0000:00:02.0
[ 1850.963330] pcieport 0000:00:02.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
[ 1850.963351] pcieport 0000:00:02.0: device [1b36:000c] error status/mask=00000001/0000e000
[ 1850.963385] pcieport 0000:00:02.0: [ 0] RxErr (First)
```

Signed-off-by: Guangbo Cui
Acked-by: Waiman Long
---
Changes in v2:
- Pull kfree() out of the lock critical section.
  (Thanks Waiman Long)
- Link to v1: https://lore.kernel.org/linux-pci/20251007060218.57222-1-jckeep.cuiguangbo@gmail.com/

 drivers/pci/pcie/aer_inject.c | 40 ++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/drivers/pci/pcie/aer_inject.c b/drivers/pci/pcie/aer_inject.c
index 91acc7b17f68..6dd231d9fccd 100644
--- a/drivers/pci/pcie/aer_inject.c
+++ b/drivers/pci/pcie/aer_inject.c
@@ -72,7 +72,7 @@ static LIST_HEAD(einjected);
 static LIST_HEAD(pci_bus_ops_list);
 
 /* Protect einjected and pci_bus_ops_list */
-static DEFINE_SPINLOCK(inject_lock);
+static DEFINE_RAW_SPINLOCK(inject_lock);
 
 static void aer_error_init(struct aer_error *err, u32 domain,
 			   unsigned int bus, unsigned int devfn,
@@ -126,12 +126,12 @@ static struct pci_bus_ops *pci_bus_ops_pop(void)
 	unsigned long flags;
 	struct pci_bus_ops *bus_ops;
 
-	spin_lock_irqsave(&inject_lock, flags);
+	raw_spin_lock_irqsave(&inject_lock, flags);
 	bus_ops = list_first_entry_or_null(&pci_bus_ops_list,
 					   struct pci_bus_ops, list);
 	if (bus_ops)
 		list_del(&bus_ops->list);
-	spin_unlock_irqrestore(&inject_lock, flags);
+	raw_spin_unlock_irqrestore(&inject_lock, flags);
 	return bus_ops;
 }
 
@@ -223,7 +223,7 @@ static int aer_inj_read_config(struct pci_bus *bus, unsigned int devfn,
 	int domain;
 	int rv;
 
-	spin_lock_irqsave(&inject_lock, flags);
+	raw_spin_lock_irqsave(&inject_lock, flags);
 	if (size != sizeof(u32))
 		goto out;
 	domain = pci_domain_nr(bus);
@@ -236,12 +236,12 @@ static int aer_inj_read_config(struct pci_bus *bus, unsigned int devfn,
 	sim = find_pci_config_dword(err, where, NULL);
 	if (sim) {
 		*val = *sim;
-		spin_unlock_irqrestore(&inject_lock, flags);
+		raw_spin_unlock_irqrestore(&inject_lock, flags);
 		return 0;
 	}
 out:
 	rv = aer_inj_read(bus, devfn, where, size, val);
-	spin_unlock_irqrestore(&inject_lock, flags);
+	raw_spin_unlock_irqrestore(&inject_lock, flags);
 	return rv;
 }
 
@@ -255,7 +255,7 @@ static int aer_inj_write_config(struct pci_bus *bus, unsigned int devfn,
 	int domain;
 	int rv;
 
-	spin_lock_irqsave(&inject_lock, flags);
+	raw_spin_lock_irqsave(&inject_lock, flags);
 	if (size != sizeof(u32))
 		goto out;
 	domain = pci_domain_nr(bus);
@@ -271,12 +271,12 @@ static int aer_inj_write_config(struct pci_bus *bus, unsigned int devfn,
 			*sim ^= val;
 		else
 			*sim = val;
-		spin_unlock_irqrestore(&inject_lock, flags);
+		raw_spin_unlock_irqrestore(&inject_lock, flags);
 		return 0;
 	}
 out:
 	rv = aer_inj_write(bus, devfn, where, size, val);
-	spin_unlock_irqrestore(&inject_lock, flags);
+	raw_spin_unlock_irqrestore(&inject_lock, flags);
 	return rv;
 }
 
@@ -304,14 +304,14 @@ static int pci_bus_set_aer_ops(struct pci_bus *bus)
 	if (!bus_ops)
 		return -ENOMEM;
 	ops = pci_bus_set_ops(bus, &aer_inj_pci_ops);
-	spin_lock_irqsave(&inject_lock, flags);
+	raw_spin_lock_irqsave(&inject_lock, flags);
 	if (ops == &aer_inj_pci_ops)
 		goto out;
 	pci_bus_ops_init(bus_ops, bus, ops);
 	list_add(&bus_ops->list, &pci_bus_ops_list);
 	bus_ops = NULL;
 out:
-	spin_unlock_irqrestore(&inject_lock, flags);
+	raw_spin_unlock_irqrestore(&inject_lock, flags);
 	kfree(bus_ops);
 	return 0;
 }
@@ -383,7 +383,7 @@ static int aer_inject(struct aer_error_inj *einj)
 				       uncor_mask);
 	}
 
-	spin_lock_irqsave(&inject_lock, flags);
+	raw_spin_lock_irqsave(&inject_lock, flags);
 
 	err = __find_aer_error_by_dev(dev);
 	if (!err) {
@@ -404,14 +404,14 @@ static int aer_inject(struct aer_error_inj *einj)
 	    !(einj->cor_status & ~cor_mask)) {
 		ret = -EINVAL;
 		pci_warn(dev, "The correctable error(s) is masked by device\n");
-		spin_unlock_irqrestore(&inject_lock, flags);
+		raw_spin_unlock_irqrestore(&inject_lock, flags);
 		goto out_put;
 	}
 	if (!aer_mask_override && einj->uncor_status &&
 	    !(einj->uncor_status & ~uncor_mask)) {
 		ret = -EINVAL;
 		pci_warn(dev, "The uncorrectable error(s) is masked by device\n");
-		spin_unlock_irqrestore(&inject_lock, flags);
+		raw_spin_unlock_irqrestore(&inject_lock, flags);
 		goto out_put;
 	}
 
@@ -445,7 +445,7 @@ static int aer_inject(struct aer_error_inj *einj)
 		rperr->source_id &= 0x0000ffff;
 		rperr->source_id |= PCI_DEVID(einj->bus, devfn) << 16;
 	}
-	spin_unlock_irqrestore(&inject_lock, flags);
+	raw_spin_unlock_irqrestore(&inject_lock, flags);
 
 	if (aer_mask_override) {
 		pci_write_config_dword(dev, pos_cap_err + PCI_ERR_COR_MASK,
@@ -523,8 +523,8 @@ static int __init aer_inject_init(void)
 static void __exit aer_inject_exit(void)
 {
 	struct aer_error *err, *err_next;
-	unsigned long flags;
 	struct pci_bus_ops *bus_ops;
+	LIST_HEAD(einjected_to_free);
 
 	misc_deregister(&aer_inject_device);
 
@@ -533,12 +533,14 @@ static void __exit aer_inject_exit(void)
 		kfree(bus_ops);
 	}
 
-	spin_lock_irqsave(&inject_lock, flags);
-	list_for_each_entry_safe(err, err_next, &einjected, list) {
+	scoped_guard(raw_spinlock_irqsave, &inject_lock) {
+		list_splice_init(&einjected, &einjected_to_free);
+	}
+
+	list_for_each_entry_safe(err, err_next, &einjected_to_free, list) {
 		list_del(&err->list);
 		kfree(err);
 	}
-	spin_unlock_irqrestore(&inject_lock, flags);
 }
 
 module_init(aer_inject_init);
-- 
2.43.0