From: "Ionut Nechita (WindRiver)"
To: axboe@kernel.dk, ming.lei@redhat.com
Cc: gregkh@linuxfoundation.org, muchun.song@linux.dev, sashal@kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org, Ionut Nechita
Subject: [PATCH 1/2] block/blk-mq: fix RT kernel regression with queue_lock in hot path
Date: Sat, 20 Dec 2025 13:02:40 +0200
Message-ID: <20251220110241.8435-2-ionut.nechita@windriver.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20251220110241.8435-1-ionut.nechita@windriver.com>
References: <20251220110241.8435-1-ionut.nechita@windriver.com>

From: Ionut Nechita

Commit 679b1874eba7 ("block: fix ordering between checking
QUEUE_FLAG_QUIESCED request adding") introduced a queue_lock
acquisition in blk_mq_run_hw_queue() to synchronize
QUEUE_FLAG_QUIESCED checks.

On RT kernels (CONFIG_PREEMPT_RT), regular spinlocks are converted to
rt_mutexes (sleeping locks). When multiple MSI-X IRQ threads process
I/O completions concurrently, they contend on queue_lock in the hot
path, causing all IRQ threads to enter D (uninterruptible sleep)
state. This serializes interrupt processing completely.

Test case (MegaRAID 12GSAS with 8 MSI-X vectors on RT kernel):
- Good (v6.6.52-rt): 640 MB/s sequential read
- Bad (v6.6.64-rt): 153 MB/s sequential read (-76% regression)
- 6-8 out of 8 MSI-X IRQ threads stuck in D-state waiting on queue_lock

The original commit message mentioned memory barriers as an
alternative approach. Use full memory barriers (smp_mb()) instead of
queue_lock to provide the same ordering guarantees without sleeping
on RT kernels.

Memory barriers ensure proper synchronization:
- CPU0 either sees QUEUE_FLAG_QUIESCED cleared, OR
- CPU1 sees the dispatch list/sw queue bitmap updates

This maintains correctness while avoiding the lock contention that
causes RT kernel IRQ threads to sleep in the I/O completion path.

Fixes: 679b1874eba7 ("block: fix ordering between checking QUEUE_FLAG_QUIESCED request adding")
Cc: stable@vger.kernel.org
Signed-off-by: Ionut Nechita
---
 block/blk-mq.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5da948b07058..5fb8da4958d0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2292,22 +2292,19 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 
 	might_sleep_if(!async && hctx->flags & BLK_MQ_F_BLOCKING);
 
+	/*
+	 * First lockless check to avoid unnecessary overhead.
+	 * Memory barrier below synchronizes with blk_mq_unquiesce_queue().
+	 */
 	need_run = blk_mq_hw_queue_need_run(hctx);
 	if (!need_run) {
-		unsigned long flags;
-
-		/*
-		 * Synchronize with blk_mq_unquiesce_queue(), because we check
-		 * if hw queue is quiesced locklessly above, we need the use
-		 * ->queue_lock to make sure we see the up-to-date status to
-		 * not miss rerunning the hw queue.
-		 */
-		spin_lock_irqsave(&hctx->queue->queue_lock, flags);
+		/* Synchronize with blk_mq_unquiesce_queue() */
+		smp_mb();
 		need_run = blk_mq_hw_queue_need_run(hctx);
-		spin_unlock_irqrestore(&hctx->queue->queue_lock, flags);
-
 		if (!need_run)
 			return;
+		/* Ensure dispatch list/sw queue updates visible before execution */
+		smp_mb();
 	}
 
 	if (async || !cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask)) {
-- 
2.52.0