From nobody Sat Feb 7 15:12:15 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by smtp.lore.kernel.org (Postfix) with ESMTP id BD743EB64DD
 for ; Thu, 29 Jun 2023 11:07:25 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S232210AbjF2LHX (ORCPT ); Thu, 29 Jun 2023 07:07:23 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35798 "EHLO
 lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S232138AbjF2LHQ (ORCPT ); Thu, 29 Jun 2023 07:07:16 -0400
Received: from out-55.mta1.migadu.com (out-55.mta1.migadu.com [IPv6:2001:41d0:203:375::37])
 by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 94B16E72
 for ; Thu, 29 Jun 2023 04:07:15 -0700 (PDT)
X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers.
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1;
 t=1688036833;
 h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
 to:to:cc:cc:mime-version:mime-version:
 content-transfer-encoding:content-transfer-encoding:
 in-reply-to:in-reply-to:references:references;
 bh=dNin81+NgdWvZKUgysygT1wpEuRFau3v3obyeOkX5fM=;
 b=wwqEmJdrPmGhPaPCNNeah4+ETGQTleAhQTC7xostmyZuZdJychdvBYIDsi+H4x6MrfHW8p
 c6P2ePqkoYs95JZypBggJd7pyt64ZMEfrqpnao0izFegeXWFKgAL8UwFUZRoOWGW+V21oL
 hto1QBQcoDWwYWAKXB5SN1k+aLYpCkk=
From: chengming.zhou@linux.dev
To: axboe@kernel.dk, ming.lei@redhat.com, hch@lst.de, tj@kernel.org
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 Chengming Zhou
Subject: [PATCH v2 1/4] blk-mq: use percpu csd to remote complete instead of per-rq csd
Date: Thu, 29 Jun 2023 19:03:56 +0800
Message-Id: <20230629110359.1111832-2-chengming.zhou@linux.dev>
In-Reply-To: <20230629110359.1111832-1-chengming.zhou@linux.dev>
References: <20230629110359.1111832-1-chengming.zhou@linux.dev>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-Migadu-Flow: FLOW_OUT
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Chengming Zhou

If a request needs to be completed remotely, we insert it into the
percpu llist and call smp_call_function_single_async() only if the
llist was previously empty.

We don't need a per-rq csd; a percpu csd is enough. This also decreases
the size of struct request by 24 bytes.

This way is cleaner, and looks correct: the block softirq is guaranteed
to be scheduled to consume the list whenever a new request is added to
this percpu list, whether smp_call_function_single_async() returns
-EBUSY or 0.

Signed-off-by: Chengming Zhou
Reviewed-by: Ming Lei
---
v2:
 - Change to use call_single_data_t, which avoids using 2 cache lines
   for 1 csd, as suggested by Ming Lei.
 - Improve the commit log, the explanation is copied from Ming Lei.
---
 block/blk-mq.c         | 12 ++++++++----
 include/linux/blk-mq.h |  5 +----
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index decb6ab2d508..e52200edd2b1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -43,6 +43,7 @@
 #include "blk-ioprio.h"
 
 static DEFINE_PER_CPU(struct llist_head, blk_cpu_done);
+static DEFINE_PER_CPU(call_single_data_t, blk_cpu_csd);
 
 static void blk_mq_insert_request(struct request *rq, blk_insert_t flags);
 static void blk_mq_request_bypass_insert(struct request *rq,
@@ -1156,13 +1157,13 @@ static void blk_mq_complete_send_ipi(struct request *rq)
 {
 	struct llist_head *list;
 	unsigned int cpu;
+	call_single_data_t *csd;
 
 	cpu = rq->mq_ctx->cpu;
 	list = &per_cpu(blk_cpu_done, cpu);
-	if (llist_add(&rq->ipi_list, list)) {
-		INIT_CSD(&rq->csd, __blk_mq_complete_request_remote, rq);
-		smp_call_function_single_async(cpu, &rq->csd);
-	}
+	csd = &per_cpu(blk_cpu_csd, cpu);
+	if (llist_add(&rq->ipi_list, list))
+		smp_call_function_single_async(cpu, csd);
 }
 
 static void blk_mq_raise_softirq(struct request *rq)
@@ -4796,6 +4797,9 @@ static int __init blk_mq_init(void)
 
 	for_each_possible_cpu(i)
 		init_llist_head(&per_cpu(blk_cpu_done, i));
+	for_each_possible_cpu(i)
+		INIT_CSD(&per_cpu(blk_cpu_csd, i),
+			 __blk_mq_complete_request_remote, NULL);
 	open_softirq(BLOCK_SOFTIRQ, blk_done_softirq);
 
 	cpuhp_setup_state_nocalls(CPUHP_BLOCK_SOFTIRQ_DEAD,
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index f401067ac03a..070551197c0e 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -182,10 +182,7 @@ struct request {
 		rq_end_io_fn		*saved_end_io;
 	} flush;
 
-	union {
-		struct __call_single_data csd;
-		u64 fifo_time;
-	};
+	u64 fifo_time;
 
 	/*
 	 * completion callback.
-- 
2.39.2

From nobody Sat Feb 7 15:12:15 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by smtp.lore.kernel.org (Postfix) with ESMTP id 37C09C001B3
 for ; Thu, 29 Jun 2023 11:07:58 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S232131AbjF2LHi (ORCPT ); Thu, 29 Jun 2023 07:07:38 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35836 "EHLO
 lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S232170AbjF2LHU (ORCPT ); Thu, 29 Jun 2023 07:07:20 -0400
Received: from out-53.mta1.migadu.com (out-53.mta1.migadu.com [95.215.58.53])
 by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4F3E31FD7
 for ; Thu, 29 Jun 2023 04:07:18 -0700 (PDT)
X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers.
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1;
 t=1688036836;
 h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
 to:to:cc:cc:mime-version:mime-version:
 content-transfer-encoding:content-transfer-encoding:
 in-reply-to:in-reply-to:references:references;
 bh=MamvVjI1jeXw3Kjngjhi6g/DzhtYYjp3x8kVfhCNxDw=;
 b=gLJaqTIYrWNNdCKmHcyxDtI7XwL0mz5l3b8romo8vPvm3yG7ob3tBombjqyDMsDaGTHxQb
 +kHsA+GFLKWIs8pAEGKOvbbzIVfb/nq9my9qTM5RpeenzPBIiyKWjry3rp6b7NL5Rz8VSt
 15Y7uKyzuXr8dJjMaVC2yBqu+gsqTXE=
From: chengming.zhou@linux.dev
To: axboe@kernel.dk, ming.lei@redhat.com, hch@lst.de, tj@kernel.org
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 Chengming Zhou
Subject: [PATCH v2 2/4] blk-flush: count inflight flush_data requests
Date: Thu, 29 Jun 2023 19:03:57 +0800
Message-Id: <20230629110359.1111832-3-chengming.zhou@linux.dev>
In-Reply-To: <20230629110359.1111832-1-chengming.zhou@linux.dev>
References: <20230629110359.1111832-1-chengming.zhou@linux.dev>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-Migadu-Flow: FLOW_OUT
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Chengming Zhou

The flush state machine uses a doubly linked list to link all inflight
flush_data requests, to avoid issuing separate post-flushes for those
flush_data requests that shared a PREFLUSH.

So we can't reuse rq->queuelist; this is why we need rq->flush.list.

In preparation for the next patch, which reuses rq->queuelist for the
flush state machine, change the doubly linked list to an unsigned long
counter that counts all inflight flush_data requests.

This is ok since we only need to know whether there is any inflight
flush_data request, so an unsigned long counter is sufficient.
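As a minimal user-space sketch of the idea in this commit message (not
the kernel code): when callers only ask whether any flush_data request
is in flight, a counter updated under the same lock can stand in for
list membership. The names flush_queue_sketch, data_issued,
data_completed and data_pending are hypothetical and exist only for
this illustration; locking is assumed to be handled by the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct blk_flush_queue; not the kernel type. */
struct flush_queue_sketch {
	unsigned long data_in_flight;	/* inflight flush_data requests */
};

/* A flush_data request was issued (caller is assumed to hold the lock). */
static void data_issued(struct flush_queue_sketch *fq)
{
	fq->data_in_flight++;
}

/* A flush_data request completed (caller is assumed to hold the lock). */
static void data_completed(struct flush_queue_sketch *fq)
{
	/* An unbalanced completion would underflow the counter. */
	assert(fq->data_in_flight > 0);
	fq->data_in_flight--;
}

/* Plays the role of the old !list_empty() test: is anything in flight? */
static bool data_pending(const struct flush_queue_sketch *fq)
{
	return fq->data_in_flight != 0;
}
```

The counter answers the same yes/no question as the list did, and it
frees the request from having to carry a dedicated list node.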

Signed-off-by: Chengming Zhou
---
 block/blk-flush.c | 9 +++++----
 block/blk.h       | 5 ++---
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index dba392cf22be..bb7adfc2a5da 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -187,7 +187,8 @@ static void blk_flush_complete_seq(struct request *rq,
 		break;
 
 	case REQ_FSEQ_DATA:
-		list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
+		list_del_init(&rq->flush.list);
+		fq->flush_data_in_flight++;
 		spin_lock(&q->requeue_lock);
 		list_add_tail(&rq->queuelist, &q->flush_list);
 		spin_unlock(&q->requeue_lock);
@@ -299,7 +300,7 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
 		return;
 
 	/* C2 and C3 */
-	if (!list_empty(&fq->flush_data_in_flight) &&
+	if (fq->flush_data_in_flight &&
 	    time_before(jiffies,
 			fq->flush_pending_since + FLUSH_PENDING_TIMEOUT))
 		return;
@@ -374,6 +375,7 @@ static enum rq_end_io_ret mq_flush_data_end_io(struct request *rq,
 	 * the comment in flush_end_io().
 	 */
 	spin_lock_irqsave(&fq->mq_flush_lock, flags);
+	fq->flush_data_in_flight--;
 	blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error);
 	spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
 
@@ -445,7 +447,7 @@ bool blk_insert_flush(struct request *rq)
 		blk_rq_init_flush(rq);
 		rq->flush.seq |= REQ_FSEQ_POSTFLUSH;
 		spin_lock_irq(&fq->mq_flush_lock);
-		list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
+		fq->flush_data_in_flight++;
 		spin_unlock_irq(&fq->mq_flush_lock);
 		return false;
 	default:
@@ -496,7 +498,6 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size,
 
 	INIT_LIST_HEAD(&fq->flush_queue[0]);
 	INIT_LIST_HEAD(&fq->flush_queue[1]);
-	INIT_LIST_HEAD(&fq->flush_data_in_flight);
 
 	return fq;
 
diff --git a/block/blk.h b/block/blk.h
index 608c5dcc516b..686712e13835 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -15,15 +15,14 @@ struct elevator_type;
 extern struct dentry *blk_debugfs_root;
 
 struct blk_flush_queue {
+	spinlock_t		mq_flush_lock;
 	unsigned int		flush_pending_idx:1;
 	unsigned int		flush_running_idx:1;
 	blk_status_t		rq_status;
 	unsigned long		flush_pending_since;
 	struct list_head	flush_queue[2];
-	struct list_head	flush_data_in_flight;
+	unsigned long		flush_data_in_flight;
 	struct request		*flush_rq;
-
-	spinlock_t		mq_flush_lock;
 };
 
 bool is_flush_rq(struct request *req);
-- 
2.39.2

From nobody Sat Feb 7 15:12:15 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by smtp.lore.kernel.org (Postfix) with ESMTP id 4DC2FEB64DD
 for ; Thu, 29 Jun 2023 11:07:58 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S232313AbjF2LHt (ORCPT ); Thu, 29 Jun 2023 07:07:49 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35840 "EHLO
 lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S232197AbjF2LHW (ORCPT ); Thu, 29 Jun 2023 07:07:22 -0400
Received: from out-52.mta1.migadu.com (out-52.mta1.migadu.com [95.215.58.52])
 by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6C0B11FCB
 for ; Thu, 29 Jun 2023 04:07:20 -0700 (PDT)
X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers.
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1;
 t=1688036839;
 h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
 to:to:cc:cc:mime-version:mime-version:
 content-transfer-encoding:content-transfer-encoding:
 in-reply-to:in-reply-to:references:references;
 bh=hdKaqogn/I3BFaAaWb1R9T3dUmGgvZb8Wo54xKMjRbQ=;
 b=Silj4FteQiWcNtz1G5WBCi+kTB4VQErkWfbaa7zSefq9ofMpCw2enGXVGq1vM3jhZTLgFn
 NEpX1/qmz+ji/HU91h7L4sqAwSCrqyD+NBNJCJer5WZU0VA0mzEShHlAz39RAbTxvuJ6UF
 SZ1nj4EpaWnZkhJ282cITiMpW021VZQ=
From: chengming.zhou@linux.dev
To: axboe@kernel.dk, ming.lei@redhat.com, hch@lst.de, tj@kernel.org
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 Chengming Zhou
Subject: [PATCH v2 3/4] blk-flush: reuse rq queuelist in flush state machine
Date: Thu, 29 Jun 2023 19:03:58 +0800
Message-Id: <20230629110359.1111832-4-chengming.zhou@linux.dev>
In-Reply-To: <20230629110359.1111832-1-chengming.zhou@linux.dev>
References: <20230629110359.1111832-1-chengming.zhou@linux.dev>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-Migadu-Flow: FLOW_OUT
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Chengming Zhou

Since we no longer need to maintain a list of inflight flush_data
requests, we can reuse rq->queuelist for the flush pending list.

This patch decreases the size of struct request by 16 bytes.
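The 16 bytes quoted here match the size of one doubly linked list node
(two pointers) on a 64-bit build. A rough user-space sketch of that
arithmetic follows; list_head_sketch, flush_before, flush_after and
flush_bytes_saved are hypothetical names that only mirror the shape of
the fields involved, not the real struct request layout.

```c
#include <stddef.h>

/* Same shape as a doubly linked list node: two pointers. */
struct list_head_sketch {
	struct list_head_sketch *next, *prev;
};

/* Hypothetical "before" layout of the flush fields. */
struct flush_before {
	unsigned int seq;
	struct list_head_sketch list;
	void (*saved_end_io)(void);
};

/* Hypothetical "after" layout with the list node removed. */
struct flush_after {
	unsigned int seq;
	void (*saved_end_io)(void);
};

/* Bytes saved by dropping the embedded list node. */
static size_t flush_bytes_saved(void)
{
	return sizeof(struct flush_before) - sizeof(struct flush_after);
}
```

On a typical LP64 target the result equals two pointers, that is 16
bytes; the exact saving in the real structure depends on its
surrounding padding.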

Signed-off-by: Chengming Zhou
---
 block/blk-flush.c      | 12 +++++-------
 include/linux/blk-mq.h |  1 -
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index bb7adfc2a5da..81588edbe8b0 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -183,14 +183,13 @@ static void blk_flush_complete_seq(struct request *rq,
 		/* queue for flush */
 		if (list_empty(pending))
 			fq->flush_pending_since = jiffies;
-		list_move_tail(&rq->flush.list, pending);
+		list_move_tail(&rq->queuelist, pending);
 		break;
 
 	case REQ_FSEQ_DATA:
-		list_del_init(&rq->flush.list);
 		fq->flush_data_in_flight++;
 		spin_lock(&q->requeue_lock);
-		list_add_tail(&rq->queuelist, &q->flush_list);
+		list_move_tail(&rq->queuelist, &q->flush_list);
 		spin_unlock(&q->requeue_lock);
 		blk_mq_kick_requeue_list(q);
 		break;
@@ -202,7 +201,7 @@ static void blk_flush_complete_seq(struct request *rq,
 		 * flush data request completion path. Restore @rq for
 		 * normal completion and end it.
 		 */
-		list_del_init(&rq->flush.list);
+		list_del_init(&rq->queuelist);
 		blk_flush_restore_request(rq);
 		blk_mq_end_request(rq, error);
 		break;
@@ -258,7 +257,7 @@ static enum rq_end_io_ret flush_end_io(struct request *flush_rq,
 	fq->flush_running_idx ^= 1;
 
 	/* and push the waiting requests to the next stage */
-	list_for_each_entry_safe(rq, n, running, flush.list) {
+	list_for_each_entry_safe(rq, n, running, queuelist) {
 		unsigned int seq = blk_flush_cur_seq(rq);
 
 		BUG_ON(seq != REQ_FSEQ_PREFLUSH && seq != REQ_FSEQ_POSTFLUSH);
@@ -292,7 +291,7 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
 {
 	struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
 	struct request *first_rq =
-		list_first_entry(pending, struct request, flush.list);
+		list_first_entry(pending, struct request, queuelist);
 	struct request *flush_rq = fq->flush_rq;
 
 	/* C1 described at the top of this file */
@@ -386,7 +385,6 @@ static enum rq_end_io_ret mq_flush_data_end_io(struct request *rq,
 static void blk_rq_init_flush(struct request *rq)
 {
 	rq->flush.seq = 0;
-	INIT_LIST_HEAD(&rq->flush.list);
 	rq->rq_flags |= RQF_FLUSH_SEQ;
 	rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
 	rq->end_io = mq_flush_data_end_io;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 070551197c0e..96644d6f8d18 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -178,7 +178,6 @@ struct request {
 
 	struct {
 		unsigned int		seq;
-		struct list_head	list;
 		rq_end_io_fn		*saved_end_io;
 	} flush;
 
-- 
2.39.2

From nobody Sat Feb 7 15:12:15 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by smtp.lore.kernel.org (Postfix) with ESMTP id 5AEE4EB64D9
 for ; Thu, 29 Jun 2023 11:08:04 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S232196AbjF2LIC (ORCPT ); Thu, 29 Jun 2023 07:08:02 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35908 "EHLO
 lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S232180AbjF2LHZ (ORCPT ); Thu, 29 Jun 2023 07:07:25 -0400
Received: from out-12.mta1.migadu.com (out-12.mta1.migadu.com [IPv6:2001:41d0:203:375::c])
 by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4F408294E
 for ; Thu, 29 Jun 2023 04:07:24 -0700 (PDT)
X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers.
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1;
 t=1688036842;
 h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
 to:to:cc:cc:mime-version:mime-version:
 content-transfer-encoding:content-transfer-encoding:
 in-reply-to:in-reply-to:references:references;
 bh=/O8YJWAa7A//A8YJ1hLAuWA8H/d3IRDfJMQYLuFl59Q=;
 b=mbIzMFKmGOLPlHTie0bZkFzRR4kKsmNBsTYTIAcXDY9PhKLh+AWN83tEAps+QrlCmHpQJ5
 6vPVir4H9Ri7Ue7A0rX45a+10TN8gVkVcHKOdW31Bm1xjWYh2rKJU5FdRDZn+3xBXxdoKu
 lTL8df0btTRLK09OBxfpmKmt+c9Wkm4=
From: chengming.zhou@linux.dev
To: axboe@kernel.dk, ming.lei@redhat.com, hch@lst.de, tj@kernel.org
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 Chengming Zhou
Subject: [PATCH v2 4/4] blk-mq: delete unused completion_data in struct request
Date: Thu, 29 Jun 2023 19:03:59 +0800
Message-Id: <20230629110359.1111832-5-chengming.zhou@linux.dev>
In-Reply-To: <20230629110359.1111832-1-chengming.zhou@linux.dev>
References: <20230629110359.1111832-1-chengming.zhou@linux.dev>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-Migadu-Flow: FLOW_OUT
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Chengming Zhou

After a global search, I found that "completion_data" in struct request
is not used anywhere, so clean it up while we are here.

Signed-off-by: Chengming Zhou
---
 include/linux/blk-mq.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 96644d6f8d18..ab790eba5fcf 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -158,13 +158,11 @@ struct request {
 
 	/*
 	 * The rb_node is only used inside the io scheduler, requests
-	 * are pruned when moved to the dispatch queue. So let the
-	 * completion_data share space with the rb_node.
+	 * are pruned when moved to the dispatch queue.
 	 */
 	union {
 		struct rb_node rb_node;	/* sort/lookup */
 		struct bio_vec special_vec;
-		void *completion_data;
 	};
 
 	/*
-- 
2.39.2