From: "Lu, Davina"
To: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "Kiselev, Oleg", "Liu, Frank", "Mohamed Abuelfotoh, Hazem"
Subject: RE: significant drop fio IOPS performance on v5.10
Date: Sun, 25 Sep 2022 08:51:09 +0000
Message-ID: <1cdc68e6a98d4e93a95be5d887bcc75d@amazon.com>
In-Reply-To: <357ace228adf4e859df5e9f3f4f18b49@amazon.com>

+ abuehaze@amazon.com

-----Original Message-----
From: Lu, Davina
Sent: Friday, September 23, 2022 10:55 AM
To: linux-ext4@vger.kernel.org; linux-kernel@vger.kernel.org
Cc: Kiselev, Oleg; Liu, Frank
Subject: significant drop fio IOPS performance on v5.10

Hello,

I was profiling the 5.10 kernel and comparing it to 4.14.
On a system with 64 virtual CPUs and 256 GiB of RAM, I am observing a significant drop in IO performance. Using fio driven by the attached script "ftest_write.sh" (run with sudo), I saw the fio IOPS result drop from 22K to less than 1K.

The script simply mounts a 16 GiB EXT4 volume (max IOPS 64000K) with "-o noatime,nodiratime,data=ordered", then runs fio with 2048 write threads and a file size of 28800000, i.e.:

  fio --name=16kb_rand_write_only_2048_jobs --directory=/rdsdbdata1 --rw=randwrite --ioengine=sync --buffered=1 --bs=16k --max-jobs=2048 --numjobs=2048 --runtime=60 --time_based --thread --filesize=28800000 --fsync=1 --group_reporting

My analysis is that the degradation is introduced by commit 244adf6426ee31a83f397b700d964cff12a247d3 and that the issue is contention on rsv_conversion_wq. The simplest option is to increase the journal size, but that introduces more operational complexity. Another option is the following change, also attached as "allow more ext4-rsv-conversion workqueue.patch":

From 27e1b0e14275a281b3529f6a60c7b23a81356751 Mon Sep 17 00:00:00 2001
From: davinalu
Date: Fri, 23 Sep 2022 00:43:53 +0000
Subject: [PATCH] allow more ext4-rsv-conversion workqueue to speedup fio writing

---
 fs/ext4/super.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index a0af833f7da7..6b34298cdc3b 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4963,7 +4963,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	 * concurrency isn't really necessary. Limit it to 1.
 	 */
 	EXT4_SB(sb)->rsv_conversion_wq =
-		alloc_workqueue("ext4-rsv-conversion", WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
+		alloc_workqueue("ext4-rsv-conversion", WQ_MEM_RECLAIM | WQ_UNBOUND | __WQ_ORDERED, 0);
 	if (!EXT4_SB(sb)->rsv_conversion_wq) {
 		printk(KERN_ERR "EXT4-fs: failed to create workqueue\n");
 		ret = -ENOMEM;

My thought is: when WQ_UNBOUND is set and max_active is 1, alloc_workqueue() implicitly treats the workqueue as ordered (it adds __WQ_ORDERED), so I carried that flag over explicitly. I am not sure whether __WQ_ORDERED is actually needed; the change also works on my testbed without it, and I kept it only because there was no meaningful fio throughput difference with or without it.

From my understanding and observation: with dioread_nolock and delalloc both enabled, bio_endio() and ext4_writepages() queue work on this workqueue, which ends up in ext4_do_flush_completed_IO(). The workqueue appears to process conversions one by one: in fs/ext4/extents.c the io_end->list_vec list only holds one io_end_vec at a time. So if the block layer completes BIOs quickly and there is only one worker doing the ext4 flush, that worker becomes the bottleneck. If I understand correctly, the "ext4-rsv-conversion" workqueue mainly converts EXT4_IO_END_UNWRITTEN extent blocks and updates the extent status (this path only exists when dioread_nolock and delalloc are both set). Am I correct?

This works on my test system and passes xfstests, but will it cause any corruption in ext4 extent block updates? I am not sure about the journal transaction updates either. Can you tell me what I will break if this change is made?

Thanks
Davina
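
P.S. In case it helps the discussion, below is a minimal, hypothetical sketch (a throwaway module; the names "demo-ordered", "demo-parallel" and wq_demo_* are made up and not part of the patch) showing the two alloc_workqueue() call shapes I am comparing. The comments are my reading of the flag semantics in v5.10, so please correct me if they are wrong. I deliberately left __WQ_ORDERED out of the second call here, since its interaction with a non-1 max_active is exactly what I am unsure about.

/*
 * Sketch only, not the actual patch: creates the two workqueue shapes
 * being compared, based on my reading of include/linux/workqueue.h and
 * kernel/workqueue.c in v5.10.
 */
#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *ordered_wq;
static struct workqueue_struct *parallel_wq;

static int __init wq_demo_init(void)
{
	/*
	 * What ext4 does today: unbound, max_active == 1.  In v5.10,
	 * alloc_workqueue() sees WQ_UNBOUND with max_active == 1 and sets
	 * __WQ_ORDERED itself, so at most one io_end conversion runs at a
	 * time no matter how many CPUs are completing BIOs.
	 */
	ordered_wq = alloc_workqueue("demo-ordered",
				     WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
	if (!ordered_wq)
		return -ENOMEM;

	/*
	 * What the attached patch aims for: max_active == 0 asks for the
	 * default limit (WQ_DFL_ACTIVE), so multiple work items may run in
	 * parallel.  Whether __WQ_ORDERED should also be set in this case
	 * is the open question in the mail above.
	 */
	parallel_wq = alloc_workqueue("demo-parallel",
				      WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
	if (!parallel_wq) {
		destroy_workqueue(ordered_wq);
		return -ENOMEM;
	}
	return 0;
}

static void __exit wq_demo_exit(void)
{
	destroy_workqueue(parallel_wq);
	destroy_workqueue(ordered_wq);
}

module_init(wq_demo_init);
module_exit(wq_demo_exit);
MODULE_LICENSE("GPL");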