From: "Lu, Davina"
To: "linux-ext4@vger.kernel.org", "linux-kernel@vger.kernel.org"
CC: "Kiselev, Oleg", "Liu, Frank"
Subject: significant drop fio IOPS performance on v5.10
Date: Fri, 23 Sep 2022 00:54:59 +0000
Message-ID: <357ace228adf4e859df5e9f3f4f18b49@amazon.com>
Content-Type: multipart/mixed; boundary="_003_357ace228adf4e859df5e9f3f4f18b49amazoncom_"
MIME-Version: 1.0
X-Mailing-List: linux-kernel@vger.kernel.org

--_003_357ace228adf4e859df5e9f3f4f18b49amazoncom_
Content-Type: text/plain; charset="utf-8"

Hello,

I was profiling the 5.10 kernel and comparing it to 4.14. On a system with 64 virtual CPUs and 256 GiB of RAM, I am observing a significant drop in IO performance.
Using the following fio test, run with the attached script ("sudo ftest_write.sh"), I saw the fio IOPS result drop from 22K to less than 1K.

The script simply mounts a 16 GiB ext4 volume (max IOPS 64000K) with the mount options "-o noatime,nodiratime,data=ordered", then runs fio with 2048 writing threads and a file size of 28800000, using { --name=16kb_rand_write_only_2048_jobs --directory=/rdsdbdata1 --rw=randwrite --ioengine=sync --buffered=1 --bs=16k --max-jobs=2048 --numjobs=2048 --runtime=60 --time_based --thread --filesize=28800000 --fsync=1 --group_reporting }.

My analysis is that the degradation was introduced by commit {244adf6426ee31a83f397b700d964cff12a247d3} and that the issue is contention on rsv_conversion_wq. The simplest option is to increase the journal size, but that introduces more operational complexity. Another option is the following change (attachment "allow more ext4-rsv-conversion workqueue.patch"):

From 27e1b0e14275a281b3529f6a60c7b23a81356751 Mon Sep 17 00:00:00 2001
From: davinalu
Date: Fri, 23 Sep 2022 00:43:53 +0000
Subject: [PATCH] allow more ext4-rsv-conversion workqueue to speedup fio writing

---
 fs/ext4/super.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index a0af833f7da7..6b34298cdc3b 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4963,7 +4963,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
         * concurrency isn't really necessary.  Limit it to 1.
         */
        EXT4_SB(sb)->rsv_conversion_wq =
-               alloc_workqueue("ext4-rsv-conversion", WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
+               alloc_workqueue("ext4-rsv-conversion", WQ_MEM_RECLAIM | WQ_UNBOUND | __WQ_ORDERED, 0);
        if (!EXT4_SB(sb)->rsv_conversion_wq) {
                printk(KERN_ERR "EXT4-fs: failed to create workqueue\n");
                ret = -ENOMEM;

My thinking is: based on alloc_workqueue(), a max_active of 1 combined with WQ_UNBOUND implies "__WQ_ORDERED".
So I added it. I am not sure whether we need "__WQ_ORDERED" or not. Without "__WQ_ORDERED" it also appears to work on my testbed, but I added it since there is not much fio throughput difference on my testbed with or without "__WQ_ORDERED".

From my understanding and observation: with dioread_nolock and delalloc both enabled, bio_endio() and ext4_writepages() will trigger this workqueue to run ext4_do_flush_completed_IO(). The workqueue appears to process updates one at a time: in the ext4 extents.c code, the io_end->list_vec list only has one io_end_vec each time. So if the BIO side has high performance and we have only one thread doing the ext4 flush, that thread will be a bottleneck here. If I understand correctly, the "ext4-rsv-conversion" workqueue is mainly for updating the EXT4_IO_END_UNWRITTEN extent blocks (which only exist when the dioread_nolock and delalloc options are set) and the extent status. Am I correct?

This works on my test system and passes xfstests, but will it cause any corruption in ext4 extent block updates? I am also not sure about the journal transaction updates.

Can you tell me what I will break if this change is made?

Thanks,
Davina

--_003_357ace228adf4e859df5e9f3f4f18b49amazoncom_
Content-Type: application/octet-stream; name="ftest_write.sh"
Content-Description: ftest_write.sh
Content-Disposition: attachment; filename="ftest_write.sh"; size=1068; creation-date="Fri, 23 Sep 2022 00:11:42 GMT"; modification-date="Fri, 23 Sep 2022 00:11:36 GMT"

#!/bin/bash

# setup, make sure install fio and e2fsprog tools first
if [ -z "$1" ];then
        echo 'Usage:ftest <dev_name>'
        exit 0
fi
echo "Create /rdsdbdata1/"
mkdir /rdsdbdata1/
mke2fs -m 1 -t ext4 -b 4096 -L /rdsdbdata /dev/$1 -J size=128
sleep 1
mount -t ext4 -o noatime,nodiratime,data=ordered /dev/$1 /rdsdbdata1

# test
#for i in `seq 1 10`; do
        rm -rf /rdsdbdata1/*
        /usr/bin/fio --name=16kb_rand_write_only_2048_jobs --directory=/rdsdbdata1 --rw=randwrite --ioengine=sync --buffered=1 --bs=16k --max-jobs=2048 --numjobs=2048 --runtime=30 --thread --filesize=28800000 --fsync=1 --group_reporting --create_only=1 > /dev/null
        sudo echo 1 > /proc/sys/vm/drop_caches
        echo "start test ${i}"
        /usr/bin/fio --name=16kb_rand_write_only_2048_jobs --directory=/rdsdbdata1 --rw=randwrite --ioengine=sync --buffered=1 --bs=16k --max-jobs=2048 --numjobs=2048 --runtime=60 --time_based --thread --filesize=28800000 --fsync=1 --group_reporting
#done

# cleanup
umount /rdsdbdata1
rm /rdsdbdata1/ -rf

--_003_357ace228adf4e859df5e9f3f4f18b49amazoncom_--