From: Ojaswin Mujoo <ojaswin@linux.ibm.com>
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Andreas Dilger, Jan Kara, Ritesh Harjani
Subject: [RFC v2 1/8] ext4: Stop searching if PA doesn't satisfy non-extent file
Date: Mon, 26 Sep 2022 12:34:52 +0530
Message-Id: <113e30014fdcf409680e20ec1ef4455ace33884d.1664172580.git.ojaswin@linux.ibm.com>

If we come across a PA that matches the logical offset but cannot satisfy
a non-extent file because its physical start is higher than non-extent
files support, stop searching for another PA and break out of the loop.
Since PAs don't overlap, we won't be able to find another inode PA that
can satisfy the original request.

Signed-off-by: Ojaswin Mujoo
Reviewed-by: Ritesh Harjani (IBM)
---
 fs/ext4/mballoc.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 71f5b67d7f28..2e3eb632a216 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4383,8 +4383,13 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 		/* non-extent files can't have physical blocks past 2^32 */
 		if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)) &&
 		    (pa->pa_pstart + EXT4_C2B(sbi, pa->pa_len) >
-		     EXT4_MAX_BLOCK_FILE_PHYS))
-			continue;
+		     EXT4_MAX_BLOCK_FILE_PHYS)) {
+			/*
+			 * Since PAs don't overlap, we won't find any
+			 * other PA to satisfy this.
+			 */
+			break;
+		}
 
 		/* found preallocated blocks, use them */
 		spin_lock(&pa->pa_lock);
--
2.31.1

From: Ojaswin Mujoo <ojaswin@linux.ibm.com>
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Andreas Dilger, Jan Kara, Ritesh Harjani
Subject: [RFC v2 2/8] ext4: Refactor code related to freeing PAs
Date: Mon, 26 Sep 2022 12:34:53 +0530

This patch makes the following changes:

* Rename ext4_mb_pa_free to ext4_mb_pa_put_free to better reflect its
  purpose
* Add new ext4_mb_pa_free() which only handles freeing
* Refactor ext4_mb_pa_callback() to use ext4_mb_pa_free()

There are no functional changes in this patch.

Signed-off-by: Ojaswin Mujoo
Reviewed-by: Ritesh Harjani (IBM)
---
 fs/ext4/mballoc.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 2e3eb632a216..8be6f8765a6f 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4531,16 +4531,21 @@ static void ext4_mb_mark_pa_deleted(struct super_block *sb,
 	}
 }
 
-static void ext4_mb_pa_callback(struct rcu_head *head)
+static void inline ext4_mb_pa_free(struct ext4_prealloc_space *pa)
 {
-	struct ext4_prealloc_space *pa;
-	pa = container_of(head, struct ext4_prealloc_space, u.pa_rcu);
-
+	BUG_ON(!pa);
 	BUG_ON(atomic_read(&pa->pa_count));
 	BUG_ON(pa->pa_deleted == 0);
 	kmem_cache_free(ext4_pspace_cachep, pa);
 }
 
+static void ext4_mb_pa_callback(struct rcu_head *head)
+{
+	struct ext4_prealloc_space *pa;
+	pa = container_of(head, struct ext4_prealloc_space, u.pa_rcu);
+	ext4_mb_pa_free(pa);
+}
+
 /*
  * drops a reference to preallocated space descriptor
  * if this was the last reference and the space is consumed
@@ -5067,14 +5072,20 @@ static int ext4_mb_pa_alloc(struct ext4_allocation_context *ac)
 	return 0;
 }
 
-static void ext4_mb_pa_free(struct ext4_allocation_context *ac)
+static void ext4_mb_pa_put_free(struct ext4_allocation_context *ac)
 {
 	struct ext4_prealloc_space *pa = ac->ac_pa;
 
 	BUG_ON(!pa);
 	ac->ac_pa = NULL;
 	WARN_ON(!atomic_dec_and_test(&pa->pa_count));
-	kmem_cache_free(ext4_pspace_cachep, pa);
+	/*
+	 * current function is only called due to an error or due to
+	 * len of found blocks < len of requested blocks hence the PA has not
+	 * been added to grp->bb_prealloc_list. So we don't need to lock it
+	 */
+	pa->pa_deleted = 1;
+	ext4_mb_pa_free(pa);
 }
 
 #ifdef CONFIG_EXT4_DEBUG
@@ -5623,13 +5634,13 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle,
 		 * So we have to free this pa here itself.
 		 */
 		if (*errp) {
-			ext4_mb_pa_free(ac);
+			ext4_mb_pa_put_free(ac);
 			ext4_discard_allocated_blocks(ac);
 			goto errout;
 		}
 		if (ac->ac_status == AC_STATUS_FOUND &&
 		    ac->ac_o_ex.fe_len >= ac->ac_f_ex.fe_len)
-			ext4_mb_pa_free(ac);
+			ext4_mb_pa_put_free(ac);
 	}
 	if (likely(ac->ac_status == AC_STATUS_FOUND)) {
 		*errp = ext4_mb_mark_diskspace_used(ac, handle, reserv_clstrs);
@@ -5648,7 +5659,7 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle,
 		 * If block allocation fails then the pa allocated above
 		 * needs to be freed here itself.
 		 */
-		ext4_mb_pa_free(ac);
+		ext4_mb_pa_put_free(ac);
 		*errp = -ENOSPC;
 	}
 
--
2.31.1

From: Ojaswin Mujoo <ojaswin@linux.ibm.com>
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Andreas Dilger, Jan Kara, Ritesh Harjani
Subject: [RFC v2 3/8] ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()
Date: Mon, 26 Sep 2022 12:34:54 +0530
Message-Id: <8ec3ed728f046495c39f01881b57ef12fdabeb2a.1664172580.git.ojaswin@linux.ibm.com>

Change some variable names to be more consistent and refactor some of
the code to make it easier to read.
There are no functional changes in this patch.

Signed-off-by: Ojaswin Mujoo
Reviewed-by: Ritesh Harjani (IBM)
---
 fs/ext4/mballoc.c | 97 ++++++++++++++++++++++++-----------------------
 1 file changed, 49 insertions(+), 48 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 8be6f8765a6f..84950df709bb 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4000,7 +4000,8 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
 	loff_t orig_size __maybe_unused;
 	ext4_lblk_t start;
 	struct ext4_inode_info *ei = EXT4_I(ac->ac_inode);
-	struct ext4_prealloc_space *pa;
+	struct ext4_prealloc_space *tmp_pa;
+	ext4_lblk_t tmp_pa_start, tmp_pa_end;
 
 	/* do normalize only data requests, metadata requests
 	   do not need preallocation */
@@ -4103,56 +4104,53 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
 
 	/* check we don't cross already preallocated blocks */
 	rcu_read_lock();
-	list_for_each_entry_rcu(pa, &ei->i_prealloc_list, pa_inode_list) {
-		ext4_lblk_t pa_end;
-
-		if (pa->pa_deleted)
+	list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_inode_list) {
+		if (tmp_pa->pa_deleted)
 			continue;
-		spin_lock(&pa->pa_lock);
-		if (pa->pa_deleted) {
-			spin_unlock(&pa->pa_lock);
+		spin_lock(&tmp_pa->pa_lock);
+		if (tmp_pa->pa_deleted) {
+			spin_unlock(&tmp_pa->pa_lock);
 			continue;
 		}
 
-		pa_end = pa->pa_lstart + EXT4_C2B(EXT4_SB(ac->ac_sb),
-						  pa->pa_len);
+		tmp_pa_start = tmp_pa->pa_lstart;
+		tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
 
 		/* PA must not overlap original request */
-		BUG_ON(!(ac->ac_o_ex.fe_logical >= pa_end ||
-			ac->ac_o_ex.fe_logical < pa->pa_lstart));
+		BUG_ON(!(ac->ac_o_ex.fe_logical >= tmp_pa_end ||
+			ac->ac_o_ex.fe_logical < tmp_pa_start));
 
 		/* skip PAs this normalized request doesn't overlap with */
-		if (pa->pa_lstart >= end || pa_end <= start) {
-			spin_unlock(&pa->pa_lock);
+		if (tmp_pa_start >= end || tmp_pa_end <= start) {
+			spin_unlock(&tmp_pa->pa_lock);
 			continue;
 		}
-		BUG_ON(pa->pa_lstart <= start && pa_end >= end);
+		BUG_ON(tmp_pa_start <= start && tmp_pa_end >= end);
 
 		/* adjust start or end to be adjacent to this pa */
-		if (pa_end <= ac->ac_o_ex.fe_logical) {
-			BUG_ON(pa_end < start);
-			start = pa_end;
-		} else if (pa->pa_lstart > ac->ac_o_ex.fe_logical) {
-			BUG_ON(pa->pa_lstart > end);
-			end = pa->pa_lstart;
+		if (tmp_pa_end <= ac->ac_o_ex.fe_logical) {
+			BUG_ON(tmp_pa_end < start);
+			start = tmp_pa_end;
+		} else if (tmp_pa_start > ac->ac_o_ex.fe_logical) {
+			BUG_ON(tmp_pa_start > end);
+			end = tmp_pa_start;
 		}
-		spin_unlock(&pa->pa_lock);
+		spin_unlock(&tmp_pa->pa_lock);
 	}
 	rcu_read_unlock();
 	size = end - start;
 
 	/* XXX: extra loop to check we really don't overlap preallocations */
 	rcu_read_lock();
-	list_for_each_entry_rcu(pa, &ei->i_prealloc_list, pa_inode_list) {
-		ext4_lblk_t pa_end;
+	list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_inode_list) {
+		spin_lock(&tmp_pa->pa_lock);
+		if (tmp_pa->pa_deleted == 0) {
+			tmp_pa_start = tmp_pa->pa_lstart;
+			tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
 
-		spin_lock(&pa->pa_lock);
-		if (pa->pa_deleted == 0) {
-			pa_end = pa->pa_lstart + EXT4_C2B(EXT4_SB(ac->ac_sb),
-							  pa->pa_len);
-			BUG_ON(!(start >= pa_end || end <= pa->pa_lstart));
+			BUG_ON(!(start >= tmp_pa_end || end <= tmp_pa_start));
 		}
-		spin_unlock(&pa->pa_lock);
+		spin_unlock(&tmp_pa->pa_lock);
 	}
 	rcu_read_unlock();
 
@@ -4362,7 +4360,8 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 	int order, i;
 	struct ext4_inode_info *ei = EXT4_I(ac->ac_inode);
 	struct ext4_locality_group *lg;
-	struct ext4_prealloc_space *pa, *cpa = NULL;
+	struct ext4_prealloc_space *tmp_pa, *cpa = NULL;
+	ext4_lblk_t tmp_pa_start, tmp_pa_end;
 	ext4_fsblk_t goal_block;
 
 	/* only data can be preallocated */
@@ -4371,18 +4370,20 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 
 	/* first, try per-file preallocation */
 	rcu_read_lock();
-	list_for_each_entry_rcu(pa, &ei->i_prealloc_list, pa_inode_list) {
+	list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_inode_list) {
 
 		/* all fields in this condition don't change,
 		 * so we can skip locking for them */
-		if (ac->ac_o_ex.fe_logical < pa->pa_lstart ||
-		    ac->ac_o_ex.fe_logical >= (pa->pa_lstart +
-					       EXT4_C2B(sbi, pa->pa_len)))
+		tmp_pa_start = tmp_pa->pa_lstart;
+		tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
+
+		if (ac->ac_o_ex.fe_logical < tmp_pa_start ||
+		    ac->ac_o_ex.fe_logical >= tmp_pa_end)
 			continue;
 
 		/* non-extent files can't have physical blocks past 2^32 */
 		if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)) &&
-		    (pa->pa_pstart + EXT4_C2B(sbi, pa->pa_len) >
+		    (tmp_pa->pa_pstart + EXT4_C2B(sbi, tmp_pa->pa_len) >
 		     EXT4_MAX_BLOCK_FILE_PHYS)) {
 			/*
 			 * Since PAs don't overlap, we won't find any
@@ -4392,16 +4393,16 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 		}
 
 		/* found preallocated blocks, use them */
-		spin_lock(&pa->pa_lock);
-		if (pa->pa_deleted == 0 && pa->pa_free) {
-			atomic_inc(&pa->pa_count);
-			ext4_mb_use_inode_pa(ac, pa);
-			spin_unlock(&pa->pa_lock);
+		spin_lock(&tmp_pa->pa_lock);
+		if (tmp_pa->pa_deleted == 0 && tmp_pa->pa_free) {
+			atomic_inc(&tmp_pa->pa_count);
+			ext4_mb_use_inode_pa(ac, tmp_pa);
+			spin_unlock(&tmp_pa->pa_lock);
 			ac->ac_criteria = 10;
 			rcu_read_unlock();
 			return true;
 		}
-		spin_unlock(&pa->pa_lock);
+		spin_unlock(&tmp_pa->pa_lock);
 	}
 	rcu_read_unlock();
 
@@ -4425,16 +4426,16 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 	 */
 	for (i = order; i < PREALLOC_TB_SIZE; i++) {
 		rcu_read_lock();
-		list_for_each_entry_rcu(pa, &lg->lg_prealloc_list[i],
+		list_for_each_entry_rcu(tmp_pa, &lg->lg_prealloc_list[i],
 					pa_inode_list) {
-			spin_lock(&pa->pa_lock);
-			if (pa->pa_deleted == 0 &&
-			    pa->pa_free >= ac->ac_o_ex.fe_len) {
+			spin_lock(&tmp_pa->pa_lock);
+			if (tmp_pa->pa_deleted == 0 &&
+			    tmp_pa->pa_free >= ac->ac_o_ex.fe_len) {
 
 				cpa = ext4_mb_check_group_pa(goal_block,
-							     pa, cpa);
+							     tmp_pa, cpa);
 			}
-			spin_unlock(&pa->pa_lock);
+			spin_unlock(&tmp_pa->pa_lock);
 		}
 		rcu_read_unlock();
 	}
--
2.31.1

From: Ojaswin Mujoo <ojaswin@linux.ibm.com>
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Andreas Dilger, Jan Kara, Ritesh Harjani
Subject: [RFC v2 4/8] ext4: Move overlap assert logic into a separate function
Date: Mon, 26 Sep 2022 12:34:55 +0530

Abstract out the logic that double-checks for overlaps in
ext4_mb_normalize_request() into a separate function. Since there have
been no reports in the past of overlaps hitting this BUG_ON(), in the
future we can consider calling this function only under
"#ifdef AGGRESSIVE_CHECK".

There are no functional changes in this patch.

Signed-off-by: Ojaswin Mujoo
Reviewed-by: Ritesh Harjani (IBM)
---
 fs/ext4/mballoc.c | 36 ++++++++++++++++++++++++------------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 84950df709bb..d1ce34888dcc 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3985,6 +3985,29 @@ static void ext4_mb_normalize_group_request(struct ext4_allocation_context *ac)
 	mb_debug(sb, "goal %u blocks for locality group\n", ac->ac_g_ex.fe_len);
 }
 
+static inline void
+ext4_mb_pa_assert_overlap(struct ext4_allocation_context *ac,
+			  ext4_lblk_t start, ext4_lblk_t end)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
+	struct ext4_inode_info *ei = EXT4_I(ac->ac_inode);
+	struct ext4_prealloc_space *tmp_pa;
+	ext4_lblk_t tmp_pa_start, tmp_pa_end;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_inode_list) {
+		spin_lock(&tmp_pa->pa_lock);
+		if (tmp_pa->pa_deleted == 0) {
+			tmp_pa_start = tmp_pa->pa_lstart;
+			tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
+
+			BUG_ON(!(start >= tmp_pa_end || end <= tmp_pa_start));
+		}
+		spin_unlock(&tmp_pa->pa_lock);
+	}
+	rcu_read_unlock();
+}
+
 /*
  * Normalization means making request better in terms of
  * size and alignment
@@ -4141,18 +4164,7 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
 	size = end - start;
 
 	/* XXX: extra loop to check we really don't overlap preallocations */
-	rcu_read_lock();
-	list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_inode_list) {
-		spin_lock(&tmp_pa->pa_lock);
-		if (tmp_pa->pa_deleted == 0) {
-			tmp_pa_start = tmp_pa->pa_lstart;
-			tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
-
-			BUG_ON(!(start >= tmp_pa_end || end <= tmp_pa_start));
-		}
-		spin_unlock(&tmp_pa->pa_lock);
-	}
-	rcu_read_unlock();
+	ext4_mb_pa_assert_overlap(ac, start, end);
 
 	/*
 	 * In this function "start" and "size" are normalized for better
--
2.31.1
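To make the AGGRESSIVE_CHECK suggestion above concrete, here is a minimal
sketch (not part of this series) of how the call site in
ext4_mb_normalize_request() could eventually be guarded, reusing the
existing AGGRESSIVE_CHECK debug macro from mballoc.c:

	#ifdef AGGRESSIVE_CHECK
		/* XXX: extra loop to check we really don't overlap preallocations */
		ext4_mb_pa_assert_overlap(ac, start, end);
	#endif
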
From: Ojaswin Mujoo <ojaswin@linux.ibm.com>
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Andreas Dilger, Jan Kara, Ritesh Harjani
Subject: [RFC v2 5/8] ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()
Date: Mon, 26 Sep 2022 12:34:56 +0530

Abstract out the logic of fixing PA overlaps in ext4_mb_normalize_request()
to improve the readability of the code. This also makes it easier to make
changes to the overlap logic in the future.

There are no functional changes in this patch.

Signed-off-by: Ojaswin Mujoo
Reviewed-by: Ritesh Harjani (IBM)
---
 fs/ext4/mballoc.c | 110 +++++++++++++++++++++++++++++-----------------
 1 file changed, 69 insertions(+), 41 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index d1ce34888dcc..dda9a72c81d9 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4008,6 +4008,74 @@ ext4_mb_pa_assert_overlap(struct ext4_allocation_context *ac,
 	rcu_read_unlock();
 }
 
+/*
+ * Given an allocation context "ac" and a range "start", "end", check
+ * and adjust boundaries if the range overlaps with any of the existing
+ * preallocations stored in the corresponding inode of the allocation context.
+ *
+ * Parameters:
+ *	ac			allocation context
+ *	start			start of the new range
+ *	end			end of the new range
+ */
+static inline void
+ext4_mb_pa_adjust_overlap(struct ext4_allocation_context *ac,
+			  ext4_lblk_t *start, ext4_lblk_t *end)
+{
+	struct ext4_inode_info *ei = EXT4_I(ac->ac_inode);
+	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
+	struct ext4_prealloc_space *tmp_pa;
+	ext4_lblk_t new_start, new_end;
+	ext4_lblk_t tmp_pa_start, tmp_pa_end;
+
+	new_start = *start;
+	new_end = *end;
+
+	/* check we don't cross already preallocated blocks */
+	rcu_read_lock();
+	list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_inode_list) {
+		if (tmp_pa->pa_deleted)
+			continue;
+		spin_lock(&tmp_pa->pa_lock);
+		if (tmp_pa->pa_deleted) {
+			spin_unlock(&tmp_pa->pa_lock);
+			continue;
+		}
+
+		tmp_pa_start = tmp_pa->pa_lstart;
+		tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
+
+		/* PA must not overlap original request */
+		BUG_ON(!(ac->ac_o_ex.fe_logical >= tmp_pa_end ||
+			ac->ac_o_ex.fe_logical < tmp_pa_start));
+
+		/* skip PAs this normalized request doesn't overlap with */
+		if (tmp_pa_start >= new_end || tmp_pa_end <= new_start) {
+			spin_unlock(&tmp_pa->pa_lock);
+			continue;
+		}
+		BUG_ON(tmp_pa_start <= new_start && tmp_pa_end >= new_end);
+
+		/* adjust start or end to be adjacent to this pa */
+		if (tmp_pa_end <= ac->ac_o_ex.fe_logical) {
+			BUG_ON(tmp_pa_end < new_start);
+			new_start = tmp_pa_end;
+		} else if (tmp_pa_start > ac->ac_o_ex.fe_logical) {
+			BUG_ON(tmp_pa_start > new_end);
+			new_end = tmp_pa_start;
+		}
+		spin_unlock(&tmp_pa->pa_lock);
+	}
+	rcu_read_unlock();
+
+	/* XXX: extra loop to check we really don't overlap preallocations */
+	ext4_mb_pa_assert_overlap(ac, new_start, new_end);
+
+	*start = new_start;
+	*end = new_end;
+	return;
+}
+
 /*
  * Normalization means making request better in terms of
  * size and alignment
@@ -4022,9 +4090,6 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
 	loff_t size, start_off;
 	loff_t orig_size __maybe_unused;
 	ext4_lblk_t start;
-	struct ext4_inode_info *ei = EXT4_I(ac->ac_inode);
-	struct ext4_prealloc_space *tmp_pa;
-	ext4_lblk_t tmp_pa_start, tmp_pa_end;
 
 	/* do normalize only data requests, metadata requests
 	   do not need preallocation */
@@ -4125,47 +4190,10 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
 
 	end = start + size;
 
-	/* check we don't cross already preallocated blocks */
-	rcu_read_lock();
-	list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_inode_list) {
-		if (tmp_pa->pa_deleted)
-			continue;
-		spin_lock(&tmp_pa->pa_lock);
-		if (tmp_pa->pa_deleted) {
-			spin_unlock(&tmp_pa->pa_lock);
-			continue;
-		}
-
-		tmp_pa_start = tmp_pa->pa_lstart;
-		tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
-
-		/* PA must not overlap original request */
-		BUG_ON(!(ac->ac_o_ex.fe_logical >= tmp_pa_end ||
-			ac->ac_o_ex.fe_logical < tmp_pa_start));
-
-		/* skip PAs this normalized request doesn't overlap with */
-		if (tmp_pa_start >= end || tmp_pa_end <= start) {
-			spin_unlock(&tmp_pa->pa_lock);
-			continue;
-		}
-		BUG_ON(tmp_pa_start <= start && tmp_pa_end >= end);
+	ext4_mb_pa_adjust_overlap(ac, &start, &end);
 
-		/* adjust start or end to be adjacent to this pa */
-		if (tmp_pa_end <= ac->ac_o_ex.fe_logical) {
-			BUG_ON(tmp_pa_end < start);
-			start = tmp_pa_end;
-		} else if (tmp_pa_start > ac->ac_o_ex.fe_logical) {
-			BUG_ON(tmp_pa_start > end);
-			end = tmp_pa_start;
-		}
-		spin_unlock(&tmp_pa->pa_lock);
-	}
-	rcu_read_unlock();
 	size = end - start;
 
-	/* XXX: extra loop to check we really don't overlap preallocations */
-	ext4_mb_pa_assert_overlap(ac, start, end);
-
 	/*
 	 * In this function "start" and "size" are normalized for better
 	 * alignment and length such that we could preallocate more blocks.
--
2.31.1

From: Ojaswin Mujoo <ojaswin@linux.ibm.com>
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Andreas Dilger, Jan Kara, Ritesh Harjani
Subject: [RFC v2 6/8] ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union
Date: Mon, 26 Sep 2022 12:34:57 +0530
Message-Id: <68364daa1af99536695b9a1df07652c14caf7cac.1664172580.git.ojaswin@linux.ibm.com>

** Splitting pa->pa_inode_list **

Currently, we use the same pa->pa_inode_list to add a pa to either the
inode preallocation list or the locality group preallocation list. For
better clarity, split this list into a union of 2 list_heads and use
either of them based on the type of pa.

** Splitting pa->pa_obj_lock **

Currently, pa->pa_obj_lock is either assigned &ei->i_prealloc_lock for
inode PAs or lg_prealloc_lock for lg PAs, and is then used to lock the
lists containing these PAs. Make the distinction between the 2 PA types
clear by changing this lock to a union of 2 locks. Explicitly use
pa_node_lock.inode_lock for inode PAs and pa_node_lock.lg_lock for lg
PAs.

This patch is required so that the locality group preallocation code
remains unchanged while upcoming patches move the inode preallocation
code from a list based to an rbtree based implementation. It also makes
the upcoming patches easier to review.

There are no functional changes in this patch.

Signed-off-by: Ojaswin Mujoo
Reviewed-by: Ritesh Harjani (IBM)
---
 fs/ext4/mballoc.c | 76 +++++++++++++++++++++++++++--------------------
 fs/ext4/mballoc.h | 10 +++++--
 2 files changed, 52 insertions(+), 34 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index dda9a72c81d9..b91710fe881f 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3995,7 +3995,7 @@ ext4_mb_pa_assert_overlap(struct ext4_allocation_context *ac,
 	ext4_lblk_t tmp_pa_start, tmp_pa_end;
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_inode_list) {
+	list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_node.inode_list) {
 		spin_lock(&tmp_pa->pa_lock);
 		if (tmp_pa->pa_deleted == 0) {
 			tmp_pa_start = tmp_pa->pa_lstart;
@@ -4033,7 +4033,7 @@ ext4_mb_pa_adjust_overlap(struct ext4_allocation_context *ac,
 
 	/* check we don't cross already preallocated blocks */
 	rcu_read_lock();
-	list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_inode_list) {
+	list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_node.inode_list) {
 		if (tmp_pa->pa_deleted)
 			continue;
 		spin_lock(&tmp_pa->pa_lock);
@@ -4410,7 +4410,7 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 
 	/* first, try per-file preallocation */
 	rcu_read_lock();
-	list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_inode_list) {
+	list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_node.inode_list) {
 
 		/* all fields in this condition don't change,
 		 * so we can skip locking for them */
@@ -4467,7 +4467,7 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 	for (i = order; i < PREALLOC_TB_SIZE; i++) {
 		rcu_read_lock();
 		list_for_each_entry_rcu(tmp_pa, &lg->lg_prealloc_list[i],
-					pa_inode_list) {
+					pa_node.lg_list) {
 			spin_lock(&tmp_pa->pa_lock);
 			if (tmp_pa->pa_deleted == 0 &&
 			    tmp_pa->pa_free >= ac->ac_o_ex.fe_len) {
@@ -4640,9 +4640,15 @@ static void ext4_mb_put_pa(struct ext4_allocation_context *ac,
 	list_del(&pa->pa_group_list);
 	ext4_unlock_group(sb, grp);
 
-	spin_lock(pa->pa_obj_lock);
-	list_del_rcu(&pa->pa_inode_list);
-	spin_unlock(pa->pa_obj_lock);
+	if (pa->pa_type == MB_INODE_PA) {
+		spin_lock(pa->pa_node_lock.inode_lock);
+		list_del_rcu(&pa->pa_node.inode_list);
+		spin_unlock(pa->pa_node_lock.inode_lock);
+	} else {
+		spin_lock(pa->pa_node_lock.lg_lock);
+		list_del_rcu(&pa->pa_node.lg_list);
+		spin_unlock(pa->pa_node_lock.lg_lock);
+	}
 
 	call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback);
 }
@@ -4710,7 +4716,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
 	pa->pa_len = ac->ac_b_ex.fe_len;
 	pa->pa_free = pa->pa_len;
 	spin_lock_init(&pa->pa_lock);
-	INIT_LIST_HEAD(&pa->pa_inode_list);
+	INIT_LIST_HEAD(&pa->pa_node.inode_list);
 	INIT_LIST_HEAD(&pa->pa_group_list);
 	pa->pa_deleted = 0;
 	pa->pa_type = MB_INODE_PA;
@@ -4725,14 +4731,14 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
 	ei = EXT4_I(ac->ac_inode);
 	grp = ext4_get_group_info(sb, ac->ac_b_ex.fe_group);
 
-	pa->pa_obj_lock = &ei->i_prealloc_lock;
+	pa->pa_node_lock.inode_lock = &ei->i_prealloc_lock;
 	pa->pa_inode = ac->ac_inode;
 
 	list_add(&pa->pa_group_list, &grp->bb_prealloc_list);
 
-	spin_lock(pa->pa_obj_lock);
-	list_add_rcu(&pa->pa_inode_list, &ei->i_prealloc_list);
-	spin_unlock(pa->pa_obj_lock);
+	spin_lock(pa->pa_node_lock.inode_lock);
+	list_add_rcu(&pa->pa_node.inode_list, &ei->i_prealloc_list);
+	spin_unlock(pa->pa_node_lock.inode_lock);
 	atomic_inc(&ei->i_prealloc_active);
 }
 
@@ -4764,7 +4770,7 @@ ext4_mb_new_group_pa(struct ext4_allocation_context *ac)
 	pa->pa_len = ac->ac_b_ex.fe_len;
 	pa->pa_free = pa->pa_len;
 	spin_lock_init(&pa->pa_lock);
-	INIT_LIST_HEAD(&pa->pa_inode_list);
+	INIT_LIST_HEAD(&pa->pa_node.lg_list);
 	INIT_LIST_HEAD(&pa->pa_group_list);
 	pa->pa_deleted = 0;
 	pa->pa_type = MB_GROUP_PA;
@@ -4780,7 +4786,7 @@ ext4_mb_new_group_pa(struct ext4_allocation_context *ac)
 	lg = ac->ac_lg;
 	BUG_ON(lg == NULL);
 
-	pa->pa_obj_lock = &lg->lg_prealloc_lock;
+	pa->pa_node_lock.lg_lock = &lg->lg_prealloc_lock;
 	pa->pa_inode = NULL;
 
 	list_add(&pa->pa_group_list, &grp->bb_prealloc_list);
@@ -4956,9 +4962,15 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
 	list_for_each_entry_safe(pa, tmp, &list, u.pa_tmp_list) {
 
 		/* remove from object (inode or locality group) */
-		spin_lock(pa->pa_obj_lock);
-		list_del_rcu(&pa->pa_inode_list);
-		spin_unlock(pa->pa_obj_lock);
+		if (pa->pa_type == MB_GROUP_PA) {
+			spin_lock(pa->pa_node_lock.lg_lock);
+			list_del_rcu(&pa->pa_node.lg_list);
+			spin_unlock(pa->pa_node_lock.lg_lock);
+		} else {
+			spin_lock(pa->pa_node_lock.inode_lock);
+			list_del_rcu(&pa->pa_node.inode_list);
+			spin_unlock(pa->pa_node_lock.inode_lock);
+		}
 
 		if (pa->pa_type == MB_GROUP_PA)
 			ext4_mb_release_group_pa(&e4b, pa);
@@ -5021,8 +5033,8 @@ void ext4_discard_preallocations(struct inode *inode, unsigned int needed)
 	spin_lock(&ei->i_prealloc_lock);
 	while (!list_empty(&ei->i_prealloc_list) && needed) {
 		pa = list_entry(ei->i_prealloc_list.prev,
-				struct ext4_prealloc_space, pa_inode_list);
-		BUG_ON(pa->pa_obj_lock != &ei->i_prealloc_lock);
+				struct ext4_prealloc_space, pa_node.inode_list);
+		BUG_ON(pa->pa_node_lock.inode_lock != &ei->i_prealloc_lock);
 		spin_lock(&pa->pa_lock);
 		if (atomic_read(&pa->pa_count)) {
 			/* this shouldn't happen often - nobody should
@@ -5039,7 +5051,7 @@ void ext4_discard_preallocations(struct inode *inode, unsigned int needed)
 		if (pa->pa_deleted == 0) {
 			ext4_mb_mark_pa_deleted(sb, pa);
 			spin_unlock(&pa->pa_lock);
-			list_del_rcu(&pa->pa_inode_list);
+			list_del_rcu(&pa->pa_node.inode_list);
 			list_add(&pa->u.pa_tmp_list, &list);
 			needed--;
 			continue;
@@ -5331,7 +5343,7 @@ ext4_mb_discard_lg_preallocations(struct super_block *sb,
 
 	spin_lock(&lg->lg_prealloc_lock);
 	list_for_each_entry_rcu(pa, &lg->lg_prealloc_list[order],
-				pa_inode_list,
+				pa_node.lg_list,
 				lockdep_is_held(&lg->lg_prealloc_lock)) {
 		spin_lock(&pa->pa_lock);
 		if (atomic_read(&pa->pa_count)) {
@@ -5354,7 +5366,7 @@ ext4_mb_discard_lg_preallocations(struct super_block *sb,
 		ext4_mb_mark_pa_deleted(sb, pa);
 		spin_unlock(&pa->pa_lock);
 
-		list_del_rcu(&pa->pa_inode_list);
+		list_del_rcu(&pa->pa_node.lg_list);
 		list_add(&pa->u.pa_tmp_list, &discard_list);
 
 		total_entries--;
@@ -5415,7 +5427,7 @@ static void ext4_mb_add_n_trim(struct ext4_allocation_context *ac)
 	/* Add the prealloc space to lg */
 	spin_lock(&lg->lg_prealloc_lock);
 	list_for_each_entry_rcu(tmp_pa, &lg->lg_prealloc_list[order],
-				pa_inode_list,
+				pa_node.lg_list,
 				lockdep_is_held(&lg->lg_prealloc_lock)) {
 		spin_lock(&tmp_pa->pa_lock);
 		if (tmp_pa->pa_deleted) {
@@ -5424,8 +5436,8 @@ static void ext4_mb_add_n_trim(struct ext4_allocation_context *ac)
 		}
 		if (!added && pa->pa_free < tmp_pa->pa_free) {
 			/* Add to the tail of the previous entry */
-			list_add_tail_rcu(&pa->pa_inode_list,
-					  &tmp_pa->pa_inode_list);
+			list_add_tail_rcu(&pa->pa_node.lg_list,
+					  &tmp_pa->pa_node.lg_list);
 			added = 1;
 			/*
 			 * we want to count the total
@@ -5436,7 +5448,7 @@ static void ext4_mb_add_n_trim(struct ext4_allocation_context *ac)
 		lg_prealloc_count++;
 	}
 	if (!added)
-		list_add_tail_rcu(&pa->pa_inode_list,
+		list_add_tail_rcu(&pa->pa_node.lg_list,
 				  &lg->lg_prealloc_list[order]);
 	spin_unlock(&lg->lg_prealloc_lock);
 
@@ -5492,9 +5504,9 @@ static int ext4_mb_release_context(struct ext4_allocation_context *ac)
 			 * doesn't grow big.
 			 */
 			if (likely(pa->pa_free)) {
-				spin_lock(pa->pa_obj_lock);
-				list_del_rcu(&pa->pa_inode_list);
-				spin_unlock(pa->pa_obj_lock);
+				spin_lock(pa->pa_node_lock.lg_lock);
+				list_del_rcu(&pa->pa_node.lg_list);
+				spin_unlock(pa->pa_node_lock.lg_lock);
 				ext4_mb_add_n_trim(ac);
 			}
 		}
@@ -5504,9 +5516,9 @@ static int ext4_mb_release_context(struct ext4_allocation_context *ac)
 		 * treat per-inode prealloc list as a lru list, then try
 		 * to trim the least recently used PA.
 		 */
-		spin_lock(pa->pa_obj_lock);
-		list_move(&pa->pa_inode_list, &ei->i_prealloc_list);
-		spin_unlock(pa->pa_obj_lock);
+		spin_lock(pa->pa_node_lock.inode_lock);
+		list_move(&pa->pa_node.inode_list, &ei->i_prealloc_list);
+		spin_unlock(pa->pa_node_lock.inode_lock);
 	}
 
 	ext4_mb_put_pa(ac, ac->ac_sb, pa);
diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
index dcda2a943cee..398a6688c341 100644
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -114,7 +114,10 @@ struct ext4_free_data {
 };
 
 struct ext4_prealloc_space {
-	struct list_head	pa_inode_list;
+	union {
+		struct list_head	inode_list; /* for inode PAs */
+		struct list_head	lg_list;    /* for lg PAs */
+	} pa_node;
 	struct list_head	pa_group_list;
 	union {
 		struct list_head pa_tmp_list;
@@ -128,7 +131,10 @@ struct ext4_prealloc_space {
 	ext4_grpblk_t		pa_len;		/* len of preallocated chunk */
 	ext4_grpblk_t		pa_free;	/* how many blocks are free */
 	unsigned short		pa_type;	/* pa type. inode or group */
-	spinlock_t		*pa_obj_lock;
+	union {
+		spinlock_t	*inode_lock;	/* locks the inode list holding this PA */
+		spinlock_t	*lg_lock;	/* locks the lg list holding this PA */
+	} pa_node_lock;
 	struct inode		*pa_inode;	/* hack, for history only */
 };
 
--
2.31.1

From: Ojaswin Mujoo
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Andreas Dilger, Jan Kara, Ritesh Harjani
Subject: [RFC v2 7/8] ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list
Date: Mon, 26 Sep 2022 12:34:58 +0530
Message-Id: <94b9ed776b051271072ab9cef6eba00635640ad2.1664172580.git.ojaswin@linux.ibm.com>

Currently, the kernel uses i_prealloc_list to hold all the inode preallocations. This is known to cause performance degradation in workloads which perform a large number of sparse writes on a single file.
This is mainly because functions like ext4_mb_normalize_request() and
ext4_mb_use_preallocated() iterate over the complete list, resulting in
slowdowns when a large number of PAs are present.

Patch 27bc446e2 partially fixed this by enforcing a limit of 512 on the
inode preallocation list and adding logic to continually trim the list
if it grows above that threshold. However, our testing revealed that a
hardcoded value is not suitable for all kinds of workloads.

To optimize this, add an rbtree to the inode and hold the inode
preallocations in this rbtree instead. This makes iterating over inode
PAs faster and scales much better than a linked list. Additionally, we
also had to remove the LRU logic that was used when trimming the list
(in ext4_mb_release_context()), as it would add extra overhead to the
rbtree. Discards now happen in lowest-logical-offset-first order.

** Locking notes **

With the introduction of the rbtree to maintain inode PAs, we can't use
RCU to walk the tree while searching, since that can result in partial
traversals which might miss some nodes (or entire subtrees) while
discards happen in parallel (discards happen under a lock). Hence, this
patch converts the ei->i_prealloc_lock spinlock to an rwlock.

Almost all the codepaths that read or modify the PA rbtree are
protected by the higher-level inode->i_data_sem (except
ext4_mb_discard_group_preallocations() and ext4_clear_inode()). IIUC,
the only place we need lock protection is when one thread is searching
the PA rbtree (earlier protected under rcu_read_lock()) while another
is deleting PAs in ext4_mb_discard_group_preallocations(), which
iterates over all the PAs using grp->bb_prealloc_list and deletes PAs
from the tree without taking any inode lock (i_data_sem).

So, this patch converts all rcu_read_lock/unlock() paths for inode PAs
to use read_lock(), and all places that were taking the
ei->i_prealloc_lock spinlock now take write_lock().

Note that this makes the fast path (searching for the right PA, e.g. in
ext4_mb_use_preallocated() or ext4_mb_normalize_request()) use
read_lock() instead of rcu_read_lock/unlock(). The fast path can now
block on the slow discard path (ext4_mb_discard_group_preallocations()),
which takes write_lock(). But this is not as bad as it looks, because:

1. The slow path only occurs when normal allocation has failed, which
   implies we are low on disk space. One can argue this scenario won't
   be very frequent.

2. ext4_mb_discard_group_preallocations() locks and unlocks the rwlock
   for every individual PA it deletes. This gives the fast path enough
   opportunity to acquire the read_lock for searching the inode PA
   tree.
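[Editorial illustration, not part of the patch] To make the locking scheme above concrete, here is a minimal userspace sketch: a plain binary search tree (standing in for the kernel rbtree, for brevity) keyed by logical start, where searches take the read side of an rwlock and updates take the write side, mirroring the read_lock()/write_lock() split this patch introduces. All names (pa_search, pa_insert, pa_lock) are hypothetical, and pthreads stands in for the kernel's rwlock_t.

	/*
	 * Userspace sketch of the PA tree locking scheme described above.
	 * Hypothetical names; this is not the kernel implementation.
	 * Build with: cc sketch.c -lpthread
	 */
	#include <pthread.h>
	#include <stdio.h>
	#include <stdlib.h>

	struct pa {
		unsigned int lstart;		/* logical start of the PA */
		unsigned int len;		/* length of the PA */
		struct pa *left, *right;
	};

	static struct pa *pa_root;
	/* stands in for ei->i_prealloc_lock after the spinlock -> rwlock change */
	static pthread_rwlock_t pa_lock = PTHREAD_RWLOCK_INITIALIZER;

	/* fast path: search under the read lock; readers may run concurrently */
	static struct pa *pa_search(unsigned int goal)
	{
		struct pa *cur, *found = NULL;

		pthread_rwlock_rdlock(&pa_lock);
		for (cur = pa_root; cur; ) {
			if (goal >= cur->lstart && goal < cur->lstart + cur->len) {
				found = cur;
				break;
			}
			/* same decision ext4_mb_pa_rb_next_iter() makes */
			cur = (goal < cur->lstart) ? cur->left : cur->right;
		}
		pthread_rwlock_unlock(&pa_lock);
		return found;
	}

	/* slow path: updates take the write lock, excluding all readers */
	static void pa_insert(struct pa *node)
	{
		struct pa **link;

		pthread_rwlock_wrlock(&pa_lock);
		for (link = &pa_root; *link; )
			link = (node->lstart < (*link)->lstart) ?
				&(*link)->left : &(*link)->right;
		*link = node;
		pthread_rwlock_unlock(&pa_lock);
	}

	int main(void)
	{
		struct pa *a = calloc(1, sizeof(*a)), *b = calloc(1, sizeof(*b));

		a->lstart = 0;  a->len = 16;
		b->lstart = 64; b->len = 8;
		pa_insert(a);
		pa_insert(b);
		printf("PA covering block 66: %p\n", (void *)pa_search(66));
		return 0;
	}

The design point mirrors item 2 above: because the discard path drops and re-acquires the write side for every PA it deletes, concurrent searchers on the read side are only ever blocked for a single deletion at a time.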
Signed-off-by: Ojaswin Mujoo
Reviewed-by: Ritesh Harjani (IBM)
---
 fs/ext4/ext4.h    |   4 +-
 fs/ext4/mballoc.c | 175 ++++++++++++++++++++++++++++++----------------
 fs/ext4/mballoc.h |   6 +-
 fs/ext4/super.c   |   4 +-
 4 files changed, 123 insertions(+), 66 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 3bf9a6926798..d54b972f1f0f 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1120,8 +1120,8 @@ struct ext4_inode_info {
 
 	/* mballoc */
 	atomic_t i_prealloc_active;
-	struct list_head i_prealloc_list;
-	spinlock_t i_prealloc_lock;
+	struct rb_root i_prealloc_node;
+	rwlock_t i_prealloc_lock;
 
 	/* extents status tree */
 	struct ext4_es_tree i_es_tree;
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index b91710fe881f..ec66a5a094f0 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3985,6 +3985,24 @@ static void ext4_mb_normalize_group_request(struct ext4_allocation_context *ac)
 	mb_debug(sb, "goal %u blocks for locality group\n", ac->ac_g_ex.fe_len);
 }
 
+/*
+ * This function returns the next element to look at during inode
+ * PA rbtree walk. We assume that we have held the inode PA rbtree lock
+ * (ei->i_prealloc_lock)
+ *
+ * new_start	The start of the range we want to compare
+ * cur_start	The existing start that we are comparing against
+ * node		The node of the rb_tree
+ */
+static inline struct rb_node*
+ext4_mb_pa_rb_next_iter(int new_start, int cur_start, struct rb_node *node)
+{
+	if (new_start < cur_start)
+		return node->rb_left;
+	else
+		return node->rb_right;
+}
+
 static inline void
 ext4_mb_pa_assert_overlap(struct ext4_allocation_context *ac,
 			  ext4_lblk_t start, ext4_lblk_t end)
@@ -3993,27 +4011,31 @@ ext4_mb_pa_assert_overlap(struct ext4_allocation_context *ac,
 	struct ext4_inode_info *ei = EXT4_I(ac->ac_inode);
 	struct ext4_prealloc_space *tmp_pa;
 	ext4_lblk_t tmp_pa_start, tmp_pa_end;
+	struct rb_node *iter;
 
-	rcu_read_lock();
-	list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_node.inode_list) {
-		spin_lock(&tmp_pa->pa_lock);
-		if (tmp_pa->pa_deleted == 0) {
-			tmp_pa_start = tmp_pa->pa_lstart;
-			tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
+	read_lock(&ei->i_prealloc_lock);
+	iter = ei->i_prealloc_node.rb_node;
+	while (iter) {
+		tmp_pa = rb_entry(iter, struct ext4_prealloc_space,
+				  pa_node.inode_node);
+		tmp_pa_start = tmp_pa->pa_lstart;
+		tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
 
+		spin_lock(&tmp_pa->pa_lock);
+		if (tmp_pa->pa_deleted == 0)
 			BUG_ON(!(start >= tmp_pa_end || end <= tmp_pa_start));
-		}
 		spin_unlock(&tmp_pa->pa_lock);
+
+		iter = ext4_mb_pa_rb_next_iter(start, tmp_pa_start, iter);
 	}
-	rcu_read_unlock();
+	read_unlock(&ei->i_prealloc_lock);
 }
-
 /*
  * Given an allocation context "ac" and a range "start", "end", check
  * and adjust boundaries if the range overlaps with any of the existing
  * preallocations stored in the corresponding inode of the allocation context.
 *
- *Parameters:
+ * Parameters:
 * ac			allocation context
 * start		start of the new range
 * end			end of the new range
@@ -4025,6 +4047,7 @@ ext4_mb_pa_adjust_overlap(struct ext4_allocation_context *ac,
 	struct ext4_inode_info *ei = EXT4_I(ac->ac_inode);
 	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
 	struct ext4_prealloc_space *tmp_pa;
+	struct rb_node *iter;
 	ext4_lblk_t new_start, new_end;
 	ext4_lblk_t tmp_pa_start, tmp_pa_end;
 
@@ -4032,19 +4055,29 @@ ext4_mb_pa_adjust_overlap(struct ext4_allocation_context *ac,
 	new_end = *end;
 
 	/* check we don't cross already preallocated blocks */
-	rcu_read_lock();
-	list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_node.inode_list) {
-		if (tmp_pa->pa_deleted)
+	read_lock(&ei->i_prealloc_lock);
+	iter = ei->i_prealloc_node.rb_node;
+	while (iter) {
+		tmp_pa = rb_entry(iter, struct ext4_prealloc_space,
+				  pa_node.inode_node);
+		tmp_pa_start = tmp_pa->pa_lstart;
+		tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
+
+		/*
+		 * If pa is deleted, ignore overlaps and just iterate in rbtree
+		 * based on tmp_pa_start
+		 */
+		if (tmp_pa->pa_deleted) {
+			iter = ext4_mb_pa_rb_next_iter(new_start, tmp_pa_start, iter);
 			continue;
+		}
 		spin_lock(&tmp_pa->pa_lock);
 		if (tmp_pa->pa_deleted) {
 			spin_unlock(&tmp_pa->pa_lock);
+			iter = ext4_mb_pa_rb_next_iter(new_start, tmp_pa_start, iter);
 			continue;
 		}
 
-		tmp_pa_start = tmp_pa->pa_lstart;
-		tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
-
 		/* PA must not overlap original request */
 		BUG_ON(!(ac->ac_o_ex.fe_logical >= tmp_pa_end ||
 			ac->ac_o_ex.fe_logical < tmp_pa_start));
@@ -4052,6 +4085,7 @@ ext4_mb_pa_adjust_overlap(struct ext4_allocation_context *ac,
 		/* skip PAs this normalized request doesn't overlap with */
 		if (tmp_pa_start >= new_end || tmp_pa_end <= new_start) {
 			spin_unlock(&tmp_pa->pa_lock);
+			iter = ext4_mb_pa_rb_next_iter(new_start, tmp_pa_start, iter);
 			continue;
 		}
 		BUG_ON(tmp_pa_start <= new_start && tmp_pa_end >= new_end);
@@ -4065,8 +4099,9 @@ ext4_mb_pa_adjust_overlap(struct ext4_allocation_context *ac,
 			new_end = tmp_pa_start;
 		}
 		spin_unlock(&tmp_pa->pa_lock);
+		iter = ext4_mb_pa_rb_next_iter(new_start, tmp_pa_start, iter);
 	}
-	rcu_read_unlock();
+	read_unlock(&ei->i_prealloc_lock);
 
 	/* XXX: extra loop to check we really don't overlap preallocations */
 	ext4_mb_pa_assert_overlap(ac, new_start, new_end);
@@ -4193,7 +4228,6 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
 	ext4_mb_pa_adjust_overlap(ac, &start, &end);
 
 	size = end - start;
-
 	/*
 	 * In this function "start" and "size" are normalized for better
	 * alignment and length such that we could preallocate more blocks.
@@ -4402,6 +4436,7 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 	struct ext4_locality_group *lg;
 	struct ext4_prealloc_space *tmp_pa, *cpa = NULL;
 	ext4_lblk_t tmp_pa_start, tmp_pa_end;
+	struct rb_node *iter;
 	ext4_fsblk_t goal_block;
 
 	/* only data can be preallocated */
@@ -4409,17 +4444,23 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 		return false;
 
 	/* first, try per-file preallocation */
-	rcu_read_lock();
-	list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_node.inode_list) {
+	read_lock(&ei->i_prealloc_lock);
+	iter = ei->i_prealloc_node.rb_node;
+	while (iter) {
+		tmp_pa = rb_entry(iter, struct ext4_prealloc_space, pa_node.inode_node);
 
 		/* all fields in this condition don't change,
 		 * so we can skip locking for them */
 		tmp_pa_start = tmp_pa->pa_lstart;
 		tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
 
+		/* original request start doesn't lie in this PA */
 		if (ac->ac_o_ex.fe_logical < tmp_pa_start ||
-		    ac->ac_o_ex.fe_logical >= tmp_pa_end)
+		    ac->ac_o_ex.fe_logical >= tmp_pa_end) {
+			iter = ext4_mb_pa_rb_next_iter(ac->ac_o_ex.fe_logical,
+						       tmp_pa_start, iter);
 			continue;
+		}
 
 		/* non-extent files can't have physical blocks past 2^32 */
 		if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)) &&
@@ -4439,12 +4480,14 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 			ext4_mb_use_inode_pa(ac, tmp_pa);
 			spin_unlock(&tmp_pa->pa_lock);
 			ac->ac_criteria = 10;
-			rcu_read_unlock();
+			read_unlock(&ei->i_prealloc_lock);
 			return true;
 		}
 		spin_unlock(&tmp_pa->pa_lock);
+		iter = ext4_mb_pa_rb_next_iter(ac->ac_o_ex.fe_logical,
+					       tmp_pa_start, iter);
 	}
-	rcu_read_unlock();
+	read_unlock(&ei->i_prealloc_lock);
 
 	/* can we use group allocation? */
 	if (!(ac->ac_flags & EXT4_MB_HINT_GROUP_ALLOC))
@@ -4596,6 +4639,7 @@ static void ext4_mb_put_pa(struct ext4_allocation_context *ac,
 {
 	ext4_group_t grp;
 	ext4_fsblk_t grp_blk;
+	struct ext4_inode_info *ei = EXT4_I(ac->ac_inode);
 
 	/* in this short window concurrent discard can set pa_deleted */
 	spin_lock(&pa->pa_lock);
@@ -4641,16 +4685,34 @@ static void ext4_mb_put_pa(struct ext4_allocation_context *ac,
 	ext4_unlock_group(sb, grp);
 
 	if (pa->pa_type == MB_INODE_PA) {
-		spin_lock(pa->pa_node_lock.inode_lock);
-		list_del_rcu(&pa->pa_node.inode_list);
-		spin_unlock(pa->pa_node_lock.inode_lock);
+		write_lock(pa->pa_node_lock.inode_lock);
+		rb_erase(&pa->pa_node.inode_node, &ei->i_prealloc_node);
+		write_unlock(pa->pa_node_lock.inode_lock);
+		ext4_mb_pa_free(pa);
 	} else {
 		spin_lock(pa->pa_node_lock.lg_lock);
 		list_del_rcu(&pa->pa_node.lg_list);
 		spin_unlock(pa->pa_node_lock.lg_lock);
+		call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback);
 	}
+}
+
+static int ext4_mb_pa_cmp(struct rb_node *new, struct rb_node *cur)
+{
+	ext4_grpblk_t cur_start, new_start;
+	struct ext4_prealloc_space *cur_pa = rb_entry(cur,
+						      struct ext4_prealloc_space,
+						      pa_node.inode_node);
+	struct ext4_prealloc_space *new_pa = rb_entry(new,
+						      struct ext4_prealloc_space,
+						      pa_node.inode_node);
+	cur_start = cur_pa->pa_lstart;
+	new_start = new_pa->pa_lstart;
 
-	call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback);
+	if (new_start < cur_start)
+		return 1;
+	else
+		return -1;
 }
 
 /*
@@ -4716,7 +4778,6 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
 	pa->pa_len = ac->ac_b_ex.fe_len;
 	pa->pa_free = pa->pa_len;
 	spin_lock_init(&pa->pa_lock);
-	INIT_LIST_HEAD(&pa->pa_node.inode_list);
 	INIT_LIST_HEAD(&pa->pa_group_list);
 	pa->pa_deleted = 0;
 	pa->pa_type = MB_INODE_PA;
@@ -4736,9 +4797,10 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
 
 	list_add(&pa->pa_group_list, &grp->bb_prealloc_list);
 
-	spin_lock(pa->pa_node_lock.inode_lock);
-	list_add_rcu(&pa->pa_node.inode_list, &ei->i_prealloc_list);
-	spin_unlock(pa->pa_node_lock.inode_lock);
+	write_lock(pa->pa_node_lock.inode_lock);
+	ext4_mb_rb_insert(&ei->i_prealloc_node, &pa->pa_node.inode_node,
+			  ext4_mb_pa_cmp);
+	write_unlock(pa->pa_node_lock.inode_lock);
 	atomic_inc(&ei->i_prealloc_active);
 }
 
@@ -4904,6 +4966,7 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
 	struct ext4_prealloc_space *pa, *tmp;
 	struct list_head list;
 	struct ext4_buddy e4b;
+	struct ext4_inode_info *ei;
 	int err;
 	int free = 0;
 
@@ -4967,18 +5030,21 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
 			list_del_rcu(&pa->pa_node.lg_list);
 			spin_unlock(pa->pa_node_lock.lg_lock);
 		} else {
-			spin_lock(pa->pa_node_lock.inode_lock);
-			list_del_rcu(&pa->pa_node.inode_list);
-			spin_unlock(pa->pa_node_lock.inode_lock);
+			write_lock(pa->pa_node_lock.inode_lock);
+			ei = EXT4_I(pa->pa_inode);
+			rb_erase(&pa->pa_node.inode_node, &ei->i_prealloc_node);
+			write_unlock(pa->pa_node_lock.inode_lock);
 		}
 
-		if (pa->pa_type == MB_GROUP_PA)
+		list_del(&pa->u.pa_tmp_list);
+
+		if (pa->pa_type == MB_GROUP_PA) {
 			ext4_mb_release_group_pa(&e4b, pa);
-		else
+			call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback);
+		} else {
 			ext4_mb_release_inode_pa(&e4b, bitmap_bh, pa);
-
-		list_del(&pa->u.pa_tmp_list);
-		call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback);
+			ext4_mb_pa_free(pa);
+		}
 	}
 
 	ext4_unlock_group(sb, group);
@@ -5008,6 +5074,7 @@ void ext4_discard_preallocations(struct inode *inode, unsigned int needed)
 	ext4_group_t group = 0;
 	struct list_head list;
 	struct ext4_buddy e4b;
+	struct rb_node *iter;
 	int err;
 
 	if (!S_ISREG(inode->i_mode)) {
@@ -5030,17 +5097,18 @@ void ext4_discard_preallocations(struct inode *inode, unsigned int needed)
 
 repeat:
 	/* first, collect all pa's in the inode */
-	spin_lock(&ei->i_prealloc_lock);
-	while (!list_empty(&ei->i_prealloc_list) && needed) {
-		pa = list_entry(ei->i_prealloc_list.prev,
-				struct ext4_prealloc_space, pa_node.inode_list);
+	write_lock(&ei->i_prealloc_lock);
+	for (iter = rb_first(&ei->i_prealloc_node); iter && needed; iter = rb_next(iter)) {
+		pa = rb_entry(iter, struct ext4_prealloc_space,
+			      pa_node.inode_node);
 		BUG_ON(pa->pa_node_lock.inode_lock != &ei->i_prealloc_lock);
+
 		spin_lock(&pa->pa_lock);
 		if (atomic_read(&pa->pa_count)) {
 			/* this shouldn't happen often - nobody should
 			 * use preallocation while we're discarding it */
 			spin_unlock(&pa->pa_lock);
-			spin_unlock(&ei->i_prealloc_lock);
+			write_unlock(&ei->i_prealloc_lock);
 			ext4_msg(sb, KERN_ERR, "uh-oh! used pa while discarding");
 			WARN_ON(1);
@@ -5051,7 +5119,7 @@ void ext4_discard_preallocations(struct inode *inode, unsigned int needed)
 		if (pa->pa_deleted == 0) {
 			ext4_mb_mark_pa_deleted(sb, pa);
 			spin_unlock(&pa->pa_lock);
-			list_del_rcu(&pa->pa_node.inode_list);
+			rb_erase(&pa->pa_node.inode_node, &ei->i_prealloc_node);
 			list_add(&pa->u.pa_tmp_list, &list);
 			needed--;
 			continue;
@@ -5059,7 +5127,7 @@ void ext4_discard_preallocations(struct inode *inode, unsigned int needed)
 
 		/* someone is deleting pa right now */
 		spin_unlock(&pa->pa_lock);
-		spin_unlock(&ei->i_prealloc_lock);
+		write_unlock(&ei->i_prealloc_lock);
 
 		/* we have to wait here because pa_deleted
 		 * doesn't mean pa is already unlinked from
@@ -5076,7 +5144,7 @@ void ext4_discard_preallocations(struct inode *inode, unsigned int needed)
 		schedule_timeout_uninterruptible(HZ);
 		goto repeat;
 	}
-	spin_unlock(&ei->i_prealloc_lock);
+	write_unlock(&ei->i_prealloc_lock);
 
 	list_for_each_entry_safe(pa, tmp, &list, u.pa_tmp_list) {
 		BUG_ON(pa->pa_type != MB_INODE_PA);
@@ -5108,7 +5176,7 @@ void ext4_discard_preallocations(struct inode *inode, unsigned int needed)
 		put_bh(bitmap_bh);
 
 		list_del(&pa->u.pa_tmp_list);
-		call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback);
+		ext4_mb_pa_free(pa);
 	}
 }
 
@@ -5484,7 +5552,6 @@ static void ext4_mb_trim_inode_pa(struct inode *inode)
 static int ext4_mb_release_context(struct ext4_allocation_context *ac)
 {
 	struct inode *inode = ac->ac_inode;
-	struct ext4_inode_info *ei = EXT4_I(inode);
 	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
 	struct ext4_prealloc_space *pa = ac->ac_pa;
 	if (pa) {
@@ -5511,16 +5578,6 @@ static int ext4_mb_release_context(struct ext4_allocation_context *ac)
 			}
 		}
 
-		if (pa->pa_type == MB_INODE_PA) {
-			/*
-			 * treat per-inode prealloc list as a lru list, then try
-			 * to trim the least recently used PA.
-			 */
-			spin_lock(pa->pa_node_lock.inode_lock);
-			list_move(&pa->pa_node.inode_list, &ei->i_prealloc_list);
-			spin_unlock(pa->pa_node_lock.inode_lock);
-		}
-
 		ext4_mb_put_pa(ac, ac->ac_sb, pa);
 	}
 	if (ac->ac_bitmap_page)
diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
index 398a6688c341..f8e8ee493867 100644
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -115,7 +115,7 @@ struct ext4_free_data {
 
 struct ext4_prealloc_space {
 	union {
-		struct list_head	inode_list; /* for inode PAs */
+		struct rb_node		inode_node; /* for inode PA rbtree */
 		struct list_head	lg_list; /* for lg PAs */
 	} pa_node;
 	struct list_head	pa_group_list;
@@ -132,10 +132,10 @@ struct ext4_prealloc_space {
 	ext4_grpblk_t		pa_free;	/* how many blocks are free */
 	unsigned short		pa_type;	/* pa type. inode or group */
 	union {
-		spinlock_t	*inode_lock; /* locks the inode list holding this PA */
+		rwlock_t	*inode_lock; /* locks the rbtree holding this PA */
 		spinlock_t	*lg_lock; /* locks the lg list holding this PA */
 	} pa_node_lock;
-	struct inode		*pa_inode;	/* hack, for history only */
+	struct inode		*pa_inode;	/* used to get the inode during group discard */
 };
 
 enum {
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9a66abcca1a8..5e4fd4ea65fc 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1330,9 +1330,9 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
 
 	inode_set_iversion(&ei->vfs_inode, 1);
 	spin_lock_init(&ei->i_raw_lock);
-	INIT_LIST_HEAD(&ei->i_prealloc_list);
+	ei->i_prealloc_node = RB_ROOT;
 	atomic_set(&ei->i_prealloc_active, 0);
-	spin_lock_init(&ei->i_prealloc_lock);
+	rwlock_init(&ei->i_prealloc_lock);
 	ext4_es_init_tree(&ei->i_es_tree);
 	rwlock_init(&ei->i_es_lock);
 	INIT_LIST_HEAD(&ei->i_es_list);
-- 
2.31.1

From nobody Thu Apr 2 10:05:31 2026
From: Ojaswin Mujoo
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Andreas Dilger, Jan Kara, Ritesh Harjani
Subject: [RFC v2 8/8] ext4: Remove the logic to trim inode PAs
Date: Mon, 26 Sep 2022 12:34:59 +0530
Message-Id: <00df6404d9a7c79e82cc0e170000cc43b1868467.1664172580.git.ojaswin@linux.ibm.com>

Earlier, inode PAs were stored in a linked list. This caused a need to periodically trim the list down in order to avoid growing it to a very large size, as this would severely affect performance during list iteration.

Recent patches changed this list to an rbtree, and since the tree scales up much better, we no longer need the trim functionality, so remove it.

Signed-off-by: Ojaswin Mujoo
Reviewed-by: Ritesh Harjani (IBM)
---
 Documentation/admin-guide/ext4.rst |  3 ---
 fs/ext4/ext4.h                     |  1 -
 fs/ext4/mballoc.c                  | 20 --------------------
 fs/ext4/mballoc.h                  |  5 -----
 fs/ext4/sysfs.c                    |  2 --
 5 files changed, 31 deletions(-)

diff --git a/Documentation/admin-guide/ext4.rst b/Documentation/admin-guide/ext4.rst
index 4c559e08d11e..5740d85439ff 100644
--- a/Documentation/admin-guide/ext4.rst
+++ b/Documentation/admin-guide/ext4.rst
@@ -489,9 +489,6 @@ Files in /sys/fs/ext4/:
         multiple of this tuning parameter if the stripe size is not set in the
         ext4 superblock
 
-  mb_max_inode_prealloc
-        The maximum length of per-inode ext4_prealloc_space list.
-
   mb_max_to_scan
         The maximum number of extents the multiblock allocator will search to
         find the best extent.
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index d54b972f1f0f..bca4b41cc192 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1612,7 +1612,6 @@ struct ext4_sb_info {
 	unsigned int s_mb_stats;
 	unsigned int s_mb_order2_reqs;
 	unsigned int s_mb_group_prealloc;
-	unsigned int s_mb_max_inode_prealloc;
 	unsigned int s_max_dir_size_kb;
 	/* where last allocation was done - for stream allocation */
 	unsigned long s_mb_last_group;
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index ec66a5a094f0..973eb12768a1 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3420,7 +3420,6 @@ int ext4_mb_init(struct super_block *sb)
 	sbi->s_mb_stats = MB_DEFAULT_STATS;
 	sbi->s_mb_stream_request = MB_DEFAULT_STREAM_THRESHOLD;
 	sbi->s_mb_order2_reqs = MB_DEFAULT_ORDER2_REQS;
-	sbi->s_mb_max_inode_prealloc = MB_DEFAULT_MAX_INODE_PREALLOC;
 	/*
 	 * The default group preallocation is 512, which for 4k block
 	 * sizes translates to 2 megabytes. However for bigalloc file
@@ -5529,29 +5528,11 @@ static void ext4_mb_add_n_trim(struct ext4_allocation_context *ac)
 	return ;
 }
 
-/*
- * if per-inode prealloc list is too long, trim some PA
- */
-static void ext4_mb_trim_inode_pa(struct inode *inode)
-{
-	struct ext4_inode_info *ei = EXT4_I(inode);
-	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
-	int count, delta;
-
-	count = atomic_read(&ei->i_prealloc_active);
-	delta = (sbi->s_mb_max_inode_prealloc >> 2) + 1;
-	if (count > sbi->s_mb_max_inode_prealloc + delta) {
-		count -= sbi->s_mb_max_inode_prealloc;
-		ext4_discard_preallocations(inode, count);
-	}
-}
-
 /*
  * release all resource we used in allocation
  */
 static int ext4_mb_release_context(struct ext4_allocation_context *ac)
 {
-	struct inode *inode = ac->ac_inode;
 	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
 	struct ext4_prealloc_space *pa = ac->ac_pa;
 	if (pa) {
@@ -5587,7 +5568,6 @@ static int ext4_mb_release_context(struct ext4_allocation_context *ac)
 	if (ac->ac_flags & EXT4_MB_HINT_GROUP_ALLOC)
 		mutex_unlock(&ac->ac_lg->lg_mutex);
 	ext4_mb_collect_stats(ac);
-	ext4_mb_trim_inode_pa(inode);
 	return 0;
 }
 
diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
index f8e8ee493867..6d85ee8674a6 100644
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -73,11 +73,6 @@
  */
 #define MB_DEFAULT_GROUP_PREALLOC	512
 
-/*
- * maximum length of inode prealloc list
- */
-#define MB_DEFAULT_MAX_INODE_PREALLOC	512
-
 /*
  * Number of groups to search linearly before performing group scanning
  * optimization.
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index d233c24ea342..f0d42cf44c71 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -214,7 +214,6 @@ EXT4_RW_ATTR_SBI_UI(mb_min_to_scan, s_mb_min_to_scan);
 EXT4_RW_ATTR_SBI_UI(mb_order2_req, s_mb_order2_reqs);
 EXT4_RW_ATTR_SBI_UI(mb_stream_req, s_mb_stream_request);
 EXT4_RW_ATTR_SBI_UI(mb_group_prealloc, s_mb_group_prealloc);
-EXT4_RW_ATTR_SBI_UI(mb_max_inode_prealloc, s_mb_max_inode_prealloc);
EXT4_RW_ATTR_SBI_UI(mb_max_linear_groups, s_mb_max_linear_groups);
 EXT4_RW_ATTR_SBI_UI(extent_max_zeroout_kb, s_extent_max_zeroout_kb);
 EXT4_ATTR(trigger_fs_error, 0200, trigger_test_error);
@@ -264,7 +263,6 @@ static struct attribute *ext4_attrs[] = {
 	ATTR_LIST(mb_order2_req),
 	ATTR_LIST(mb_stream_req),
 	ATTR_LIST(mb_group_prealloc),
-	ATTR_LIST(mb_max_inode_prealloc),
 	ATTR_LIST(mb_max_linear_groups),
 	ATTR_LIST(max_writeback_mb_bump),
 	ATTR_LIST(extent_max_zeroout_kb),
-- 
2.31.1