From: Ojaswin Mujoo
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Jan Kara, rookxu, Ritesh Harjani
Subject: [PATCH v6 6/9] ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()
Date: Sat, 25 Mar 2023 13:43:39 +0530
X-Mailer: git-send-email 2.31.1

When the length of the best extent found is less than the length of the
goal extent, we need to make sure that the best extent at least covers
the start of the original request. This is done by adjusting
ac_b_ex.fe_logical (the logical start) of the extent.

While doing so, the current logic sometimes results in the best extent's
logical range overflowing the goal extent. Since this best extent is
later added to the inode preallocation list, we have a possibility of
introducing overlapping preallocations. This is discussed in detail
here [1].

As per Jan's suggestion, to fix this, replace the existing logic with
the below logic for adjusting the best extent, as it keeps fragmentation
in check while ensuring the logical range of the best extent doesn't
overflow out of the goal extent:

1. Check if best extent can be kept at end of goal range and still cover
   original start.
2. Else, check if best extent can be kept at start of goal range and
   still cover original start.
3. Else, keep the best extent at start of original request.

Also, add a few extra BUG_ONs that might help catch errors faster.

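For illustration only, the three placement steps above can be sketched as
a standalone userspace function. This is not the kernel code: the name
place_best_extent() and its parameters are hypothetical, all values are
assumed to be in blocks (as if EXT4_C2B() had already been applied), and
assert() stands in for BUG_ON().

```c
#include <assert.h>

/*
 * Hypothetical userspace sketch of the three-step placement.
 * Returns the logical start chosen for a best extent of best_len blocks,
 * given the goal range [goal_start, goal_start + goal_len) and the
 * logical start of the original request.
 */
long place_best_extent(long goal_start, long goal_len,
		       long orig_start, long best_len)
{
	long goal_end = goal_start + goal_len;
	long new_start, new_end;

	/* Assumed preconditions: best is shorter than goal, goal covers orig start. */
	assert(best_len > 0 && best_len < goal_len);
	assert(goal_start <= orig_start && orig_start < goal_end);

	/* 1. Keep best extent at the end of goal if it still covers orig start. */
	new_end = goal_end;
	new_start = new_end - best_len;
	if (orig_start >= new_start)
		goto out;

	/* 2. Else keep it at the start of goal if it still covers orig start. */
	new_start = goal_start;
	new_end = new_start + best_len;
	if (orig_start < new_end)
		goto out;

	/* 3. Else place it at the start of the original request. */
	new_start = orig_start;
	new_end = new_start + best_len;

out:
	/* The chosen range covers orig start and stays inside the goal range. */
	assert(new_start <= orig_start && orig_start < new_end);
	assert(new_start >= goal_start && new_end <= goal_end);
	return new_start;
}
```

With a goal range of [0, 16) and a 4-block best extent, an original start
of 13 is placed at 12 (step 1), an original start of 2 is placed at 0
(step 2), and an original start of 7 is placed at 7 (step 3).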
[1] https://lore.kernel.org/r/Y+OGkVvzPN0RMv0O@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com

Suggested-by: Jan Kara
Signed-off-by: Ojaswin Mujoo
Reviewed-by: Ritesh Harjani (IBM)
Reviewed-by: Jan Kara
---
 fs/ext4/mballoc.c | 49 ++++++++++++++++++++++++++++++-----------------
 1 file changed, 31 insertions(+), 18 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 37bf6507cbfd..1304c95d8c59 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4328,6 +4328,7 @@ static void ext4_mb_use_inode_pa(struct ext4_allocation_context *ac,
 	BUG_ON(start < pa->pa_pstart);
 	BUG_ON(end > pa->pa_pstart + EXT4_C2B(sbi, pa->pa_len));
 	BUG_ON(pa->pa_free < len);
+	BUG_ON(ac->ac_b_ex.fe_len <= 0);
 	pa->pa_free -= len;
 
 	mb_debug(ac->ac_sb, "use %llu/%d from inode pa %p\n", start, len, pa);
@@ -4666,10 +4667,8 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
 	pa = ac->ac_pa;
 
 	if (ac->ac_b_ex.fe_len < ac->ac_g_ex.fe_len) {
-		int winl;
-		int wins;
-		int win;
-		int offs;
+		int new_bex_start;
+		int new_bex_end;
 
 		/* we can't allocate as much as normalizer wants.
 		 * so, found space must get proper lstart
@@ -4677,26 +4676,40 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
 		BUG_ON(ac->ac_g_ex.fe_logical > ac->ac_o_ex.fe_logical);
 		BUG_ON(ac->ac_g_ex.fe_len < ac->ac_o_ex.fe_len);
 
-		/* we're limited by original request in that
-		 * logical block must be covered any way
-		 * winl is window we can move our chunk within */
-		winl = ac->ac_o_ex.fe_logical - ac->ac_g_ex.fe_logical;
+		/*
+		 * Use the below logic for adjusting best extent as it keeps
+		 * fragmentation in check while ensuring logical range of best
+		 * extent doesn't overflow out of goal extent:
+		 *
+		 * 1. Check if best ex can be kept at end of goal and still
+		 *    cover original start
+		 * 2. Else, check if best ex can be kept at start of goal and
+		 *    still cover original start
+		 * 3. Else, keep the best ex at start of original request.
+		 */
+		new_bex_end = ac->ac_g_ex.fe_logical +
+			EXT4_C2B(sbi, ac->ac_g_ex.fe_len);
+		new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
+		if (ac->ac_o_ex.fe_logical >= new_bex_start)
+			goto adjust_bex;
 
-		/* also, we should cover whole original request */
-		wins = EXT4_C2B(sbi, ac->ac_b_ex.fe_len - ac->ac_o_ex.fe_len);
+		new_bex_start = ac->ac_g_ex.fe_logical;
+		new_bex_end =
+			new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
+		if (ac->ac_o_ex.fe_logical < new_bex_end)
+			goto adjust_bex;
 
-		/* the smallest one defines real window */
-		win = min(winl, wins);
+		new_bex_start = ac->ac_o_ex.fe_logical;
+		new_bex_end =
+			new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
 
-		offs = ac->ac_o_ex.fe_logical %
-			EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
-		if (offs && offs < win)
-			win = offs;
+adjust_bex:
+		ac->ac_b_ex.fe_logical = new_bex_start;
 
-		ac->ac_b_ex.fe_logical = ac->ac_o_ex.fe_logical -
-			EXT4_NUM_B2C(sbi, win);
 		BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical);
 		BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len);
+		BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical +
+				      EXT4_C2B(sbi, ac->ac_g_ex.fe_len)));
 	}
 
 	pa->pa_lstart = ac->ac_b_ex.fe_logical;
-- 
2.31.1