From nobody Sun Dec 14 05:53:24 2025
From: Heming Zhao
To: joseph.qi@linux.alibaba.com, mark@fasheh.com, jlbec@evilplan.org
Cc: Heming Zhao, ocfs2-devel@lists.linux.dev, linux-kernel@vger.kernel.org, glass.su@suse.com
Subject: [PATCH v6 1/2] ocfs2: give ocfs2 the ability to reclaim suballocator free bg
Date: Fri, 12 Dec 2025 15:45:03 +0800
Message-ID: <20251212074505.25962-2-heming.zhao@suse.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20251212074505.25962-1-heming.zhao@suse.com>
References: <20251212074505.25962-1-heming.zhao@suse.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

The current ocfs2 code can't reclaim suballocator block group space. In
some cases, this causes ocfs2 to hold onto a lot of space. For example,
when creating lots of small files, the space is held/managed by the
'//inode_alloc'. After the user deletes all the small files, the space
never returns to the '//global_bitmap'. This issue prevents ocfs2 from
providing the needed space even when there is enough free space in a
small ocfs2 volume.

This patch gives ocfs2 the ability to reclaim suballocator free space
when the block group is freed. For performance reasons, this patch keeps
the first suballocator block group active.

Signed-off-by: Heming Zhao
Reviewed-by: Su Yue
Reviewed-by: Joseph Qi
---
 fs/ocfs2/suballoc.c | 308 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 299 insertions(+), 9 deletions(-)

diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 6ac4dcd54588..9a19f5230c8c 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -294,6 +294,74 @@ static int ocfs2_validate_group_descriptor(struct super_block *sb,
 	return ocfs2_validate_gd_self(sb, bh, 0);
 }
 
+/*
+ * The hint group descriptor (gd) may already have been released
+ * in _ocfs2_free_suballoc_bits(). We first check the gd signature,
+ * then perform the standard ocfs2_read_group_descriptor() jobs.
+ *
+ * If the gd signature is invalid, we return 'rc=0' and set
+ * '*released=1'. The caller is expected to handle this specific case.
+ * Otherwise, we return the actual error code.
+ *
+ * We treat gd signature corruption case as a release case. The
+ * caller ocfs2_claim_suballoc_bits() will use ocfs2_search_chain()
+ * to search each gd block. The code will eventually find this
+ * corrupted gd block - Late, but not missed.
+ *
+ * Note:
+ * The caller is responsible for initializing the '*released' status.
+ */
+static int ocfs2_read_hint_group_descriptor(struct inode *inode,
+			struct ocfs2_dinode *di, u64 gd_blkno,
+			struct buffer_head **bh, int *released)
+{
+	int rc;
+	struct buffer_head *tmp = *bh;
+	struct ocfs2_group_desc *gd;
+
+	rc = ocfs2_read_block(INODE_CACHE(inode), gd_blkno, &tmp, NULL);
+	if (rc)
+		goto out;
+
+	gd = (struct ocfs2_group_desc *) tmp->b_data;
+	if (!OCFS2_IS_VALID_GROUP_DESC(gd)) {
+		/*
+		 * Invalid gd cache was set in ocfs2_read_block(),
+		 * which will affect block_group allocation.
+		 * Path:
+		 * ocfs2_reserve_suballoc_bits
+		 * ocfs2_block_group_alloc
+		 * ocfs2_block_group_alloc_contig
+		 * ocfs2_set_new_buffer_uptodate
+		 */
+		ocfs2_remove_from_cache(INODE_CACHE(inode), tmp);
+		*released = 1; /* we return 'rc=0' for this case */
+		goto free_bh;
+	}
+
+	/* below jobs same with ocfs2_read_group_descriptor() */
+	if (!buffer_jbd(tmp)) {
+		rc = ocfs2_validate_group_descriptor(inode->i_sb, tmp);
+		if (rc)
+			goto free_bh;
+	}
+
+	rc = ocfs2_validate_gd_parent(inode->i_sb, di, tmp, 0);
+	if (rc)
+		goto free_bh;
+
+	/* If ocfs2_read_block() got us a new bh, pass it up. */
+	if (!*bh)
+		*bh = tmp;
+
+	return rc;
+
+free_bh:
+	brelse(tmp);
+out:
+	return rc;
+}
+
 int ocfs2_read_group_descriptor(struct inode *inode, struct ocfs2_dinode *di,
 				u64 gd_blkno, struct buffer_head **bh)
 {
@@ -1724,7 +1792,7 @@ static int ocfs2_search_one_group(struct ocfs2_alloc_context *ac,
 				  u32 bits_wanted,
 				  u32 min_bits,
 				  struct ocfs2_suballoc_result *res,
-				  u16 *bits_left)
+				  u16 *bits_left, int *released)
 {
 	int ret;
 	struct buffer_head *group_bh = NULL;
@@ -1732,9 +1800,11 @@ static int ocfs2_search_one_group(struct ocfs2_alloc_context *ac,
 	struct ocfs2_dinode *di = (struct ocfs2_dinode *)ac->ac_bh->b_data;
 	struct inode *alloc_inode = ac->ac_inode;
 
-	ret = ocfs2_read_group_descriptor(alloc_inode, di,
-					  res->sr_bg_blkno, &group_bh);
-	if (ret < 0) {
+	ret = ocfs2_read_hint_group_descriptor(alloc_inode, di,
+				res->sr_bg_blkno, &group_bh, released);
+	if (*released) {
+		return 0;
+	} else if (ret < 0) {
 		mlog_errno(ret);
 		return ret;
 	}
@@ -1949,6 +2019,7 @@ static int ocfs2_claim_suballoc_bits(struct ocfs2_alloc_context *ac,
 				     struct ocfs2_suballoc_result *res)
 {
 	int status;
+	int released = 0;
 	u16 victim, i;
 	u16 bits_left = 0;
 	u64 hint = ac->ac_last_group;
@@ -1975,6 +2046,7 @@ static int ocfs2_claim_suballoc_bits(struct ocfs2_alloc_context *ac,
 		goto bail;
 	}
 
+	/* the hint bg may already be released, we quiet search this group. */
 	res->sr_bg_blkno = hint;
 	if (res->sr_bg_blkno) {
 		/* Attempt to short-circuit the usual search mechanism
@@ -1982,7 +2054,12 @@ static int ocfs2_claim_suballoc_bits(struct ocfs2_alloc_context *ac,
 		 * allocation group. This helps us maintain some
 		 * contiguousness across allocations. */
 		status = ocfs2_search_one_group(ac, handle, bits_wanted,
-						min_bits, res, &bits_left);
+						min_bits, res, &bits_left,
+						&released);
+		if (released) {
+			res->sr_bg_blkno = 0;
+			goto chain_search;
+		}
 		if (!status)
 			goto set_hint;
 		if (status < 0 && status != -ENOSPC) {
@@ -1990,7 +2067,7 @@ static int ocfs2_claim_suballoc_bits(struct ocfs2_alloc_context *ac,
 			goto bail;
 		}
 	}
-
+chain_search:
 	cl = (struct ocfs2_chain_list *) &fe->id2.i_chain;
 
 	victim = ocfs2_find_victim_chain(cl);
@@ -2102,6 +2179,12 @@ int ocfs2_claim_metadata(handle_t *handle,
 	return status;
 }
 
+/*
+ * after ocfs2 has the ability to release block group unused space,
+ * the ->ip_last_used_group may be invalid. so this function returns
+ * ac->ac_last_group need to verify.
+ * refer the 'hint' in ocfs2_claim_suballoc_bits() for more details.
+ */
 static void ocfs2_init_inode_ac_group(struct inode *dir,
 				      struct buffer_head *parent_di_bh,
 				      struct ocfs2_alloc_context *ac)
@@ -2540,6 +2623,198 @@ static int ocfs2_block_group_clear_bits(handle_t *handle,
 	return status;
 }
 
+/*
+ * Reclaim the suballocator managed space to main bitmap.
+ * This function first works on the suballocator to perform the
+ * cleanup rec/alloc_inode job, then switches to the main bitmap
+ * to reclaim released space.
+ *
+ * handle: The transaction handle
+ * alloc_inode: The suballoc inode
+ * alloc_bh: The buffer_head of suballoc inode
+ * group_bh: The group descriptor buffer_head of suballocator managed.
+ *           Caller should release the input group_bh.
+ */
+static int _ocfs2_reclaim_suballoc_to_main(handle_t *handle,
+			struct inode *alloc_inode,
+			struct buffer_head *alloc_bh,
+			struct buffer_head *group_bh)
+{
+	int idx, status = 0;
+	int i, next_free_rec, len = 0;
+	__le16 old_bg_contig_free_bits = 0;
+	u16 start_bit;
+	u32 tmp_used;
+	u64 bg_blkno, start_blk;
+	unsigned int count;
+	struct ocfs2_chain_rec *rec;
+	struct buffer_head *main_bm_bh = NULL;
+	struct inode *main_bm_inode = NULL;
+	struct ocfs2_super *osb = OCFS2_SB(alloc_inode->i_sb);
+	struct ocfs2_dinode *fe = (struct ocfs2_dinode *) alloc_bh->b_data;
+	struct ocfs2_chain_list *cl = &fe->id2.i_chain;
+	struct ocfs2_group_desc *group = (struct ocfs2_group_desc *) group_bh->b_data;
+
+	idx = le16_to_cpu(group->bg_chain);
+	rec = &(cl->cl_recs[idx]);
+
+	status = ocfs2_extend_trans(handle,
+			ocfs2_calc_group_alloc_credits(osb->sb,
+				le16_to_cpu(cl->cl_cpg)));
+	if (status) {
+		mlog_errno(status);
+		goto bail;
+	}
+	status = ocfs2_journal_access_di(handle, INODE_CACHE(alloc_inode),
+					 alloc_bh, OCFS2_JOURNAL_ACCESS_WRITE);
+	if (status < 0) {
+		mlog_errno(status);
+		goto bail;
+	}
+
+	/*
+	 * Only clear the suballocator rec item in-place.
+	 *
+	 * If idx is not the last, we don't compress (remove the empty item)
+	 * the cl_recs[]. Otherwise, we would need to do a lot of work.
+	 *
+	 * Compress cl_recs[] code example:
+	 * if (idx != cl->cl_next_free_rec - 1)
+	 *	memmove(&cl->cl_recs[idx], &cl->cl_recs[idx + 1],
+	 *		sizeof(struct ocfs2_chain_rec) *
+	 *		(cl->cl_next_free_rec - idx - 1));
+	 * for(i = idx; i < cl->cl_next_free_rec-1; i++) {
+	 *	group->bg_chain = "later group->bg_chain";
+	 *	group->bg_blkno = xxx;
+	 *	... ...
+	 * }
+	 */
+
+	tmp_used = le32_to_cpu(fe->id1.bitmap1.i_total);
+	fe->id1.bitmap1.i_total = cpu_to_le32(tmp_used - le32_to_cpu(rec->c_total));
+
+	/* Subtract 1 for the block group itself */
+	tmp_used = le32_to_cpu(fe->id1.bitmap1.i_used);
+	fe->id1.bitmap1.i_used = cpu_to_le32(tmp_used - 1);
+
+	tmp_used = le32_to_cpu(fe->i_clusters);
+	fe->i_clusters = cpu_to_le32(tmp_used - le16_to_cpu(cl->cl_cpg));
+
+	spin_lock(&OCFS2_I(alloc_inode)->ip_lock);
+	OCFS2_I(alloc_inode)->ip_clusters -= le32_to_cpu(fe->i_clusters);
+	fe->i_size = cpu_to_le64(ocfs2_clusters_to_bytes(alloc_inode->i_sb,
+					     le32_to_cpu(fe->i_clusters)));
+	spin_unlock(&OCFS2_I(alloc_inode)->ip_lock);
+	i_size_write(alloc_inode, le64_to_cpu(fe->i_size));
+	alloc_inode->i_blocks = ocfs2_inode_sector_count(alloc_inode);
+
+	ocfs2_journal_dirty(handle, alloc_bh);
+	ocfs2_update_inode_fsync_trans(handle, alloc_inode, 0);
+
+	start_blk = le64_to_cpu(rec->c_blkno);
+	count = le32_to_cpu(rec->c_total) / le16_to_cpu(cl->cl_bpc);
+
+	/*
+	 * If the rec is the last one, let's compress the chain list by
+	 * removing the empty cl_recs[] at the end.
+	 */
+	next_free_rec = le16_to_cpu(cl->cl_next_free_rec);
+	if (idx == (next_free_rec - 1)) {
+		len++; /* the last item should be counted first */
+		for (i = (next_free_rec - 2); i > 0; i--) {
+			if (cl->cl_recs[i].c_free == cl->cl_recs[i].c_total)
+				len++;
+			else
+				break;
+		}
+	}
+	le16_add_cpu(&cl->cl_next_free_rec, -len);
+
+	rec->c_free = 0;
+	rec->c_total = 0;
+	rec->c_blkno = 0;
+	ocfs2_remove_from_cache(INODE_CACHE(alloc_inode), group_bh);
+	memset(group, 0, sizeof(struct ocfs2_group_desc));
+
+	/* prepare job for reclaim clusters */
+	main_bm_inode = ocfs2_get_system_file_inode(osb,
+						    GLOBAL_BITMAP_SYSTEM_INODE,
+						    OCFS2_INVALID_SLOT);
+	if (!main_bm_inode)
+		goto bail; /* ignore the error in reclaim path */
+
+	inode_lock(main_bm_inode);
+
+	status = ocfs2_inode_lock(main_bm_inode, &main_bm_bh, 1);
+	if (status < 0)
+		goto free_bm_inode; /* ignore the error in reclaim path */
+
+	ocfs2_block_to_cluster_group(main_bm_inode, start_blk, &bg_blkno,
+				     &start_bit);
+	fe = (struct ocfs2_dinode *) main_bm_bh->b_data;
+	cl = &fe->id2.i_chain;
+	/* reuse group_bh, caller will release the input group_bh */
+	group_bh = NULL;
+
+	/* reclaim clusters to global_bitmap */
+	status = ocfs2_read_group_descriptor(main_bm_inode, fe, bg_blkno,
+					     &group_bh);
+	if (status < 0) {
+		mlog_errno(status);
+		goto free_bm_bh;
+	}
+	group = (struct ocfs2_group_desc *) group_bh->b_data;
+
+	if ((count + start_bit) > le16_to_cpu(group->bg_bits)) {
+		ocfs2_error(alloc_inode->i_sb,
+			"reclaim length (%d) exceeds block group length (%d)",
+			count + start_bit, le16_to_cpu(group->bg_bits));
+		goto free_group_bh;
+	}
+
+	old_bg_contig_free_bits = group->bg_contig_free_bits;
+	status = ocfs2_block_group_clear_bits(handle, main_bm_inode,
+					      group, group_bh,
+					      start_bit, count, 0,
+					      _ocfs2_clear_bit);
+	if (status < 0) {
+		mlog_errno(status);
+		goto free_group_bh;
+	}
+
+	status = ocfs2_journal_access_di(handle, INODE_CACHE(main_bm_inode),
+					 main_bm_bh, OCFS2_JOURNAL_ACCESS_WRITE);
+	if (status < 0) {
+		mlog_errno(status);
+		ocfs2_block_group_set_bits(handle, main_bm_inode, group, group_bh,
+				start_bit, count,
+				le16_to_cpu(old_bg_contig_free_bits), 1);
+		goto free_group_bh;
+	}
+
+	idx = le16_to_cpu(group->bg_chain);
+	rec = &(cl->cl_recs[idx]);
+
+	le32_add_cpu(&rec->c_free, count);
+	tmp_used = le32_to_cpu(fe->id1.bitmap1.i_used);
+	fe->id1.bitmap1.i_used = cpu_to_le32(tmp_used - count);
+	ocfs2_journal_dirty(handle, main_bm_bh);
+
+free_group_bh:
+	brelse(group_bh);
+
+free_bm_bh:
+	ocfs2_inode_unlock(main_bm_inode, 1);
+	brelse(main_bm_bh);
+
+free_bm_inode:
+	inode_unlock(main_bm_inode);
+	iput(main_bm_inode);
+
+bail:
+	return status;
+}
+
 /*
  * expects the suballoc inode to already be locked.
  */
@@ -2552,12 +2827,13 @@ static int _ocfs2_free_suballoc_bits(handle_t *handle,
 				     void (*undo_fn)(unsigned int bit,
 						     unsigned long *bitmap))
 {
-	int status = 0;
+	int idx, status = 0;
 	u32 tmp_used;
 	struct ocfs2_dinode *fe = (struct ocfs2_dinode *) alloc_bh->b_data;
 	struct ocfs2_chain_list *cl = &fe->id2.i_chain;
 	struct buffer_head *group_bh = NULL;
 	struct ocfs2_group_desc *group;
+	struct ocfs2_chain_rec *rec;
 	__le16 old_bg_contig_free_bits = 0;
 
 	/* The alloc_bh comes from ocfs2_free_dinode() or
@@ -2603,12 +2879,26 @@ static int _ocfs2_free_suballoc_bits(handle_t *handle,
 		goto bail;
 	}
 
-	le32_add_cpu(&cl->cl_recs[le16_to_cpu(group->bg_chain)].c_free,
-		     count);
+	idx = le16_to_cpu(group->bg_chain);
+	rec = &(cl->cl_recs[idx]);
+
+	le32_add_cpu(&rec->c_free, count);
 	tmp_used = le32_to_cpu(fe->id1.bitmap1.i_used);
 	fe->id1.bitmap1.i_used = cpu_to_le32(tmp_used - count);
 	ocfs2_journal_dirty(handle, alloc_bh);
 
+	/*
+	 * Reclaim suballocator free space.
+	 * Bypass: global_bitmap, non empty rec, first rec in cl_recs[]
+	 */
+	if (ocfs2_is_cluster_bitmap(alloc_inode) ||
+	    (le32_to_cpu(rec->c_free) != (le32_to_cpu(rec->c_total) - 1)) ||
+	    (le16_to_cpu(cl->cl_next_free_rec) == 1)) {
+		goto bail;
+	}
+
+	_ocfs2_reclaim_suballoc_to_main(handle, alloc_inode, alloc_bh, group_bh);
+
 bail:
 	brelse(group_bh);
 	return status;
-- 
2.43.0

From nobody Sun Dec 14 05:53:24 2025
From: Heming Zhao
To: joseph.qi@linux.alibaba.com, mark@fasheh.com, jlbec@evilplan.org
Cc: Heming Zhao, ocfs2-devel@lists.linux.dev, linux-kernel@vger.kernel.org, glass.su@suse.com
Subject: [PATCH v6 2/2] ocfs2: detect released suballocator BG for fh_to_[dentry|parent]
Date: Fri, 12 Dec 2025 15:45:04 +0800
Message-ID: <20251212074505.25962-3-heming.zhao@suse.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20251212074505.25962-1-heming.zhao@suse.com>
References: <20251212074505.25962-1-heming.zhao@suse.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

After ocfs2 gained the ability to reclaim suballocator free block group
(BGs), a suballocator block group may be released.
This change causes the xfstest case generic/426 to fail.

generic/426 expects return value -ENOENT or -ESTALE, but the current
code triggers -EROFS.

Call stack before ocfs2 gained the ability to reclaim bg:

ocfs2_fh_to_dentry //or ocfs2_fh_to_parent
 ocfs2_get_dentry
  + ocfs2_test_inode_bit
  |  ocfs2_test_suballoc_bit
  |   + ocfs2_read_group_descriptor //Since ocfs2 never releases the bg,
  |   |                             //the bg block was always found.
  |   + *res = ocfs2_test_bit //unlink was called, and the bit is zero
  |
  + if (!set) //because the above *res is 0
     status = -ESTALE //the generic/426 expected return value

Current call stack that triggers -EROFS:

ocfs2_get_dentry
 ocfs2_test_inode_bit
  ocfs2_test_suballoc_bit
   ocfs2_read_group_descriptor
    + if reading a released bg, validation fails and triggers -EROFS

How to fix:
Since the read BG is already released, we must avoid triggering -EROFS.
With this commit, we use ocfs2_read_hint_group_descriptor() to detect
the released BG block. This approach quietly handles this type of error
and returns -ESTALE, which the callers pass through without logging an
error.

Signed-off-by: Heming Zhao
Reviewed-by: Su Yue
Reviewed-by: Joseph Qi
---
 fs/ocfs2/export.c   |  6 ++++--
 fs/ocfs2/suballoc.c | 26 +++++++++++++++++---------
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/fs/ocfs2/export.c b/fs/ocfs2/export.c
index b95724b767e1..9c2665dd24e2 100644
--- a/fs/ocfs2/export.c
+++ b/fs/ocfs2/export.c
@@ -74,8 +74,9 @@ static struct dentry *ocfs2_get_dentry(struct super_block *sb,
 			 * nice
 			 */
 			status = -ESTALE;
-		} else
+		} else if (status != -ESTALE) {
 			mlog(ML_ERROR, "test inode bit failed %d\n", status);
+		}
 		goto unlock_nfs_sync;
 	}
 
@@ -162,8 +163,9 @@ static struct dentry *ocfs2_get_parent(struct dentry *child)
 	if (status < 0) {
 		if (status == -EINVAL) {
 			status = -ESTALE;
-		} else
+		} else if (status != -ESTALE) {
 			mlog(ML_ERROR, "test inode bit failed %d\n", status);
+		}
 		parent = ERR_PTR(status);
 		goto bail_unlock;
 	}
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 9a19f5230c8c..9b0ae1bc445b 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -3152,7 +3152,7 @@ static int ocfs2_test_suballoc_bit(struct ocfs2_super *osb,
 	struct ocfs2_group_desc *group;
 	struct buffer_head *group_bh = NULL;
 	u64 bg_blkno;
-	int status;
+	int status, quiet = 0, released = 0;
 
 	trace_ocfs2_test_suballoc_bit((unsigned long long)blkno,
 				      (unsigned int)bit);
@@ -3168,9 +3168,13 @@ static int ocfs2_test_suballoc_bit(struct ocfs2_super *osb,
 
 	bg_blkno = group_blkno ? group_blkno :
 		   ocfs2_which_suballoc_group(blkno, bit);
-	status = ocfs2_read_group_descriptor(suballoc, alloc_di, bg_blkno,
-					     &group_bh);
-	if (status < 0) {
+	status = ocfs2_read_hint_group_descriptor(suballoc, alloc_di, bg_blkno,
+					     &group_bh, &released);
+	if (released) {
+		quiet = 1;
+		status = -ESTALE;
+		goto bail;
+	} else if (status < 0) {
 		mlog(ML_ERROR, "read group %llu failed %d\n",
 		     (unsigned long long)bg_blkno, status);
 		goto bail;
@@ -3182,7 +3186,7 @@ static int ocfs2_test_suballoc_bit(struct ocfs2_super *osb,
 bail:
 	brelse(group_bh);
 
-	if (status)
+	if (status && !quiet)
 		mlog_errno(status);
 	return status;
 }
@@ -3202,7 +3206,7 @@ static int ocfs2_test_suballoc_bit(struct ocfs2_super *osb,
  */
 int ocfs2_test_inode_bit(struct ocfs2_super *osb, u64 blkno, int *res)
 {
-	int status;
+	int status, quiet = 0;
 	u64 group_blkno = 0;
 	u16 suballoc_bit = 0, suballoc_slot = 0;
 	struct inode *inode_alloc_inode;
@@ -3244,8 +3248,12 @@ int ocfs2_test_inode_bit(struct ocfs2_super *osb, u64 blkno, int *res)
 
 	status = ocfs2_test_suballoc_bit(osb, inode_alloc_inode, alloc_bh,
 					 group_blkno, blkno, suballoc_bit, res);
-	if (status < 0)
-		mlog(ML_ERROR, "test suballoc bit failed %d\n", status);
+	if (status < 0) {
+		if (status == -ESTALE)
+			quiet = 1;
+		else
+			mlog(ML_ERROR, "test suballoc bit failed %d\n", status);
+	}
 
 	ocfs2_inode_unlock(inode_alloc_inode, 0);
 	inode_unlock(inode_alloc_inode);
@@ -3253,7 +3261,7 @@ int ocfs2_test_inode_bit(struct ocfs2_super *osb, u64 blkno, int *res)
 	iput(inode_alloc_inode);
 	brelse(alloc_bh);
 bail:
-	if (status)
+	if (status && !quiet)
 		mlog_errno(status);
 	return status;
 }
-- 
2.43.0