From nobody Sat Nov 30 16:35:53 2024
From: Heming Zhao
To: joseph.qi@linux.alibaba.com, glass.su@suse.com
Cc: Heming Zhao, ocfs2-devel@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: [PATCH v3 1/3] ocfs2: give ocfs2 the ability to reclaim suballoc free bg
Date: Sun, 8 Sep 2024 22:07:03 +0800
Message-Id: <20240908140705.19169-2-heming.zhao@suse.com>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20240908140705.19169-1-heming.zhao@suse.com>
References: <20240908140705.19169-1-heming.zhao@suse.com>

The current ocfs2 code can't reclaim suballocator block group space.
This causes ocfs2 to hold onto a lot of space in some cases. For
example, when creating lots of small files, the space is held/managed
by '//inode_alloc'. After the user deletes all the small files, the
space never returns to '//global_bitmap'. This issue prevents ocfs2
from providing the needed space even when there is enough free space
in a small ocfs2 volume.

This patch gives ocfs2 the ability to reclaim suballoc free space when
the block group is freed. For performance reasons, this patch keeps
the first suballocator block group.

Signed-off-by: Heming Zhao
Reviewed-by: Su Yue
---
 fs/ocfs2/suballoc.c | 302 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 292 insertions(+), 10 deletions(-)

diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index f7b483f0de2a..d62010166c34 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -294,6 +294,68 @@ static int ocfs2_validate_group_descriptor(struct super_block *sb,
 	return ocfs2_validate_gd_self(sb, bh, 0);
 }
 
+/*
+ * The hint gd may already have been released in _ocfs2_free_suballoc_bits(),
+ * so we first check the gd signature, then do the same work as
+ * ocfs2_read_group_descriptor().
+ *
+ * When the group descriptor is invalid, we return 'rc=0' and
+ * '*released=1'. The caller should handle this case. Otherwise,
+ * we return the real error code.
+ */
+static int ocfs2_read_hint_group_descriptor(struct inode *inode,
+			struct ocfs2_dinode *di, u64 gd_blkno,
+			struct buffer_head **bh, int *released)
+{
+	int rc;
+	struct buffer_head *tmp = *bh;
+	struct ocfs2_group_desc *gd;
+
+	*released = 0;
+
+	rc = ocfs2_read_block(INODE_CACHE(inode), gd_blkno, &tmp, NULL);
+	if (rc)
+		goto out;
+
+	gd = (struct ocfs2_group_desc *) tmp->b_data;
+	if (!OCFS2_IS_VALID_GROUP_DESC(gd)) {
+		/*
+		 * Invalid gd cache was set in ocfs2_read_block(),
+		 * which will affect block_group allocation.
+		 * Path:
+		 * ocfs2_reserve_suballoc_bits
+		 *  ocfs2_block_group_alloc
+		 *   ocfs2_block_group_alloc_contig
+		 *    ocfs2_set_new_buffer_uptodate
+		 */
+		ocfs2_remove_from_cache(INODE_CACHE(inode), tmp);
+		*released = 1; /* we return 'rc=0' for this case */
+		goto free_bh;
+	}
+
+	/* the steps below are the same as in ocfs2_read_group_descriptor() */
+	if (!buffer_jbd(tmp)) {
+		rc = ocfs2_validate_group_descriptor(inode->i_sb, tmp);
+		if (rc)
+			goto free_bh;
+	}
+
+	rc = ocfs2_validate_gd_parent(inode->i_sb, di, tmp, 0);
+	if (rc)
+		goto free_bh;
+
+	/* If ocfs2_read_block() got us a new bh, pass it up. */
+	if (!*bh)
+		*bh = tmp;
+
+	return rc;
+
+free_bh:
+	brelse(tmp);
+out:
+	return rc;
+}
+
 int ocfs2_read_group_descriptor(struct inode *inode, struct ocfs2_dinode *di,
 				u64 gd_blkno, struct buffer_head **bh)
 {
@@ -1722,7 +1784,7 @@ static int ocfs2_search_one_group(struct ocfs2_alloc_context *ac,
 				  u32 bits_wanted,
 				  u32 min_bits,
 				  struct ocfs2_suballoc_result *res,
-				  u16 *bits_left)
+				  u16 *bits_left, int *released)
 {
 	int ret;
 	struct buffer_head *group_bh = NULL;
@@ -1730,9 +1792,11 @@ static int ocfs2_search_one_group(struct ocfs2_alloc_context *ac,
 	struct ocfs2_dinode *di = (struct ocfs2_dinode *)ac->ac_bh->b_data;
 	struct inode *alloc_inode = ac->ac_inode;
 
-	ret = ocfs2_read_group_descriptor(alloc_inode, di,
-					  res->sr_bg_blkno, &group_bh);
-	if (ret < 0) {
+	ret = ocfs2_read_hint_group_descriptor(alloc_inode, di,
+				res->sr_bg_blkno, &group_bh, released);
+	if (*released) {
+		return 0;
+	} else if (ret < 0) {
 		mlog_errno(ret);
 		return ret;
 	}
@@ -1934,7 +1998,7 @@ static int ocfs2_claim_suballoc_bits(struct ocfs2_alloc_context *ac,
 				     u32 min_bits,
 				     struct ocfs2_suballoc_result *res)
 {
-	int status;
+	int status, released;
 	u16 victim, i;
 	u16 bits_left = 0;
 	u64 hint = ac->ac_last_group;
@@ -1961,6 +2025,7 @@ static int ocfs2_claim_suballoc_bits(struct ocfs2_alloc_context *ac,
 		goto bail;
 	}
 
+	/* The hint bg may already be released, so search it quietly. */
 	res->sr_bg_blkno = hint;
 	if (res->sr_bg_blkno) {
 		/* Attempt to short-circuit the usual search mechanism
@@ -1968,7 +2033,12 @@ static int ocfs2_claim_suballoc_bits(struct ocfs2_alloc_context *ac,
 		 * allocation group. This helps us maintain some
 		 * contiguousness across allocations. */
 		status = ocfs2_search_one_group(ac, handle, bits_wanted,
-						min_bits, res, &bits_left);
+						min_bits, res, &bits_left,
+						&released);
+		if (released) {
+			res->sr_bg_blkno = 0;
+			goto chain_search;
+		}
 		if (!status)
 			goto set_hint;
 		if (status < 0 && status != -ENOSPC) {
@@ -1976,7 +2046,7 @@ static int ocfs2_claim_suballoc_bits(struct ocfs2_alloc_context *ac,
 			goto bail;
 		}
 	}
-
+chain_search:
 	cl = (struct ocfs2_chain_list *) &fe->id2.i_chain;
 
 	victim = ocfs2_find_victim_chain(cl);
@@ -2077,6 +2147,12 @@ int ocfs2_claim_metadata(handle_t *handle,
 	return status;
 }
 
+/*
+ * Now that ocfs2 can release unused block group space, the
+ * ->ip_last_used_group may be invalid, so the ac->ac_last_group set
+ * by this function needs to be verified.
+ * Refer to the 'hint' in ocfs2_claim_suballoc_bits() for more details.
+ */
 static void ocfs2_init_inode_ac_group(struct inode *dir,
 				      struct buffer_head *parent_di_bh,
 				      struct ocfs2_alloc_context *ac)
@@ -2514,6 +2590,197 @@ static int ocfs2_block_group_clear_bits(handle_t *handle,
 	return status;
 }
 
+/*
+ * Reclaim suballocator-managed space to the main bitmap.
+ * This function first works on the suballocator, then switches to the
+ * main bitmap.
+ *
+ * handle: The transaction handle
+ * alloc_inode: The suballoc inode
+ * alloc_bh: The buffer_head of suballoc inode
+ * group_bh: The buffer_head of the suballocator-managed group descriptor.
+ *           Caller should release the input group_bh.
+ */
+static int _reclaim_to_main_bm(handle_t *handle,
+		struct inode *alloc_inode,
+		struct buffer_head *alloc_bh,
+		struct buffer_head *group_bh)
+{
+	int idx, status = 0;
+	int i, next_free_rec, len = 0;
+	__le16 old_bg_contig_free_bits = 0;
+	u16 start_bit;
+	u32 tmp_used;
+	u64 bg_blkno, start_blk;
+	unsigned int count;
+	struct ocfs2_chain_rec *rec;
+	struct buffer_head *main_bm_bh = NULL;
+	struct inode *main_bm_inode = NULL;
+	struct ocfs2_super *osb = OCFS2_SB(alloc_inode->i_sb);
+	struct ocfs2_dinode *fe = (struct ocfs2_dinode *) alloc_bh->b_data;
+	struct ocfs2_chain_list *cl = &fe->id2.i_chain;
+	struct ocfs2_group_desc *group = (struct ocfs2_group_desc *) group_bh->b_data;
+
+	idx = le16_to_cpu(group->bg_chain);
+	rec = &(cl->cl_recs[idx]);
+
+	status = ocfs2_extend_trans(handle,
+			ocfs2_calc_group_alloc_credits(osb->sb,
+						       le16_to_cpu(cl->cl_cpg)));
+	if (status) {
+		mlog_errno(status);
+		goto bail;
+	}
+	status = ocfs2_journal_access_di(handle, INODE_CACHE(alloc_inode),
+					 alloc_bh, OCFS2_JOURNAL_ACCESS_WRITE);
+	if (status < 0) {
+		mlog_errno(status);
+		goto bail;
+	}
+
+	/*
+	 * Only clear the suballocator rec item in-place.
+	 *
+	 * If idx is not the last, we don't compress (remove the empty item)
+	 * the cl_recs[]. Compressing it would require a lot of extra work.
+	 *
+	 * Compress cl_recs[] code example:
+	 * if (idx != cl->cl_next_free_rec - 1)
+	 *	memmove(&cl->cl_recs[idx], &cl->cl_recs[idx + 1],
+	 *		sizeof(struct ocfs2_chain_rec) *
+	 *		(cl->cl_next_free_rec - idx - 1));
+	 * for(i = idx; i < cl->cl_next_free_rec-1; i++) {
+	 *	group->bg_chain = "later group->bg_chain";
+	 *	group->bg_blkno = xxx;
+	 *	... ...
+	 * }
+	 */
+
+	tmp_used = le32_to_cpu(fe->id1.bitmap1.i_total);
+	fe->id1.bitmap1.i_total = cpu_to_le32(tmp_used - le32_to_cpu(rec->c_total));
+
+	/* Subtract 1 for the block group itself */
+	tmp_used = le32_to_cpu(fe->id1.bitmap1.i_used);
+	fe->id1.bitmap1.i_used = cpu_to_le32(tmp_used - 1);
+
+	tmp_used = le32_to_cpu(fe->i_clusters);
+	fe->i_clusters = cpu_to_le32(tmp_used - le16_to_cpu(cl->cl_cpg));
+
+	spin_lock(&OCFS2_I(alloc_inode)->ip_lock);
+	OCFS2_I(alloc_inode)->ip_clusters -= le32_to_cpu(fe->i_clusters);
+	fe->i_size = cpu_to_le64(ocfs2_clusters_to_bytes(alloc_inode->i_sb,
+					     le32_to_cpu(fe->i_clusters)));
+	spin_unlock(&OCFS2_I(alloc_inode)->ip_lock);
+	i_size_write(alloc_inode, le64_to_cpu(fe->i_size));
+	alloc_inode->i_blocks = ocfs2_inode_sector_count(alloc_inode);
+
+	ocfs2_journal_dirty(handle, alloc_bh);
+	ocfs2_update_inode_fsync_trans(handle, alloc_inode, 0);
+
+	start_blk = le64_to_cpu(rec->c_blkno);
+	count = le32_to_cpu(rec->c_total) / le16_to_cpu(cl->cl_bpc);
+
+	/*
+	 * If the rec is the last one, let's compress the chain list by
+	 * removing the empty cl_recs[] at the end.
+	 */
+	next_free_rec = le16_to_cpu(cl->cl_next_free_rec);
+	if (idx == (next_free_rec - 1)) {
+		len++; /* the last item should be counted first */
+		for (i = (next_free_rec - 2); i > 0; i--) {
+			if (cl->cl_recs[i].c_free == cl->cl_recs[i].c_total)
+				len++;
+			else
+				break;
+		}
+	}
+	le16_add_cpu(&cl->cl_next_free_rec, -len);
+
+	rec->c_free = 0;
+	rec->c_total = 0;
+	rec->c_blkno = 0;
+	ocfs2_remove_from_cache(INODE_CACHE(alloc_inode), group_bh);
+	memset(group, 0, sizeof(struct ocfs2_group_desc));
+
+	/* prepare to reclaim clusters */
+	main_bm_inode = ocfs2_get_system_file_inode(osb,
+						    GLOBAL_BITMAP_SYSTEM_INODE,
+						    OCFS2_INVALID_SLOT);
+	if (!main_bm_inode)
+		goto bail; /* ignore the error in reclaim path */
+
+	inode_lock(main_bm_inode);
+
+	status = ocfs2_inode_lock(main_bm_inode, &main_bm_bh, 1);
+	if (status < 0)
+		goto free_bm_inode; /* ignore the error in reclaim path */
+
+	ocfs2_block_to_cluster_group(main_bm_inode, start_blk, &bg_blkno,
+				     &start_bit);
+	fe = (struct ocfs2_dinode *) main_bm_bh->b_data;
+	cl = &fe->id2.i_chain;
+	/* reuse group_bh, caller will release the input group_bh */
+	group_bh = NULL;
+
+	/* reclaim clusters to global_bitmap */
+	status = ocfs2_read_group_descriptor(main_bm_inode, fe, bg_blkno,
+					     &group_bh);
+	if (status < 0) {
+		mlog_errno(status);
+		goto free_bm_bh;
+	}
+	group = (struct ocfs2_group_desc *) group_bh->b_data;
+
+	if ((count + start_bit) > le16_to_cpu(group->bg_bits)) {
+		ocfs2_error(alloc_inode->i_sb,
+			"reclaim length (%d) exceeds block group length (%d)",
+			count + start_bit, le16_to_cpu(group->bg_bits));
+		goto free_group_bh;
+	}
+
+	old_bg_contig_free_bits = group->bg_contig_free_bits;
+	status = ocfs2_block_group_clear_bits(handle, main_bm_inode,
+					      group, group_bh,
+					      start_bit, count, 0,
+					      _ocfs2_clear_bit);
+	if (status < 0) {
+		mlog_errno(status);
+		goto free_group_bh;
+	}
+
+	status = ocfs2_journal_access_di(handle, INODE_CACHE(main_bm_inode),
+					 main_bm_bh, OCFS2_JOURNAL_ACCESS_WRITE);
+	if (status < 0) {
+		mlog_errno(status);
+		ocfs2_block_group_set_bits(handle, main_bm_inode, group, group_bh,
+					   start_bit, count,
+					   le16_to_cpu(old_bg_contig_free_bits), 1);
+		goto free_group_bh;
+	}
+
+	idx = le16_to_cpu(group->bg_chain);
+	rec = &(cl->cl_recs[idx]);
+
+	le32_add_cpu(&rec->c_free, count);
+	tmp_used = le32_to_cpu(fe->id1.bitmap1.i_used);
+	fe->id1.bitmap1.i_used = cpu_to_le32(tmp_used - count);
+	ocfs2_journal_dirty(handle, main_bm_bh);
+
+free_group_bh:
+	brelse(group_bh);
+
+free_bm_bh:
+	ocfs2_inode_unlock(main_bm_inode, 1);
+	brelse(main_bm_bh);
+
+free_bm_inode:
+	inode_unlock(main_bm_inode);
+	iput(main_bm_inode);
+
+bail:
+	return status;
+}
+
 /*
  * expects the suballoc inode to already be locked.
  */
@@ -2526,12 +2793,13 @@ static int _ocfs2_free_suballoc_bits(handle_t *handle,
 				     void (*undo_fn)(unsigned int bit,
 						     unsigned long *bitmap))
 {
-	int status = 0;
+	int idx, status = 0;
 	u32 tmp_used;
 	struct ocfs2_dinode *fe = (struct ocfs2_dinode *) alloc_bh->b_data;
 	struct ocfs2_chain_list *cl = &fe->id2.i_chain;
 	struct buffer_head *group_bh = NULL;
 	struct ocfs2_group_desc *group;
+	struct ocfs2_chain_rec *rec;
 	__le16 old_bg_contig_free_bits = 0;
 
 	/* The alloc_bh comes from ocfs2_free_dinode() or
@@ -2577,12 +2845,26 @@ static int _ocfs2_free_suballoc_bits(handle_t *handle,
 		goto bail;
 	}
 
-	le32_add_cpu(&cl->cl_recs[le16_to_cpu(group->bg_chain)].c_free,
-		     count);
+	idx = le16_to_cpu(group->bg_chain);
+	rec = &(cl->cl_recs[idx]);
+
+	le32_add_cpu(&rec->c_free, count);
 	tmp_used = le32_to_cpu(fe->id1.bitmap1.i_used);
 	fe->id1.bitmap1.i_used = cpu_to_le32(tmp_used - count);
 	ocfs2_journal_dirty(handle, alloc_bh);
 
+	/*
+	 * Reclaim suballocator free space.
+	 * Bypass: global_bitmap, not empty rec, first rec in cl_recs[]
+	 */
+	if (ocfs2_is_cluster_bitmap(alloc_inode) ||
+	    (le32_to_cpu(rec->c_free) != (le32_to_cpu(rec->c_total) - 1)) ||
+	    (le16_to_cpu(cl->cl_next_free_rec) == 1)) {
+		goto bail;
+	}
+
+	_reclaim_to_main_bm(handle, alloc_inode, alloc_bh, group_bh);
+
 bail:
 	brelse(group_bh);
 	return status;
-- 
2.35.3

From nobody Sat Nov 30 16:35:53 2024
From: Heming Zhao
To: joseph.qi@linux.alibaba.com, glass.su@suse.com
Cc: Heming Zhao, ocfs2-devel@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: [PATCH v3 2/3] ocfs2: detect released suballocator bg for fh_to_[dentry|parent]
Date: Sun, 8 Sep 2024 22:07:04 +0800
Message-Id: <20240908140705.19169-3-heming.zhao@suse.com>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20240908140705.19169-1-heming.zhao@suse.com>
References: <20240908140705.19169-1-heming.zhao@suse.com>

Now that ocfs2 can reclaim free suballocator block groups, a
suballocator block group may already have been released by the time it
is read. This change makes xfstests case 426 fail.

The existing call stack:

ocfs2_fh_to_dentry //or ocfs2_fh_to_parent
 ocfs2_get_dentry
  ocfs2_test_inode_bit
   ocfs2_test_suballoc_bit
    ocfs2_read_group_descriptor
    + reading a released bg fails validation, which then causes -EROFS

How to fix:
Failing to read a released bg is expected here, so we should ignore
this error.

Signed-off-by: Heming Zhao
Reviewed-by: Su Yue
---
 fs/ocfs2/suballoc.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index d62010166c34..9e847f59c9ef 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -3118,7 +3118,7 @@ static int ocfs2_test_suballoc_bit(struct ocfs2_super *osb,
 	struct ocfs2_group_desc *group;
 	struct buffer_head *group_bh = NULL;
 	u64 bg_blkno;
-	int status;
+	int status, quiet = 0, released;
 
 	trace_ocfs2_test_suballoc_bit((unsigned long long)blkno,
 				      (unsigned int)bit);
@@ -3134,11 +3134,15 @@ static int ocfs2_test_suballoc_bit(struct ocfs2_super *osb,
 
 	bg_blkno = group_blkno ? group_blkno :
 		   ocfs2_which_suballoc_group(blkno, bit);
-	status = ocfs2_read_group_descriptor(suballoc, alloc_di, bg_blkno,
-					     &group_bh);
-	if (status < 0) {
+	status = ocfs2_read_hint_group_descriptor(suballoc, alloc_di, bg_blkno,
+					     &group_bh, &released);
+	if (released) {
+		quiet = 1;
+		status = -EINVAL;
+		goto bail;
+	} else if (status < 0) {
 		mlog(ML_ERROR, "read group %llu failed %d\n",
-		     (unsigned long long)bg_blkno, status);
+				(unsigned long long)bg_blkno, status);
 		goto bail;
 	}
 
@@ -3148,7 +3152,7 @@ static int ocfs2_test_suballoc_bit(struct ocfs2_super *osb,
 bail:
 	brelse(group_bh);
 
-	if (status)
+	if (status && (!quiet))
 		mlog_errno(status);
 	return status;
 }
@@ -3168,7 +3172,7 @@ static int ocfs2_test_suballoc_bit(struct ocfs2_super *osb,
  */
 int ocfs2_test_inode_bit(struct ocfs2_super *osb, u64 blkno, int *res)
 {
-	int status;
+	int status, quiet = 0;
 	u64 group_blkno = 0;
 	u16 suballoc_bit = 0, suballoc_slot = 0;
 	struct inode *inode_alloc_inode;
@@ -3210,8 +3214,12 @@ int ocfs2_test_inode_bit(struct ocfs2_super *osb, u64 blkno, int *res)
 
 	status = ocfs2_test_suballoc_bit(osb, inode_alloc_inode, alloc_bh,
 					 group_blkno, blkno, suballoc_bit, res);
-	if (status < 0)
-		mlog(ML_ERROR, "test suballoc bit failed %d\n", status);
+	if (status < 0) {
+		if (status == -EINVAL)
+			quiet = 1;
+		else
+			mlog(ML_ERROR, "test suballoc bit failed %d\n", status);
+	}
 
 	ocfs2_inode_unlock(inode_alloc_inode, 0);
 	inode_unlock(inode_alloc_inode);
@@ -3219,7 +3227,7 @@ int ocfs2_test_inode_bit(struct ocfs2_super *osb, u64 blkno, int *res)
 	iput(inode_alloc_inode);
 	brelse(alloc_bh);
 bail:
-	if (status)
+	if (status && !quiet)
 		mlog_errno(status);
 	return status;
 }
-- 
2.35.3

From nobody Sat Nov 30 16:35:53 2024
From: Heming Zhao
To: joseph.qi@linux.alibaba.com, glass.su@suse.com
Cc: Heming Zhao, ocfs2-devel@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: [PATCH v3 3/3] ocfs2: adjust spinlock_t ip_lock protection scope
Date: Sun, 8 Sep 2024 22:07:05 +0800
Message-Id: <20240908140705.19169-4-heming.zhao@suse.com>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20240908140705.19169-1-heming.zhao@suse.com>
References: <20240908140705.19169-1-heming.zhao@suse.com>

Some of the spinlock_t ip_lock critical sections are wider than they
should be. Narrow them so that they follow the usage in
'struct ocfs2_inode_info'.

Signed-off-by: Heming Zhao
Reviewed-by: Su Yue
---
 fs/ocfs2/dlmglue.c  | 3 ++-
 fs/ocfs2/inode.c    | 5 +++--
 fs/ocfs2/resize.c   | 4 ++--
 fs/ocfs2/suballoc.c | 2 +-
 4 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index da78a04d6f0b..4a5900c8dc8f 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -2232,6 +2232,8 @@ static int ocfs2_refresh_inode_from_lvb(struct inode *inode)
 	else
 		inode->i_blocks = ocfs2_inode_sector_count(inode);
 
+	spin_unlock(&oi->ip_lock);
+
 	i_uid_write(inode, be32_to_cpu(lvb->lvb_iuid));
 	i_gid_write(inode, be32_to_cpu(lvb->lvb_igid));
 	inode->i_mode = be16_to_cpu(lvb->lvb_imode);
@@ -2242,7 +2244,6 @@ static int ocfs2_refresh_inode_from_lvb(struct inode *inode)
 	inode_set_mtime_to_ts(inode, ts);
 	ocfs2_unpack_timespec(&ts, be64_to_cpu(lvb->lvb_ictime_packed));
 	inode_set_ctime_to_ts(inode, ts);
-	spin_unlock(&oi->ip_lock);
 	return 0;
 }
 
diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index 2cc5c99fe941..4af9a6dfddd2 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -1348,14 +1348,15 @@ void ocfs2_refresh_inode(struct inode *inode,
 		inode->i_blocks = 0;
 	else
 		inode->i_blocks = ocfs2_inode_sector_count(inode);
+
+	spin_unlock(&OCFS2_I(inode)->ip_lock);
+
 	inode_set_atime(inode, le64_to_cpu(fe->i_atime),
 			le32_to_cpu(fe->i_atime_nsec));
 	inode_set_mtime(inode, le64_to_cpu(fe->i_mtime),
 			le32_to_cpu(fe->i_mtime_nsec));
 	inode_set_ctime(inode, le64_to_cpu(fe->i_ctime),
 			le32_to_cpu(fe->i_ctime_nsec));
-
-	spin_unlock(&OCFS2_I(inode)->ip_lock);
 }
 
 int ocfs2_validate_inode_block(struct super_block *sb,
diff --git a/fs/ocfs2/resize.c b/fs/ocfs2/resize.c
index c4a4016d3866..b29f71357d63 100644
--- a/fs/ocfs2/resize.c
+++ b/fs/ocfs2/resize.c
@@ -153,8 +153,8 @@ static int ocfs2_update_last_group_and_inode(handle_t *handle,
 
 	spin_lock(&OCFS2_I(bm_inode)->ip_lock);
 	OCFS2_I(bm_inode)->ip_clusters = le32_to_cpu(fe->i_clusters);
-	le64_add_cpu(&fe->i_size, (u64)new_clusters << osb->s_clustersize_bits);
 	spin_unlock(&OCFS2_I(bm_inode)->ip_lock);
+	le64_add_cpu(&fe->i_size, (u64)new_clusters << osb->s_clustersize_bits);
 	i_size_write(bm_inode, le64_to_cpu(fe->i_size));
 
 	ocfs2_journal_dirty(handle, bm_bh);
@@ -564,8 +564,8 @@ int ocfs2_group_add(struct inode *inode, struct ocfs2_new_group_input *input)
 
 	spin_lock(&OCFS2_I(main_bm_inode)->ip_lock);
 	OCFS2_I(main_bm_inode)->ip_clusters = le32_to_cpu(fe->i_clusters);
-	le64_add_cpu(&fe->i_size, (u64)input->clusters << osb->s_clustersize_bits);
 	spin_unlock(&OCFS2_I(main_bm_inode)->ip_lock);
+	le64_add_cpu(&fe->i_size, (u64)input->clusters << osb->s_clustersize_bits);
 	i_size_write(main_bm_inode, le64_to_cpu(fe->i_size));
 
 	ocfs2_update_super_and_backups(main_bm_inode, input->clusters);
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 9e847f59c9ef..3f91615d8702 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -798,9 +798,9 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
 
 	spin_lock(&OCFS2_I(alloc_inode)->ip_lock);
 	OCFS2_I(alloc_inode)->ip_clusters = le32_to_cpu(fe->i_clusters);
+	spin_unlock(&OCFS2_I(alloc_inode)->ip_lock);
 	fe->i_size = cpu_to_le64(ocfs2_clusters_to_bytes(alloc_inode->i_sb,
 					     le32_to_cpu(fe->i_clusters)));
-	spin_unlock(&OCFS2_I(alloc_inode)->ip_lock);
 	i_size_write(alloc_inode, le64_to_cpu(fe->i_size));
 	alloc_inode->i_blocks = ocfs2_inode_sector_count(alloc_inode);
 	ocfs2_update_inode_fsync_trans(handle, alloc_inode, 0);
-- 
2.35.3