From nobody Tue Apr 7 17:16:27 2026 Received: from sender4-op-o15.zoho.com (sender4-op-o15.zoho.com [136.143.188.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 076B83A7834; Thu, 26 Feb 2026 10:18:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=pass smtp.client-ip=136.143.188.15 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772101111; cv=pass; b=fZxu3OoACjfqhp1ZcQCbGM1WMENVWP1nj3D7AdhiUzD4QdpYeTNaWpRKRZHNd0//NPBWeMW+uB6fEUOzkIqXdLX6htUE+QneY56pB8F3IasoFTo2LQBi1pDeh8rSYHsN0LtiHaa1rBTOHcX7vdqOjV08jlds0xov3GDcQsxMPN0= ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772101111; c=relaxed/simple; bh=LR1A1sVj98yRHfklQsoSfgF7AoPXlGjwhfJsflSC34I=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=S97yrcFI4ezFlfC/mXv+y+5y1f8BlbRDOLno7mynAqxV4s+agzTYjc+3H5U+b7ro0cNuJbY+YBWY8bLZF7/wx707qf+WMNcMhKQheKR9/vny+my2WyB03ZqH2avpu2OEDDccGScDw5ytcNweiYwz2eqspJk59/ts/548pJNUTMI= ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.beauty; spf=pass smtp.mailfrom=linux.beauty; dkim=pass (1024-bit key) header.d=linux.beauty header.i=me@linux.beauty header.b=iVY8uOmi; arc=pass smtp.client-ip=136.143.188.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.beauty Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.beauty Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.beauty header.i=me@linux.beauty header.b="iVY8uOmi" ARC-Seal: i=1; a=rsa-sha256; t=1772101089; cv=none; d=zohomail.com; s=zohoarc; b=jFVEpSO/i/5PiaHgsJnD9wwCwbu23PJe+Tb/1QI9hve0XJYc77h9v5lQA0GRj0vTQfkIo1EORzKKpo2ccoVNZbxcZMZG5cbTUQrJECySjGQYQgpqacxKQ6R8hQDalRVHIZkTcG6QSq9FFEvnDum9o85RIeAVu42LLFiXVTVMjmc= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1772101089; h=Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:MIME-Version:Message-ID:References:Subject:Subject:To:To:Message-Id:Reply-To; bh=+TO5qO2m+VCH2iEuDsLwTaboZDhB3f74KLvsUyaJqU4=; b=IW0y+wG5bS6M6SQonAZKIOO+0hQISqFnuW6cm1cbTigR604fAONPADLriaj35fdWN0f9VIQjISeCs/c9OrxdK2jjjugroDC2N/WToowMjQLFZboBWHGwQyiK/5gVR3rKXcUvR6iQZnN7IIr0dBkqJvVEgVhq741/pIaX2zEB91w= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass header.i=linux.beauty; spf=pass smtp.mailfrom=me@linux.beauty; dmarc=pass header.from= DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; t=1772101089; s=zmail; d=linux.beauty; i=me@linux.beauty; h=From:From:To:To:Cc:Cc:Subject:Subject:Date:Date:Message-ID:In-Reply-To:References:MIME-Version:Content-Transfer-Encoding:Message-Id:Reply-To; bh=+TO5qO2m+VCH2iEuDsLwTaboZDhB3f74KLvsUyaJqU4=; b=iVY8uOmi6jFlIdvngn7gIkFMNX/F5JuAKqVApt6/CijQadlD8u9zfQSyjdSeQV7C b5nPRSLAWDxmwGyq742kJ9YD4UbZbk031+Gundb5XHla+mVI6Iy3C/W/ljcxzfaXYkB q9A5nl4Cv4/56RU3Xf42QM65dUkptYPK3PV3cv/k= Received: by mx.zohomail.com with SMTPS id 1772101088242426.3470871388149; Thu, 26 Feb 2026 02:18:08 -0800 (PST) From: Li Chen To: linux-ext4@vger.kernel.org, "Theodore Ts'o" , Andreas Dilger , linux-kernel@vger.kernel.org Cc: Harshad Shirwadkar , Li Chen Subject: [RFC PATCH 4/4] ext4: fast_commit: replay DAX ByteLog records Date: Thu, 26 Feb 2026 18:17:32 +0800 Message-ID: <20260226101736.2271952-5-me@linux.beauty> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260226101736.2271952-1-me@linux.beauty> References: <20260226101736.2271952-1-me@linux.beauty> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ZohoMailClient: External Content-Type: text/plain; charset="utf-8" Add replay support for EXT4_FC_TAG_DAX_BYTELOG_ANCHOR. The anchor TLV describes a ByteLog window in the DAX-mapped fast commit area, which is validated and then replayed using existing TLV handlers. Signed-off-by: Li Chen --- fs/ext4/fast_commit.c | 246 ++++++++++++++++++++++++++++++++++++++++++ fs/ext4/fast_commit.h | 9 ++ 2 files changed, 255 insertions(+) diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c index 2f7b7ea29df2..6370505ecc86 100644 --- a/fs/ext4/fast_commit.c +++ b/fs/ext4/fast_commit.c @@ -12,6 +12,7 @@ #include "ext4_extents.h" #include "mballoc.h" =20 +#include #include /* * Ext4 Fast Commits @@ -2172,10 +2173,228 @@ static bool ext4_fc_value_len_isvalid(struct ext4_= sb_info *sbi, return len >=3D sizeof(struct ext4_fc_tail); case EXT4_FC_TAG_HEAD: return len =3D=3D sizeof(struct ext4_fc_head); + case EXT4_FC_TAG_DAX_BYTELOG_ANCHOR: + return len =3D=3D sizeof(struct ext4_fc_bytelog_entry); } return false; } =20 +static void ext4_fc_reset_bytelog_state(struct ext4_fc_bytelog_state *stat= e) +{ + state->cursor =3D 0; + state->next_seq =3D 0; + state->ring_crc =3D ~0U; + state->initialized =3D false; +} + +typedef int (*ext4_fc_bytelog_cb_t)(struct super_block *sb, + struct ext4_fc_tl_mem *tl, + u8 *val, void *data); + +static int ext4_fc_bytelog_iterate(struct super_block *sb, + struct ext4_fc_bytelog_state *iter, + const struct ext4_fc_bytelog_anchor *anchor, + ext4_fc_bytelog_cb_t fn, void *data) +{ + struct ext4_sb_info *sbi =3D EXT4_SB(sb); + struct ext4_fc_bytelog *log =3D &sbi->s_fc_bytelog; + u8 *base =3D log->kaddr; + u64 cursor, end; + int ret; + + if (!log->mapped || !base) + return -EOPNOTSUPP; + if (anchor->head > log->size_bytes) + return -EFSCORRUPTED; + + iter->cursor =3D anchor->tail; + iter->next_seq =3D 0; + iter->ring_crc =3D ~0U; + iter->initialized =3D true; + cursor =3D iter->cursor; + end =3D anchor->head; + + if (cursor < log->base_off) + return -EFSCORRUPTED; + if (cursor > end || cursor > log->size_bytes) + return -EFSCORRUPTED; + + while (cursor < end) { + struct ext4_fc_bytelog_hdr *hdr; + size_t remaining; + u32 payload_len, record_len; + u16 record_tag; + u8 *payload; + struct ext4_fc_tl_mem tl; + + if (end - cursor > SIZE_MAX) + return -E2BIG; + remaining =3D end - cursor; + if (cursor > log->size_bytes - sizeof(*hdr)) + return -EFSCORRUPTED; + + hdr =3D (struct ext4_fc_bytelog_hdr *)(base + cursor); + payload =3D (u8 *)hdr + sizeof(*hdr); + ret =3D ext4_fc_bytelog_validate_hdr(hdr, remaining, payload); + if (ret) + return ret; + if (!ext4_fc_bytelog_record_committed(hdr)) + return -EUCLEAN; + if (ext4_fc_bytelog_seq(hdr) !=3D iter->next_seq) + return -EUCLEAN; + + payload_len =3D ext4_fc_bytelog_payload_len(hdr); + if (payload_len < EXT4_FC_TAG_BASE_LEN) + return -EFSCORRUPTED; + + record_tag =3D le16_to_cpu(hdr->tag); + if (record_tag =3D=3D EXT4_FC_BYTELOG_TAG_BATCH) { + u32 pos =3D 0; + + while (pos < payload_len) { + u32 value_len; + + if (payload_len - pos < EXT4_FC_TAG_BASE_LEN) + return -EFSCORRUPTED; + + ext4_fc_get_tl(&tl, payload + pos); + value_len =3D tl.fc_len; + if (value_len > + payload_len - pos - EXT4_FC_TAG_BASE_LEN) + return -EFSCORRUPTED; + if (!ext4_fc_value_len_isvalid(sbi, tl.fc_tag, + tl.fc_len)) + return -EFSCORRUPTED; + if (fn) { + ret =3D fn(sb, &tl, + payload + pos + + EXT4_FC_TAG_BASE_LEN, + data); + if (ret) + return ret; + } + pos +=3D EXT4_FC_TAG_BASE_LEN + value_len; + } + } else { + u32 value_len; + + ext4_fc_get_tl(&tl, payload); + value_len =3D payload_len - EXT4_FC_TAG_BASE_LEN; + if (tl.fc_len !=3D value_len) + return -EFSCORRUPTED; + if (record_tag !=3D tl.fc_tag) + return -EFSCORRUPTED; + if (!ext4_fc_value_len_isvalid(sbi, tl.fc_tag, tl.fc_len)) + return -EFSCORRUPTED; + if (fn) { + ret =3D fn(sb, &tl, + payload + EXT4_FC_TAG_BASE_LEN, + data); + if (ret) + return ret; + } + } + + iter->ring_crc =3D crc32c(iter->ring_crc, payload, payload_len); + record_len =3D ext4_fc_bytelog_record_len(hdr); + cursor +=3D record_len; + iter->next_seq++; + } + + if (cursor !=3D end) + return -EFSCORRUPTED; + iter->cursor =3D cursor; + if (iter->next_seq !=3D anchor->seq) + return -EUCLEAN; + if (iter->ring_crc !=3D anchor->crc) + return -EFSBADCRC; + return 0; +} + +static int ext4_fc_bytelog_scan_cb(struct super_block *sb, + struct ext4_fc_tl_mem *tl, u8 *val, + void *data) +{ + struct ext4_fc_add_range ext; + struct ext4_extent *ex; + + (void)data; + switch (tl->fc_tag) { + case EXT4_FC_TAG_ADD_RANGE: + memcpy(&ext, val, sizeof(ext)); + ex =3D (struct ext4_extent *)&ext.fc_ex; + return ext4_fc_record_regions(sb, le32_to_cpu(ext.fc_ino), + le32_to_cpu(ex->ee_block), + ext4_ext_pblock(ex), + ext4_ext_get_actual_len(ex), 0); + case EXT4_FC_TAG_DEL_RANGE: + case EXT4_FC_TAG_LINK: + case EXT4_FC_TAG_UNLINK: + case EXT4_FC_TAG_CREAT: + case EXT4_FC_TAG_INODE: + return 0; + default: + return -EOPNOTSUPP; + } +} + +static int ext4_fc_bytelog_replay_cb(struct super_block *sb, + struct ext4_fc_tl_mem *tl, u8 *val, + void *data) +{ + (void)data; + switch (tl->fc_tag) { + case EXT4_FC_TAG_LINK: + return ext4_fc_replay_link(sb, tl, val); + case EXT4_FC_TAG_UNLINK: + return ext4_fc_replay_unlink(sb, tl, val); + case EXT4_FC_TAG_ADD_RANGE: + return ext4_fc_replay_add_range(sb, tl, val); + case EXT4_FC_TAG_CREAT: + return ext4_fc_replay_create(sb, tl, val); + case EXT4_FC_TAG_DEL_RANGE: + return ext4_fc_replay_del_range(sb, tl, val); + case EXT4_FC_TAG_INODE: + return ext4_fc_replay_inode(sb, tl, val); + default: + return -EOPNOTSUPP; + } +} + +static int ext4_fc_replay_scan_bytelog(struct super_block *sb, + struct ext4_fc_replay_state *state, + const struct ext4_fc_bytelog_anchor *anchor) +{ + int ret; + + ret =3D ext4_fc_bytelog_iterate(sb, &state->fc_bytelog_scan, anchor, + ext4_fc_bytelog_scan_cb, state); + if (ret) + return ret; + return JBD2_FC_REPLAY_CONTINUE; +} + +static int ext4_fc_replay_apply_bytelog(struct super_block *sb, + struct ext4_fc_replay_state *state, + const struct ext4_fc_bytelog_anchor *anchor) +{ + return ext4_fc_bytelog_iterate(sb, &state->fc_bytelog_replay, anchor, + ext4_fc_bytelog_replay_cb, NULL); +} + +static int ext4_fc_replay_bytelog_anchor(struct super_block *sb, + struct ext4_fc_replay_state *state, + struct ext4_fc_tl_mem *tl, u8 *val) +{ + struct ext4_fc_bytelog_entry entry; + struct ext4_fc_bytelog_anchor anchor; + + (void)tl; + memcpy(&entry, val, sizeof(entry)); + ext4_fc_bytelog_anchor_from_disk(&anchor, &entry); + return ext4_fc_replay_apply_bytelog(sb, state, &anchor); +} + /* * Recovery Scan phase handler * @@ -2206,6 +2425,8 @@ static int ext4_fc_replay_scan(journal_t *journal, struct ext4_fc_tail tail; __u8 *start, *end, *cur, *val; struct ext4_fc_head head; + struct ext4_fc_bytelog_entry entry; + struct ext4_fc_bytelog_anchor anchor; struct ext4_extent *ex; =20 state =3D &sbi->s_fc_replay_state; @@ -2220,6 +2441,8 @@ static int ext4_fc_replay_scan(journal_t *journal, state->fc_regions =3D NULL; state->fc_regions_valid =3D state->fc_regions_used =3D state->fc_regions_size =3D 0; + ext4_fc_reset_bytelog_state(&state->fc_bytelog_scan); + ext4_fc_reset_bytelog_state(&state->fc_bytelog_replay); /* Check if we can stop early */ if (le16_to_cpu(((struct ext4_fc_tl *)start)->fc_tag) !=3D EXT4_FC_TAG_HEAD) @@ -2278,6 +2501,9 @@ static int ext4_fc_replay_scan(journal_t *journal, state->fc_replay_num_tags =3D state->fc_cur_tag; state->fc_regions_valid =3D state->fc_regions_used; + if (ext4_fc_bytelog_active(sbi) || + state->fc_bytelog_scan.initialized) + ret =3D JBD2_FC_REPLAY_STOP; } else { ret =3D state->fc_replay_num_tags ? JBD2_FC_REPLAY_STOP : -EFSBADCRC; @@ -2299,6 +2525,15 @@ static int ext4_fc_replay_scan(journal_t *journal, state->fc_crc =3D ext4_chksum(state->fc_crc, cur, EXT4_FC_TAG_BASE_LEN + tl.fc_len); break; + case EXT4_FC_TAG_DAX_BYTELOG_ANCHOR: + state->fc_cur_tag++; + state->fc_crc =3D ext4_chksum(state->fc_crc, cur, + EXT4_FC_TAG_BASE_LEN + + tl.fc_len); + memcpy(&entry, val, sizeof(entry)); + ext4_fc_bytelog_anchor_from_disk(&anchor, &entry); + ret =3D ext4_fc_replay_scan_bytelog(sb, state, &anchor); + break; default: ret =3D state->fc_replay_num_tags ? JBD2_FC_REPLAY_STOP : -ECANCELED; @@ -2335,6 +2570,8 @@ static int ext4_fc_replay(journal_t *journal, struct = buffer_head *bh, if (state->fc_current_pass !=3D pass) { state->fc_current_pass =3D pass; sbi->s_mount_state |=3D EXT4_FC_REPLAY; + if (pass =3D=3D PASS_REPLAY) + ext4_fc_reset_bytelog_state(&state->fc_bytelog_replay); } if (!sbi->s_fc_replay_state.fc_replay_num_tags) { ext4_debug("Replay stops\n"); @@ -2393,9 +2630,18 @@ static int ext4_fc_replay(journal_t *journal, struct= buffer_head *bh, 0, tl.fc_len, 0); memcpy(&tail, val, sizeof(tail)); WARN_ON(le32_to_cpu(tail.fc_tid) !=3D expected_tid); + if ((ext4_fc_bytelog_active(sbi) || + state->fc_bytelog_scan.initialized) && + state->fc_replay_num_tags =3D=3D 0) { + ext4_fc_set_bitmaps_and_counters(sb); + return JBD2_FC_REPLAY_STOP; + } break; case EXT4_FC_TAG_HEAD: break; + case EXT4_FC_TAG_DAX_BYTELOG_ANCHOR: + ret =3D ext4_fc_replay_bytelog_anchor(sb, state, &tl, val); + break; default: trace_ext4_fc_replay(sb, tl.fc_tag, 0, tl.fc_len, 0); ret =3D -ECANCELED; diff --git a/fs/ext4/fast_commit.h b/fs/ext4/fast_commit.h index fb51e19b9778..224d718150c4 100644 --- a/fs/ext4/fast_commit.h +++ b/fs/ext4/fast_commit.h @@ -153,6 +153,13 @@ struct ext4_fc_alloc_region { int ino, len; }; =20 +struct ext4_fc_bytelog_state { + u64 cursor; + u64 next_seq; + u32 ring_crc; + bool initialized; +}; + /* * Fast commit replay state. */ @@ -166,6 +173,8 @@ struct ext4_fc_replay_state { int fc_regions_size, fc_regions_used, fc_regions_valid; int *fc_modified_inodes; int fc_modified_inodes_used, fc_modified_inodes_size; + struct ext4_fc_bytelog_state fc_bytelog_scan; + struct ext4_fc_bytelog_state fc_bytelog_replay; }; =20 #define region_last(__region) (((__region)->lblk) + ((__region)->len) - 1) --=20 2.52.0