From nobody Tue Apr 7 15:27:36 2026 Received: from sender4-op-o15.zoho.com (sender4-op-o15.zoho.com [136.143.188.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EE2BD2C17A0; Thu, 26 Feb 2026 10:18:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=pass smtp.client-ip=136.143.188.15 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772101087; cv=pass; b=SswM+x0aFY3dMPdYzAj0T/MfI34Ty5Smje8UQ+UAK+EsOgCMcRdFwgaPcboV8qlcaKU/ij7o93HkQOZWPScfWv0WrpRLjqWDKjKVTRui2dIgMeRSaOIwsgq6/aRCwCCis10XIE/xYVAx60bIWwJ2pETYsPQKYmWN401TjZZvgHg= ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772101087; c=relaxed/simple; bh=UEQmXGAxIvlp56ExLyUU2lC9fus+XZbPdPdKRCzb06w=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ajXozlfhvunR9tS2WsPGvp3890jjJDAsWYmih/8jXnNn33zZDqb6D9y1RtRNgqe+7FBodfzhHlh1XkL/jGzveWSQ1SDFQi3geZcdkWlWBsItAdLrYdzrLLhZJfrdKlUUN+oiqPvD+2UyVPQnyYXg2edG8Jwowioa1t1sWLV+zQk= ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.beauty; spf=pass smtp.mailfrom=linux.beauty; dkim=pass (1024-bit key) header.d=linux.beauty header.i=me@linux.beauty header.b=n+kFwvFs; arc=pass smtp.client-ip=136.143.188.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.beauty Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.beauty Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.beauty header.i=me@linux.beauty header.b="n+kFwvFs" ARC-Seal: i=1; a=rsa-sha256; t=1772101078; cv=none; d=zohomail.com; s=zohoarc; b=PMaWo+xgy6qXLJlx0oy4W3aouq7PuEac9nAzy/tFAFz/V8OMwZCVohnPzzB0aaoHlpSDeswDQs27mcyLAutyWMF4PU8bTQC5UBYOUJhhR/JiyrYwZu0YPu4feuFC2VmjWNho4JOhvwpmtYXAF/na7ZeHQKELqIHQCzhW21OQsSw= ARC-Message-Signature: i=1; 
a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1772101078; h=Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:MIME-Version:Message-ID:References:Subject:Subject:To:To:Message-Id:Reply-To; bh=XA/KQLnF9AaqoWm40JxaGmtWbCeOCAlyOUa1aSgza4w=; b=dQ3oARwr3J4SAy2xNOojNGjJBmRfM48WEPG5VCy2x7pw1zJhdt0N+L5kcrPK4hHvOXEtak3m62lspnNNePz6vHutz8RxmR2bWcNNweNI6TzIR5aUMsNJJSohD+DTlEkGmaZmyQ9nFc/PFYaH5ufsQmLOLxIgim2OfLtC0fM0aRg= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass header.i=linux.beauty; spf=pass smtp.mailfrom=me@linux.beauty; dmarc=pass header.from= DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; t=1772101078; s=zmail; d=linux.beauty; i=me@linux.beauty; h=From:From:To:To:Cc:Cc:Subject:Subject:Date:Date:Message-ID:In-Reply-To:References:MIME-Version:Content-Transfer-Encoding:Message-Id:Reply-To; bh=XA/KQLnF9AaqoWm40JxaGmtWbCeOCAlyOUa1aSgza4w=; b=n+kFwvFsFinuex0jhx328QssJLo0BGzi7BUNkwfPARjx30J3h8G/j1xvjWFMFmXG enn6v64xqV+LMzuB7Ap1Hn/xUhyaqmgf2PT5su62BuV3VQXyAgnjOxkHGQ7BvDTepnb CNIfo6HCe622gS1aFzMvFVD/UCmgYH6KkhhIjB3I= Received: by mx.zohomail.com with SMTPS id 1772101076978703.0884008912927; Thu, 26 Feb 2026 02:17:56 -0800 (PST) From: Li Chen To: linux-ext4@vger.kernel.org, "Theodore Ts'o" , Andreas Dilger , linux-kernel@vger.kernel.org Cc: Harshad Shirwadkar , Li Chen Subject: [RFC PATCH 1/4] ext4: introduce DAX fast commit ByteLog backend Date: Thu, 26 Feb 2026 18:17:29 +0800 Message-ID: <20260226101736.2271952-2-me@linux.beauty> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260226101736.2271952-1-me@linux.beauty> References: <20260226101736.2271952-1-me@linux.beauty> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ZohoMailClient: External Content-Type: text/plain; charset="utf-8" Add a ByteLog backend that can append fast commit records directly into a DAX-mapped fast commit area, avoiding 
bufferhead based writes. The backend provides a simple record format with CRC32C and helpers for batching and persisting records. Signed-off-by: Li Chen --- MAINTAINERS | 1 + fs/ext4/Makefile | 2 +- fs/ext4/ext4.h | 9 +- fs/ext4/fast_commit_bytelog.c | 780 ++++++++++++++++++++++++++++++++++ fs/ext4/fast_commit_bytelog.h | 147 +++++++ 5 files changed, 937 insertions(+), 2 deletions(-) create mode 100644 fs/ext4/fast_commit_bytelog.c create mode 100644 fs/ext4/fast_commit_bytelog.h diff --git a/MAINTAINERS b/MAINTAINERS index 71f76fddebbf..5a26b99aac63 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -9627,6 +9627,7 @@ Q: http://patchwork.ozlabs.org/project/linux-ext4/lis= t/ T: git git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git F: Documentation/filesystems/ext4/ F: fs/ext4/ +F: fs/ext4/fast_commit_bytelog* F: include/trace/events/ext4.h F: include/uapi/linux/ext4.h =20 diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile index 72206a292676..3df51f100536 100644 --- a/fs/ext4/Makefile +++ b/fs/ext4/Makefile @@ -10,7 +10,7 @@ ext4-y :=3D balloc.o bitmap.o block_validity.o dir.o ext4= _jbd2.o extents.o \ indirect.o inline.o inode.o ioctl.o mballoc.o migrate.o \ mmp.o move_extent.o namei.o page-io.o readpage.o resize.o \ super.o symlink.o sysfs.o xattr.o xattr_hurd.o xattr_trusted.o \ - xattr_user.o fast_commit.o orphan.o + xattr_user.o fast_commit.o fast_commit_bytelog.o orphan.o =20 ext4-$(CONFIG_EXT4_FS_POSIX_ACL) +=3D acl.o ext4-$(CONFIG_EXT4_FS_SECURITY) +=3D xattr_security.o diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 293f698b7042..1b0746bf4869 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -999,6 +999,7 @@ do { \ =20 #include "extents_status.h" #include "fast_commit.h" +#include "fast_commit_bytelog.h" =20 /* * Lock subclasses for i_data_sem in the ext4_inode_info structure. 
@@ -1282,6 +1283,8 @@ struct ext4_inode_info { * scanning in mballoc */ #define EXT4_MOUNT2_ABORT 0x00000100 /* Abort filesystem */ +#define EXT4_MOUNT2_DAX_FC_BYTELOG 0x00000200 /* Use DAX ByteLog FC backen= d */ +#define EXT4_MOUNT2_DAX_FC_BYTELOG_FORCE 0x00000400 /* Ignore feature bit = */ =20 #define clear_opt(sb, opt) EXT4_SB(sb)->s_mount_opt &=3D \ ~EXT4_MOUNT_##opt @@ -1797,6 +1800,7 @@ struct ext4_sb_info { int s_fc_debug_max_replay; #endif struct ext4_fc_replay_state s_fc_replay_state; + struct ext4_fc_bytelog s_fc_bytelog; }; =20 static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb) @@ -2125,6 +2129,7 @@ static inline bool ext4_inode_orphan_tracked(struct i= node *inode) #define EXT4_FEATURE_INCOMPAT_INLINE_DATA 0x8000 /* data in inode */ #define EXT4_FEATURE_INCOMPAT_ENCRYPT 0x10000 #define EXT4_FEATURE_INCOMPAT_CASEFOLD 0x20000 +#define EXT4_FEATURE_INCOMPAT_DAX_FC_BYTELOG 0x40000 =20 extern void ext4_update_dynamic_rev(struct super_block *sb); =20 @@ -2224,6 +2229,7 @@ EXT4_FEATURE_INCOMPAT_FUNCS(largedir, LARGEDIR) EXT4_FEATURE_INCOMPAT_FUNCS(inline_data, INLINE_DATA) EXT4_FEATURE_INCOMPAT_FUNCS(encrypt, ENCRYPT) EXT4_FEATURE_INCOMPAT_FUNCS(casefold, CASEFOLD) +EXT4_FEATURE_INCOMPAT_FUNCS(dax_fc_bytelog, DAX_FC_BYTELOG) =20 #define EXT2_FEATURE_COMPAT_SUPP EXT4_FEATURE_COMPAT_EXT_ATTR #define EXT2_FEATURE_INCOMPAT_SUPP (EXT4_FEATURE_INCOMPAT_FILETYPE| \ @@ -2254,7 +2260,8 @@ EXT4_FEATURE_INCOMPAT_FUNCS(casefold, CASEFOLD) EXT4_FEATURE_INCOMPAT_ENCRYPT | \ EXT4_FEATURE_INCOMPAT_CASEFOLD | \ EXT4_FEATURE_INCOMPAT_CSUM_SEED | \ - EXT4_FEATURE_INCOMPAT_LARGEDIR) + EXT4_FEATURE_INCOMPAT_LARGEDIR | \ + EXT4_FEATURE_INCOMPAT_DAX_FC_BYTELOG) #define EXT4_FEATURE_RO_COMPAT_SUPP (EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER| \ EXT4_FEATURE_RO_COMPAT_LARGE_FILE| \ EXT4_FEATURE_RO_COMPAT_GDT_CSUM| \ diff --git a/fs/ext4/fast_commit_bytelog.c b/fs/ext4/fast_commit_bytelog.c new file mode 100644 index 000000000000..64ba3edddbcb --- /dev/null +++ 
b/fs/ext4/fast_commit_bytelog.c @@ -0,0 +1,780 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include "ext4.h" +#include "fast_commit_bytelog.h" + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#define EXT4_FC_BYTELOG_META_BLOCKS 1 + +static void ext4_fc_bytelog_reset_batch(struct ext4_fc_bytelog *log); +static int ext4_fc_bytelog_flush_batch(struct super_block *sb, u32 tid); + +#define EXT4_FC_CRC32C_POLY 0x82f63b78 +#define EXT4_FC_CRC32C_SHIFT_BITS (sizeof(size_t) * 8) + +static u32 ext4_fc_crc32c_shift_mats[EXT4_FC_CRC32C_SHIFT_BITS][32]; +static bool ext4_fc_crc32c_shift_mats_ready; + +static u32 ext4_fc_gf2_matrix_times(const u32 *mat, u32 vec) +{ + u32 sum =3D 0; + int i; + + for (i =3D 0; i < 32; i++) { + if (vec & 1) + sum ^=3D mat[i]; + vec >>=3D 1; + } + + return sum; +} + +static void ext4_fc_gf2_matrix_square(u32 *square, const u32 *mat) +{ + int i; + + for (i =3D 0; i < 32; i++) + square[i] =3D ext4_fc_gf2_matrix_times(mat, mat[i]); +} + +static void ext4_fc_crc32c_shift_mats_init_once(void) +{ + static DEFINE_MUTEX(lock); + u32 even[32], odd[32], one_byte[32]; + u32 row =3D 1; + int i; + + if (READ_ONCE(ext4_fc_crc32c_shift_mats_ready)) + return; + + mutex_lock(&lock); + if (ext4_fc_crc32c_shift_mats_ready) + goto out; + + /* + * Build the GF(2) matrix operator for shifting by one byte of zeros, + * then square it repeatedly to get powers of two. 
+ */ + odd[0] =3D EXT4_FC_CRC32C_POLY; + for (i =3D 1; i < 32; i++) { + odd[i] =3D row; + row <<=3D 1; + } + ext4_fc_gf2_matrix_square(even, odd); /* 2 zero bits */ + ext4_fc_gf2_matrix_square(odd, even); /* 4 zero bits */ + ext4_fc_gf2_matrix_square(one_byte, odd); /* 8 zero bits */ + + memcpy(ext4_fc_crc32c_shift_mats[0], one_byte, sizeof(one_byte)); + for (i =3D 1; i < EXT4_FC_CRC32C_SHIFT_BITS; i++) + ext4_fc_gf2_matrix_square(ext4_fc_crc32c_shift_mats[i], + ext4_fc_crc32c_shift_mats[i - 1]); + + WRITE_ONCE(ext4_fc_crc32c_shift_mats_ready, true); +out: + mutex_unlock(&lock); +} + +static u32 ext4_fc_crc32c_shift_zeros(u32 crc, size_t len) +{ + size_t shift =3D len; + int bit =3D 0; + + while (shift) { + if (shift & 1) + crc =3D ext4_fc_gf2_matrix_times(ext4_fc_crc32c_shift_mats[bit], crc); + shift >>=3D 1; + bit++; + } + + return crc; +} + +u32 ext4_fc_bytelog_crc32(const void *buf, size_t len) +{ + return crc32c(~0, buf, len); +} + +bool ext4_fc_bytelog_mapped(struct ext4_sb_info *sbi) +{ + return READ_ONCE(sbi->s_fc_bytelog.mapped); +} + +bool ext4_fc_bytelog_active(struct ext4_sb_info *sbi) +{ + struct ext4_fc_bytelog *log =3D &sbi->s_fc_bytelog; + + return log->mapped && log->enabled; +} + +size_t ext4_fc_bytelog_record_size(size_t payload_len) +{ + size_t len =3D sizeof(struct ext4_fc_bytelog_hdr) + payload_len; + + return ALIGN(len, EXT4_FC_BYTELOG_ALIGN); +} + +void ext4_fc_bytelog_prep_hdr(struct ext4_fc_bytelog_hdr *hdr, u16 tag, + u16 flags, u32 tid, u64 seq, u32 payload_len) +{ + memset(hdr, 0, sizeof(*hdr)); + + hdr->magic =3D cpu_to_le32(EXT4_FC_BYTELOG_MAGIC); + hdr->version =3D cpu_to_le16(EXT4_FC_BYTELOG_VERSION); + hdr->hdr_len =3D cpu_to_le16(sizeof(*hdr)); + hdr->tid =3D cpu_to_le32(tid); + hdr->tag =3D cpu_to_le16(tag); + hdr->flags =3D cpu_to_le16(flags & ~EXT4_FC_BYTELOG_COMMITTED); + hdr->payload_len =3D cpu_to_le32(payload_len); + hdr->record_len =3D cpu_to_le32(ext4_fc_bytelog_record_size(payload_len)); + hdr->seq =3D cpu_to_le64(seq); 
+} + +void ext4_fc_bytelog_finalize_hdr_crc(struct ext4_fc_bytelog_hdr *hdr, + u32 payload_crc) +{ + struct ext4_fc_bytelog_hdr tmp; + u32 crc; + + hdr->payload_crc =3D cpu_to_le32(payload_crc); + hdr->header_crc =3D 0; + + tmp =3D *hdr; + tmp.header_crc =3D 0; + crc =3D ext4_fc_bytelog_crc32(&tmp, sizeof(tmp)); + hdr->header_crc =3D cpu_to_le32(crc); +} + +static bool ext4_fc_bytelog_record_sane(const struct ext4_fc_bytelog_hdr *= hdr, + size_t remaining) +{ + u32 record_len =3D le32_to_cpu(hdr->record_len); + u32 payload_len =3D le32_to_cpu(hdr->payload_len); + u16 hdr_len =3D le16_to_cpu(hdr->hdr_len); + + if (le32_to_cpu(hdr->magic) !=3D EXT4_FC_BYTELOG_MAGIC) + return false; + if (le16_to_cpu(hdr->version) !=3D EXT4_FC_BYTELOG_VERSION) + return false; + if (hdr_len !=3D sizeof(*hdr)) + return false; + if (!record_len || record_len > remaining) + return false; + if (!IS_ALIGNED(record_len, EXT4_FC_BYTELOG_ALIGN)) + return false; + if (record_len < hdr_len) + return false; + if (payload_len > record_len - hdr_len) + return false; + + return true; +} + +int ext4_fc_bytelog_validate_hdr(const struct ext4_fc_bytelog_hdr *hdr, + size_t remaining, const void *payload) +{ + struct ext4_fc_bytelog_hdr tmp; + u32 payload_len =3D le32_to_cpu(hdr->payload_len); + u32 crc; + + if (!ext4_fc_bytelog_record_sane(hdr, remaining)) + return -EINVAL; + + tmp =3D *hdr; + tmp.header_crc =3D 0; + crc =3D ext4_fc_bytelog_crc32(&tmp, sizeof(tmp)); + if (crc !=3D le32_to_cpu(hdr->header_crc)) + return -EFSBADCRC; + + if (!payload_len) + return 0; + if (!payload) + return -EINVAL; + + crc =3D ext4_fc_bytelog_crc32(payload, payload_len); + if (crc !=3D le32_to_cpu(hdr->payload_crc)) + return -EFSBADCRC; + + return 0; +} + +void ext4_fc_bytelog_mark_committed(struct ext4_fc_bytelog_hdr *hdr) +{ + u16 flags =3D le16_to_cpu(hdr->flags); + struct ext4_fc_bytelog_hdr tmp; + u32 crc; + + flags |=3D EXT4_FC_BYTELOG_COMMITTED; + hdr->flags =3D cpu_to_le16(flags); + + tmp =3D *hdr; + 
tmp.header_crc =3D 0; + crc =3D ext4_fc_bytelog_crc32(&tmp, sizeof(tmp)); + hdr->header_crc =3D cpu_to_le32(crc); +} + +void ext4_fc_bytelog_flush_persist(void *addr, size_t len) +{ + u8 *p =3D addr; + size_t off =3D 0; + + if (!len) + return; + + /* + * Large flushes can be very bursty. Chunk the flush so other tasks + * can make progress between chunks. + */ + if (len <=3D 65536) { + arch_wb_cache_pmem(p, len); + return; + } + + while (off < len) { + size_t n =3D min(len - off, (size_t)65536); + + arch_wb_cache_pmem(p + off, n); + off +=3D n; + cond_resched(); + } +} + +void ext4_fc_bytelog_persist_barrier(void) +{ + pmem_wmb(); +} + +static int ext4_fc_bytelog_map_ring(struct super_block *sb, + journal_t *journal, + struct ext4_fc_bytelog *log) +{ + struct ext4_sb_info *sbi =3D EXT4_SB(sb); + unsigned long long first, anchor; + unsigned long fc_blocks; + unsigned long ring_blocks; + u64 start_bytes, ring_bytes, start_offset; + pgoff_t start_pgoff; + unsigned long ring_pages; + void *addr =3D NULL; + int ret; + int blkbits =3D sb->s_blocksize_bits; + + if (!journal->j_inode) + return -EOPNOTSUPP; + + if (journal->j_fc_last <=3D journal->j_fc_first + 1) + return -ENOSPC; + + fc_blocks =3D journal->j_fc_last - journal->j_fc_first; + ring_blocks =3D fc_blocks - 1; + if (ring_blocks <=3D EXT4_FC_BYTELOG_META_BLOCKS) + return -ENOSPC; + + ret =3D jbd2_journal_bmap(journal, journal->j_fc_first, &first); + if (ret) + return ret; + + ret =3D jbd2_journal_bmap(journal, journal->j_fc_last - 1, &anchor); + if (ret) + return ret; + + start_bytes =3D (u64)first << blkbits; + ring_bytes =3D (u64)ring_blocks << blkbits; + if (!ring_bytes) + return -ENOSPC; + if (ring_bytes & (PAGE_SIZE - 1)) + return -EOPNOTSUPP; + if (start_bytes > U64_MAX - sbi->s_dax_part_off) + return -ERANGE; + + start_offset =3D start_bytes + sbi->s_dax_part_off; + if (!IS_ALIGNED(start_offset, PAGE_SIZE)) + return -EINVAL; + + start_pgoff =3D start_offset >> PAGE_SHIFT; + ring_pages =3D ring_bytes >> 
PAGE_SHIFT; + if (!ring_pages || ring_pages > LONG_MAX) + return -E2BIG; + +#if IS_ENABLED(CONFIG_DAX) + { + long mapped; + int dax_id =3D dax_read_lock(); + + mapped =3D dax_direct_access(sbi->s_daxdev, start_pgoff, + ring_pages, DAX_ACCESS, &addr, + NULL); + dax_read_unlock(dax_id); + if (mapped < 0) + return mapped; + if (mapped < ring_pages) + return -ENXIO; + } +#else + return -EOPNOTSUPP; +#endif + + log->kaddr =3D addr; + log->size_bytes =3D ring_bytes; + log->base_off =3D (u64)EXT4_FC_BYTELOG_META_BLOCKS << blkbits; + log->persist_off =3D log->base_off; + log->blocks =3D ring_blocks; + log->blocksize =3D sb->s_blocksize; + log->start_pblk =3D first; + log->anchor_pblk =3D anchor; + + return 0; +} + +int ext4_fc_bytelog_init(struct super_block *sb, journal_t *journal) +{ + struct ext4_sb_info *sbi =3D EXT4_SB(sb); + struct ext4_fc_bytelog *log =3D &sbi->s_fc_bytelog; + bool have_feature =3D ext4_has_feature_dax_fc_bytelog(sb); + bool requested =3D test_opt2(sb, DAX_FC_BYTELOG); + bool force =3D test_opt2(sb, DAX_FC_BYTELOG_FORCE); + bool need_map =3D have_feature || requested || force; + u32 batch_max; + int ret; + + if (!need_map) { + log->enabled =3D false; + log->last_error =3D -EOPNOTSUPP; + return 0; + } + + ext4_fc_crc32c_shift_mats_init_once(); + + if (log->mapped) + goto enable; + + batch_max =3D log->batch_max; + memset(log, 0, sizeof(*log)); + log->batch_max =3D batch_max ? batch_max : + EXT4_FC_BYTELOG_BATCH_MAX_DEFAULT; + log->last_error =3D -EOPNOTSUPP; + + if (!journal || !test_opt2(sb, JOURNAL_FAST_COMMIT)) { + if (requested) + ext4_msg(sb, KERN_INFO, + "dax_fc_bytelog requires fast commits enabled"); + return -EOPNOTSUPP; + } + + /* + * ext4_fc_bytelog_init() is called once before jbd2_journal_load() so + * that existing ByteLog records can be replayed. On a fresh + * filesystem, the JBD2 fast-commit feature may not be enabled on the + * journal yet, so there is no fast-commit area to map at this stage. 
+ * + * If the on-disk feature bit is set, lack of journal fast-commit + * support indicates an inconsistent filesystem and must be fatal. + * Otherwise, defer mapping until the post-journal-load init path. + */ + if (!jbd2_has_feature_fast_commit(journal)) { + if (have_feature) { + ext4_msg(sb, KERN_ERR, + "dax_fc_bytelog requires JBD2 fast commits enabled"); + return -EINVAL; + } + + log->enabled =3D false; + log->last_error =3D -EOPNOTSUPP; + return 0; + } + + /* + * When dax_fc_bytelog=3Don is specified without the incompat feature + * bit, refuse to enable ByteLog. dax_fc_bytelog=3Dforce overrides this + * check and is intended only for testing. + */ + if (!have_feature && requested && !force) { + ext4_msg(sb, KERN_INFO, + "dax_fc_bytelog=3Don requires INCOMPAT_DAX_FC_BYTELOG"); + return -EOPNOTSUPP; + } + if (!have_feature && force) + ext4_warning(sb, + "forcing dax_fc_bytelog without INCOMPAT_DAX_FC_BYTELOG; older ker= nels cannot safely mount this filesystem"); + + if (test_opt2(sb, DAX_NEVER)) { + ext4_msg(sb, KERN_INFO, + "dax_fc_bytelog requires DAX, but dax=3Dnever is set"); + return -EOPNOTSUPP; + } + if (!sbi->s_daxdev) { + ext4_msg(sb, KERN_INFO, + "dax_fc_bytelog requires a dax-capable filesystem device"); + return -EOPNOTSUPP; + } + if (sb->s_blocksize !=3D PAGE_SIZE) { + ext4_msg(sb, KERN_INFO, + "dax_fc_bytelog requires blocksize =3D=3D PAGE_SIZE"); + return -EOPNOTSUPP; + } + + ret =3D ext4_fc_bytelog_map_ring(sb, journal, log); + if (ret) { + log->last_error =3D ret; + ext4_msg(sb, KERN_INFO, + "dax_fc_bytelog disabled: unable to map fast-commit ring (err=3D%d)", + ret); + ext4_debug("ByteLog mapping unavailable (err=3D%d)\n", ret); + return ret; + } + + log->head =3D log->base_off; + log->tail =3D log->base_off; + log->seq =3D 0; + log->ring_crc =3D ~0; + log->dirty =3D false; + log->persist_off =3D log->base_off; + ext4_fc_bytelog_reset_batch(log); + log->mapped =3D true; + log->last_error =3D 0; +enable: + log->enabled =3D requested || 
force; + return 0; +} + +void ext4_fc_bytelog_release(struct super_block *sb) +{ + struct ext4_sb_info *sbi =3D EXT4_SB(sb); + + memset(&sbi->s_fc_bytelog, 0, sizeof(sbi->s_fc_bytelog)); +} + +void ext4_fc_bytelog_reset(struct super_block *sb, bool full) +{ + struct ext4_fc_bytelog *log =3D &EXT4_SB(sb)->s_fc_bytelog; + + if (!log->mapped) + return; + if (!full) + return; + + log->head =3D log->base_off; + log->tail =3D log->base_off; + log->seq =3D 0; + log->ring_crc =3D ~0; + log->dirty =3D false; + log->persist_off =3D log->base_off; + ext4_fc_bytelog_reset_batch(log); +} + +void ext4_fc_bytelog_begin_commit(struct super_block *sb) +{ + struct ext4_sb_info *sbi =3D EXT4_SB(sb); + struct ext4_fc_bytelog *log =3D &sbi->s_fc_bytelog; + + if (!log->mapped || !log->enabled) + return; + + log->head =3D log->base_off; + log->tail =3D log->base_off; + log->seq =3D 0; + log->ring_crc =3D ~0; + log->dirty =3D false; + log->persist_off =3D log->base_off; + ext4_fc_bytelog_reset_batch(log); +} + +int ext4_fc_bytelog_end_commit(struct super_block *sb) +{ + struct ext4_sb_info *sbi =3D EXT4_SB(sb); + struct ext4_fc_bytelog *log =3D &sbi->s_fc_bytelog; + journal_t *journal =3D sbi->s_journal; + u8 *base; + u64 cursor, end; + u32 tid; + int ret; + + if (!log->mapped || !log->enabled) + return 0; + + if (!journal || !journal->j_running_transaction) + return -EINVAL; + tid =3D journal->j_running_transaction->t_tid; + + ret =3D ext4_fc_bytelog_flush_batch(sb, tid); + if (ret) { + log->last_error =3D ret; + return ret; + } + + if (!log->dirty) + return 0; + + base =3D log->kaddr; + if (!base) + return -EOPNOTSUPP; + + cursor =3D log->persist_off; + end =3D log->head; + if (end <=3D cursor) + return 0; + + ext4_fc_bytelog_flush_persist(base + cursor, end - cursor); + ext4_fc_bytelog_persist_barrier(); + + log->persist_off =3D end; + log->dirty =3D false; + return 0; +} + +static inline bool ext4_fc_bytelog_has_space(struct ext4_fc_bytelog *log, + size_t len) +{ + if (log->head < 
log->base_off) + return false; + if (len > log->size_bytes - log->base_off) + return false; + return log->head + len <=3D log->size_bytes; +} + +static void ext4_fc_bytelog_reset_batch(struct ext4_fc_bytelog *log) +{ + log->batch_first_tag =3D 0; + log->batch_len =3D 0; + log->batch_tlvs =3D 0; + log->batch_payload_crc =3D ~0U; +} + +static int ext4_fc_bytelog_commit_record(struct super_block *sb, u32 tid, = u16 tag, + size_t payload_len, u32 payload_crc) +{ + struct ext4_sb_info *sbi =3D EXT4_SB(sb); + struct ext4_fc_bytelog *log =3D &sbi->s_fc_bytelog; + struct ext4_fc_bytelog_hdr hdr; + size_t total_len, off; + u32 ring_crc; + u8 *dst; + u8 *payload; + u64 seq; + bool mats_ready; + + total_len =3D ext4_fc_bytelog_record_size(payload_len); + if (!ext4_fc_bytelog_has_space(log, total_len)) + return -ENOSPC; + + seq =3D log->seq; + ring_crc =3D log->ring_crc; + + mats_ready =3D READ_ONCE(ext4_fc_crc32c_shift_mats_ready); + ext4_fc_bytelog_prep_hdr(&hdr, tag, 0, tid, seq, payload_len); + dst =3D (u8 *)log->kaddr + log->head; + off =3D sizeof(hdr); + payload =3D dst + off; + + if (payload_len) { + if (likely(mats_ready)) { + ring_crc =3D ext4_fc_crc32c_shift_zeros(ring_crc ^ ~0U, payload_len); + ring_crc ^=3D payload_crc; + } else { + ring_crc =3D crc32c(ring_crc, payload, payload_len); + } + off +=3D payload_len; + } else { + payload_crc =3D ext4_fc_bytelog_crc32(NULL, 0); + } + + if (off < total_len) { + size_t pad =3D total_len - off; + + memset(dst + off, 0, pad); + } + + hdr.flags =3D cpu_to_le16(le16_to_cpu(hdr.flags) | EXT4_FC_BYTELOG_COMMIT= TED); + ext4_fc_bytelog_finalize_hdr_crc(&hdr, payload_crc); + memcpy(dst, &hdr, sizeof(hdr)); + + log->head +=3D total_len; + log->seq++; + log->dirty =3D true; + log->ring_crc =3D ring_crc; + + return 0; +} + +static size_t ext4_fc_bytelog_copy_vecs(u8 *dst, + struct ext4_fc_bytelog_vec *vecs, + int nvec, u32 *crc) +{ + size_t off =3D 0; + u32 crc_val =3D crc ? 
*crc : 0; + int i; + + for (i =3D 0; i < nvec; i++) { + const u8 *src =3D vecs[i].base; + size_t len =3D vecs[i].len; + + if (!len) + continue; + + while (i + 1 < nvec && vecs[i + 1].len && + vecs[i + 1].base =3D=3D src + len) { + len +=3D vecs[i + 1].len; + i++; + } + + if (crc) + crc_val =3D crc32c(crc_val, src, len); + memcpy(dst + off, src, len); + off +=3D len; + } + + if (crc) + *crc =3D crc_val; + return off; +} + +static int ext4_fc_bytelog_append_vec_direct(struct super_block *sb, u32 t= id, u16 tag, + struct ext4_fc_bytelog_vec *vecs, + int nvec, size_t payload_len) +{ + struct ext4_sb_info *sbi =3D EXT4_SB(sb); + struct ext4_fc_bytelog *log =3D &sbi->s_fc_bytelog; + size_t total_len; + u32 payload_crc =3D ~0U; + u8 *dst; + + total_len =3D ext4_fc_bytelog_record_size(payload_len); + if (!ext4_fc_bytelog_has_space(log, total_len)) + return -ENOSPC; + + dst =3D (u8 *)log->kaddr + log->head + sizeof(struct ext4_fc_bytelog_hdr); + ext4_fc_bytelog_copy_vecs(dst, vecs, nvec, &payload_crc); + return ext4_fc_bytelog_commit_record(sb, tid, tag, payload_len, + payload_crc); +} + +static int ext4_fc_bytelog_flush_batch(struct super_block *sb, u32 tid) +{ + struct ext4_fc_bytelog *log =3D &EXT4_SB(sb)->s_fc_bytelog; + u32 payload_crc =3D ~0U; + u16 tag; + int ret; + + if (!log->batch_len) + return 0; + + tag =3D log->batch_first_tag; + if (log->batch_tlvs > 1) + tag =3D EXT4_FC_BYTELOG_TAG_BATCH; + + if (!log->kaddr) + return -EOPNOTSUPP; + + payload_crc =3D log->batch_payload_crc; + ret =3D ext4_fc_bytelog_commit_record(sb, tid, tag, log->batch_len, + payload_crc); + ext4_fc_bytelog_reset_batch(log); + return ret; +} + +int ext4_fc_bytelog_append_vec(struct super_block *sb, u16 tag, + struct ext4_fc_bytelog_vec *vecs, int nvec) +{ + struct ext4_sb_info *sbi =3D EXT4_SB(sb); + struct ext4_fc_bytelog *log =3D &sbi->s_fc_bytelog; + struct journal_s *journal =3D sbi->s_journal; + size_t payload_len =3D 0; + u32 batch_max =3D log->batch_max; + u32 tid; + int i; + u8 
*base; + u8 *dst; + + if (!ext4_fc_bytelog_active(sbi)) + return -EOPNOTSUPP; + + if (!journal || !journal->j_running_transaction) + return -EINVAL; + tid =3D journal->j_running_transaction->t_tid; + + for (i =3D 0; i < nvec; i++) + payload_len +=3D vecs[i].len; + + base =3D log->kaddr; + if (!base) + return -EOPNOTSUPP; + + if (!batch_max) { + int ret; + + ret =3D ext4_fc_bytelog_flush_batch(sb, tid); + if (ret) + return ret; + return ext4_fc_bytelog_append_vec_direct(sb, tid, tag, vecs, + nvec, payload_len); + } + + if (payload_len > batch_max) { + int ret; + + ret =3D ext4_fc_bytelog_flush_batch(sb, tid); + if (ret) + return ret; + return ext4_fc_bytelog_append_vec_direct(sb, tid, tag, vecs, + nvec, payload_len); + } + + if (log->batch_len && log->batch_len + payload_len > batch_max) { + int ret; + + ret =3D ext4_fc_bytelog_flush_batch(sb, tid); + if (ret) + return ret; + } + + if (!log->batch_len) + log->batch_first_tag =3D tag; + + if (!ext4_fc_bytelog_has_space(log, + ext4_fc_bytelog_record_size(log->batch_len + + payload_len))) { + int ret; + + ret =3D ext4_fc_bytelog_flush_batch(sb, tid); + if (ret) + return ret; + log->batch_first_tag =3D tag; + } + + if (!ext4_fc_bytelog_has_space(log, + ext4_fc_bytelog_record_size(log->batch_len + + payload_len))) + return -ENOSPC; + + dst =3D base + log->head + sizeof(struct ext4_fc_bytelog_hdr) + + log->batch_len; + log->batch_len +=3D ext4_fc_bytelog_copy_vecs(dst, vecs, nvec, &log->batc= h_payload_crc); + log->batch_tlvs++; + log->dirty =3D true; + return 0; +} + +void ext4_fc_bytelog_build_anchor(struct super_block *sb, + struct ext4_fc_bytelog_anchor *anchor, + u32 tid) +{ + struct ext4_fc_bytelog *log =3D &EXT4_SB(sb)->s_fc_bytelog; + + memset(anchor, 0, sizeof(*anchor)); + anchor->tid =3D tid; + anchor->head =3D log->head; + anchor->tail =3D log->tail; + anchor->seq =3D log->seq; + anchor->crc =3D log->ring_crc; +} diff --git a/fs/ext4/fast_commit_bytelog.h b/fs/ext4/fast_commit_bytelog.h new file mode 100644 
index 000000000000..d52754890222 --- /dev/null +++ b/fs/ext4/fast_commit_bytelog.h @@ -0,0 +1,147 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _EXT4_FAST_COMMIT_BYTELOG_H +#define _EXT4_FAST_COMMIT_BYTELOG_H + +#include +#include +#include + +struct super_block; +struct journal_s; +struct ext4_sb_info; + +#define EXT4_FC_BYTELOG_MAGIC 0x4c424346 /* "FCBL" */ +#define EXT4_FC_BYTELOG_VERSION 1 +#define EXT4_FC_BYTELOG_ALIGN 64 +#define EXT4_FC_BYTELOG_BATCH_MAX_DEFAULT 4096 + +/* + * Record header @tag for a batched TLV payload stream. + * + * In this case the payload is a stream of standard fast-commit TLVs + * (struct ext4_fc_tl + value). + */ +#define EXT4_FC_BYTELOG_TAG_BATCH 0xffff + +/* Record flag bits */ +#define EXT4_FC_BYTELOG_COMMITTED BIT(0) + +/** + * struct ext4_fc_bytelog_hdr - On-media header for a ByteLog record + * @magic: Magic identifying the record + * @version: On-disk header format version + * @hdr_len: Length of this header in bytes + * @tid: JBD2 transaction identifier + * @tag: Ext4 fast-commit tag (or EXT4_FC_BYTELOG_TAG_BATCH) + * @flags: Record flags (EXT4_FC_BYTELOG_*) + * @payload_len:Length of payload bytes following the header + * @payload_crc:CRC32C of the payload + * @record_len: Entire record length including header, payload and padding + * @header_crc: CRC32C of the header with @header_crc zeroed + * @seq: Monotonic sequence number assigned by the ByteLog writer + * @reserved: Future fields, currently zeroed + * + * The structure is padded to 64 bytes to keep each record 64B aligned. 
+ */ +struct ext4_fc_bytelog_hdr { + __le32 magic; + __le16 version; + __le16 hdr_len; + __le32 tid; + __le16 tag; + __le16 flags; + __le32 payload_len; + __le32 payload_crc; + __le32 record_len; + __le32 header_crc; + __le64 seq; + __le64 reserved[3]; +} __packed; + +struct ext4_fc_bytelog_anchor { + u32 tid; + u64 head; + u64 tail; + u64 seq; + u32 crc; +}; + +struct ext4_fc_bytelog { + void *kaddr; + u64 size_bytes; + u64 base_off; + u64 persist_off; + u32 blocksize; + u32 blocks; + u64 start_pblk; + u64 anchor_pblk; + u64 head; + u64 tail; + u64 seq; + u32 ring_crc; + + u32 batch_max; + u16 batch_first_tag; + u32 batch_len; + u32 batch_tlvs; + u32 batch_payload_crc; + + bool mapped; + bool enabled; + bool dirty; + int last_error; +}; + +struct ext4_fc_bytelog_vec { + const void *base; + size_t len; +}; + +int ext4_fc_bytelog_init(struct super_block *sb, struct journal_s *journal= ); +void ext4_fc_bytelog_release(struct super_block *sb); +void ext4_fc_bytelog_reset(struct super_block *sb, bool full); +void ext4_fc_bytelog_begin_commit(struct super_block *sb); +int ext4_fc_bytelog_end_commit(struct super_block *sb); +bool ext4_fc_bytelog_active(struct ext4_sb_info *sbi); +bool ext4_fc_bytelog_mapped(struct ext4_sb_info *sbi); +int ext4_fc_bytelog_append_vec(struct super_block *sb, u16 tag, + struct ext4_fc_bytelog_vec *vecs, int nvec); +void ext4_fc_bytelog_build_anchor(struct super_block *sb, + struct ext4_fc_bytelog_anchor *anchor, + u32 tid); + +static inline bool ext4_fc_bytelog_record_committed(const struct ext4_fc_b= ytelog_hdr *hdr) +{ + return !!(le16_to_cpu(hdr->flags) & EXT4_FC_BYTELOG_COMMITTED); +} + +static inline u32 ext4_fc_bytelog_record_len(const struct ext4_fc_bytelog_= hdr *hdr) +{ + return le32_to_cpu(hdr->record_len); +} + +static inline u32 ext4_fc_bytelog_payload_len(const struct ext4_fc_bytelog= _hdr *hdr) +{ + return le32_to_cpu(hdr->payload_len); +} + +static inline u64 ext4_fc_bytelog_seq(const struct ext4_fc_bytelog_hdr *hd= r) +{ + 
return le64_to_cpu(hdr->seq); +} + +size_t ext4_fc_bytelog_record_size(size_t payload_len); +void ext4_fc_bytelog_prep_hdr(struct ext4_fc_bytelog_hdr *hdr, u16 tag, + u16 flags, u32 tid, u64 seq, u32 payload_len); +void ext4_fc_bytelog_finalize_hdr_crc(struct ext4_fc_bytelog_hdr *hdr, + u32 payload_crc); +int ext4_fc_bytelog_validate_hdr(const struct ext4_fc_bytelog_hdr *hdr, + size_t remaining, const void *payload); +void ext4_fc_bytelog_mark_committed(struct ext4_fc_bytelog_hdr *hdr); + +void ext4_fc_bytelog_flush_persist(void *addr, size_t len); +void ext4_fc_bytelog_persist_barrier(void); + +u32 ext4_fc_bytelog_crc32(const void *buf, size_t len); + +#endif /* _EXT4_FAST_COMMIT_BYTELOG_H */ --=20 2.52.0
From nobody Tue Apr 7 15:27:36 2026 From: Li Chen To: linux-ext4@vger.kernel.org, "Theodore Ts'o" , Andreas Dilger , 
linux-kernel@vger.kernel.org Cc: Harshad Shirwadkar , Li Chen Subject: [RFC PATCH 2/4] ext4: add dax_fc_bytelog mount option Date: Thu, 26 Feb 2026 18:17:30 +0800 Message-ID: <20260226101736.2271952-3-me@linux.beauty> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260226101736.2271952-1-me@linux.beauty> References: <20260226101736.2271952-1-me@linux.beauty> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ZohoMailClient: External Content-Type: text/plain; charset="utf-8" Add dax_fc_bytelog=3D{off,on,force} to control the DAX ByteLog fast commit backend. Initialize the ByteLog ring before fast commit replay and release it on unmount. Signed-off-by: Li Chen --- fs/ext4/super.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 76 insertions(+), 1 deletion(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 504148b2142b..3645456a61dd 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1368,6 +1368,7 @@ static void ext4_put_super(struct super_block *sb) sbi->s_ea_block_cache =3D NULL; =20 ext4_stop_mmpd(sbi); + ext4_fc_bytelog_release(sb); =20 brelse(sbi->s_sbh); sb->s_fs_info =3D NULL; @@ -1685,6 +1686,8 @@ enum { Opt_max_dir_size_kb, Opt_nojournal_checksum, Opt_nombcache, Opt_no_prefetch_block_bitmaps, Opt_mb_optimize_scan, Opt_errors, Opt_data, Opt_data_err, Opt_jqfmt, Opt_dax_type, + Opt_dax_fc_bytelog, Opt_dax_fc_bytelog_off, Opt_dax_fc_bytelog_on, + Opt_dax_fc_bytelog_force, #ifdef CONFIG_EXT4_DEBUG Opt_fc_debug_max_replay, Opt_fc_debug_force #endif @@ -1724,6 +1727,13 @@ static const struct constant_table ext4_param_dax[] = =3D { {} }; =20 +static const struct constant_table ext4_param_dax_fc_bytelog[] =3D { + {"off", Opt_dax_fc_bytelog_off}, + {"on", Opt_dax_fc_bytelog_on}, + {"force", Opt_dax_fc_bytelog_force}, + {} +}; + /* * Mount option specification * We don't use fsparam_flag_no because of the way we set the 
@@ -1780,6 +1790,8 @@ static const struct fs_parameter_spec ext4_param_specs[] = {
 	fsparam_flag ("i_version", Opt_removed),
 	fsparam_flag ("dax", Opt_dax),
 	fsparam_enum ("dax", Opt_dax_type, ext4_param_dax),
+	fsparam_enum("dax_fc_bytelog", Opt_dax_fc_bytelog,
+		     ext4_param_dax_fc_bytelog),
 	fsparam_u32 ("stripe", Opt_stripe),
 	fsparam_flag ("delalloc", Opt_delalloc),
 	fsparam_flag ("nodelalloc", Opt_nodelalloc),
@@ -1965,6 +1977,7 @@ ext4_sb_read_encoding(const struct ext4_super_block *es)
 #define EXT4_SPEC_s_fc_debug_max_replay (1 << 17)
 #define EXT4_SPEC_s_sb_block (1 << 18)
 #define EXT4_SPEC_mb_optimize_scan (1 << 19)
+#define EXT4_SPEC_s_dax_fc_bytelog BIT(20)
 
 struct ext4_fs_context {
 	char *s_qf_names[EXT4_MAXQUOTAS];
@@ -2370,6 +2383,26 @@ static int ext4_parse_param(struct fs_context *fc, struct fs_parameter *param)
 		ext4_msg(NULL, KERN_INFO, "dax option not supported");
 		return -EINVAL;
 #endif
+	case Opt_dax_fc_bytelog:
+		switch (result.uint_32) {
+		case Opt_dax_fc_bytelog_off:
+			ctx_clear_mount_opt2(ctx, EXT4_MOUNT2_DAX_FC_BYTELOG);
+			ctx_clear_mount_opt2(ctx,
+					     EXT4_MOUNT2_DAX_FC_BYTELOG_FORCE);
+			break;
+		case Opt_dax_fc_bytelog_on:
+			ctx_set_mount_opt2(ctx, EXT4_MOUNT2_DAX_FC_BYTELOG);
+			ctx_clear_mount_opt2(ctx,
+					     EXT4_MOUNT2_DAX_FC_BYTELOG_FORCE);
+			break;
+		case Opt_dax_fc_bytelog_force:
+			ctx_set_mount_opt2(ctx, EXT4_MOUNT2_DAX_FC_BYTELOG);
+			ctx_set_mount_opt2(ctx,
+					   EXT4_MOUNT2_DAX_FC_BYTELOG_FORCE);
+			break;
+		}
+		ctx->spec |= EXT4_SPEC_s_dax_fc_bytelog;
+		return 0;
 	case Opt_data_err:
 		if (result.uint_32 == Opt_data_err_abort)
 			ctx_set_mount_opt(ctx, m->mount_opt);
@@ -2819,7 +2852,22 @@ static int ext4_check_opt_consistency(struct fs_context *fc,
 		     !(sbi->s_mount_opt2 & EXT4_MOUNT2_DAX_INODE))) {
 			goto fail_dax_change_remount;
 		}
-	}
+
+		if (ctx->spec & EXT4_SPEC_s_dax_fc_bytelog) {
+			bool new_on = ctx_test_mount_opt2(ctx,
+					EXT4_MOUNT2_DAX_FC_BYTELOG);
+			bool new_force = ctx_test_mount_opt2(ctx,
+					EXT4_MOUNT2_DAX_FC_BYTELOG_FORCE);
+			bool cur_on = test_opt2(sb, DAX_FC_BYTELOG);
+			bool cur_force = test_opt2(sb, DAX_FC_BYTELOG_FORCE);
+
+			if (new_on != cur_on || new_force != cur_force) {
+				ext4_msg(NULL, KERN_ERR,
+					 "can't change dax_fc_bytelog mount option while remounting");
+				return -EINVAL;
+			}
+		}
+	}
 
 	return ext4_check_quota_consistency(fc, sb);
 }
@@ -3038,6 +3086,12 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
 	} else if (test_opt2(sb, DAX_INODE)) {
 		SEQ_OPTS_PUTS("dax=inode");
 	}
+	if (test_opt2(sb, DAX_FC_BYTELOG)) {
+		if (test_opt2(sb, DAX_FC_BYTELOG_FORCE))
+			SEQ_OPTS_PUTS("dax_fc_bytelog=force");
+		else
+			SEQ_OPTS_PUTS("dax_fc_bytelog=on");
+	}
 
 	if (sbi->s_groups_count >= MB_DEFAULT_LINEAR_SCAN_THRESHOLD &&
 	    !test_opt2(sb, MB_OPTIMIZE_SCAN)) {
@@ -4950,6 +5004,8 @@ static int ext4_load_and_init_journal(struct super_block *sb,
 			 "Failed to set fast commit journal feature");
 		goto out;
 	}
+	if (test_opt2(sb, JOURNAL_FAST_COMMIT))
+		ext4_fc_bytelog_init(sb, sbi->s_journal);
 
 	/* We have now updated the journal if required, so we can
 	 * validate the data journaling mode. */
@@ -6124,10 +6180,29 @@ static int ext4_load_journal(struct super_block *sb,
 		char *save = kmalloc(EXT4_S_ERR_LEN, GFP_KERNEL);
 		__le16 orig_state;
 		bool changed = false;
+		int fc_err;
 
 		if (save)
 			memcpy(save, ((char *) es) +
 			       EXT4_S_ERR_START, EXT4_S_ERR_LEN);
+
+		/*
+		 * Map the ByteLog ring before fast-commit replay so that
+		 * EXT4_FC_TAG_DAX_BYTELOG_ANCHOR records can be processed
+		 * during jbd2_journal_load().
+		 *
+		 * For filesystems with the INCOMPAT_DAX_FC_BYTELOG feature
+		 * bit set, failing to initialize the ByteLog ring must be
+		 * treated as fatal.
+		 */
+		if (test_opt2(sb, JOURNAL_FAST_COMMIT)) {
+			fc_err = ext4_fc_bytelog_init(sb, journal);
+			if (fc_err && ext4_has_feature_dax_fc_bytelog(sb)) {
+				kfree(save);
+				err = fc_err;
+				goto err_out;
+			}
+		}
 		err = jbd2_journal_load(journal);
 		if (save && memcmp(((char *) es) + EXT4_S_ERR_START,
 				   save, EXT4_S_ERR_LEN)) {
-- 
2.52.0

From nobody Tue Apr 7 15:27:36 2026
From: Li Chen
To: linux-ext4@vger.kernel.org, "Theodore Ts'o", Andreas Dilger,
 linux-kernel@vger.kernel.org
Cc: Harshad Shirwadkar, Li Chen
Subject: [RFC PATCH 3/4] ext4: fast_commit: write TLVs into DAX ByteLog
Date: Thu, 26 Feb 2026 18:17:31 +0800
Message-ID: <20260226101736.2271952-4-me@linux.beauty>
In-Reply-To: <20260226101736.2271952-1-me@linux.beauty>
References: <20260226101736.2271952-1-me@linux.beauty>

When dax_fc_bytelog is enabled, write fast commit TLVs directly into the
DAX-mapped ByteLog ring. Keep traditional TLV writes confined to the
reserved FC block and emit an anchor TLV to describe the ByteLog window.

Signed-off-by: Li Chen
---
 fs/ext4/fast_commit.c         | 124 +++++++++++++++++++++++++++++++++-
 fs/ext4/fast_commit.h         |  13 ++++
 fs/ext4/fast_commit_bytelog.c |  20 ++++++
 fs/ext4/fast_commit_bytelog.h |   5 ++
 4 files changed, 159 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 64c0c4ba58b0..2f7b7ea29df2 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -723,6 +723,12 @@ static u8 *ext4_fc_reserve_space(struct super_block *sb, int len, u32 *crc)
 	 * leaving enough space for a PAD tlv.
 	 */
 	remaining = bsize - EXT4_FC_TAG_BASE_LEN - off;
+	if (ext4_fc_bytelog_active(sbi) && len > remaining) {
+		ext4_fc_mark_ineligible(sb,
+					EXT4_FC_REASON_BYTELOG_TLV_OVERFLOW,
+					NULL);
+		return NULL;
+	}
 	if (len <= remaining) {
 		sbi->s_fc_bytes += len;
 		return dst;
@@ -806,6 +812,31 @@ static bool ext4_fc_add_tlv(struct super_block *sb, u16 tag, u16 len, u8 *val,
 	struct ext4_fc_tl tl;
 	u8 *dst;
 
+	if (ext4_fc_bytelog_active(EXT4_SB(sb)) &&
+	    (tag == EXT4_FC_TAG_ADD_RANGE || tag == EXT4_FC_TAG_DEL_RANGE ||
+	     tag == EXT4_FC_TAG_LINK || tag == EXT4_FC_TAG_UNLINK ||
+	     tag == EXT4_FC_TAG_CREAT || tag == EXT4_FC_TAG_INODE)) {
+		struct ext4_fc_bytelog_vec vecs[2];
+		int ret;
+
+		tl.fc_tag = cpu_to_le16(tag);
+		tl.fc_len = cpu_to_le16(len);
+		vecs[0].base = &tl;
+		vecs[0].len = sizeof(tl);
+		vecs[1].base = val;
+		vecs[1].len = len;
+
+		ret = ext4_fc_bytelog_append_vec(sb, tag, vecs,
+						 ARRAY_SIZE(vecs));
+		if (!ret)
+			return true;
+		if (ret == -ENOSPC)
+			ext4_fc_mark_ineligible(sb,
+						EXT4_FC_REASON_BYTELOG_TLV_OVERFLOW,
+						NULL);
+		return false;
+	}
+
 	dst = ext4_fc_reserve_space(sb, EXT4_FC_TAG_BASE_LEN + len, crc);
 	if (!dst)
 		return false;
@@ -819,6 +850,17 @@ static bool ext4_fc_add_tlv(struct super_block *sb, u16 tag, u16 len, u8 *val,
 	return true;
 }
 
+static bool ext4_fc_add_bytelog_anchor_tlv(struct super_block *sb,
+					   struct ext4_fc_bytelog_anchor *anchor,
+					   u32 *crc)
+{
+	struct ext4_fc_bytelog_entry entry;
+
+	ext4_fc_bytelog_anchor_to_disk(&entry, anchor);
+	return ext4_fc_add_tlv(sb, EXT4_FC_TAG_DAX_BYTELOG_ANCHOR,
+			       sizeof(entry), (u8 *)&entry, crc);
+}
+
 /* Same as above, but adds dentry tlv. */
 static bool ext4_fc_add_dentry_tlv(struct super_block *sb, u32 *crc,
 				   struct ext4_fc_dentry_update *fc_dentry)
@@ -826,9 +868,40 @@ static bool ext4_fc_add_dentry_tlv(struct super_block *sb, u32 *crc,
 	struct ext4_fc_dentry_info fcd;
 	struct ext4_fc_tl tl;
 	int dlen = fc_dentry->fcd_name.name.len;
-	u8 *dst = ext4_fc_reserve_space(sb,
-			EXT4_FC_TAG_BASE_LEN + sizeof(fcd) + dlen, crc);
+	u8 *dst;
+
+	if (ext4_fc_bytelog_active(EXT4_SB(sb)) &&
+	    (fc_dentry->fcd_op == EXT4_FC_TAG_LINK ||
+	     fc_dentry->fcd_op == EXT4_FC_TAG_UNLINK ||
+	     fc_dentry->fcd_op == EXT4_FC_TAG_CREAT)) {
+		struct ext4_fc_bytelog_vec vecs[3];
+		int ret;
+
+		fcd.fc_parent_ino = cpu_to_le32(fc_dentry->fcd_parent);
+		fcd.fc_ino = cpu_to_le32(fc_dentry->fcd_ino);
+		tl.fc_tag = cpu_to_le16(fc_dentry->fcd_op);
+		tl.fc_len = cpu_to_le16(sizeof(fcd) + dlen);
+
+		vecs[0].base = &tl;
+		vecs[0].len = sizeof(tl);
+		vecs[1].base = &fcd;
+		vecs[1].len = sizeof(fcd);
+		vecs[2].base = fc_dentry->fcd_name.name.name;
+		vecs[2].len = dlen;
+
+		ret = ext4_fc_bytelog_append_vec(sb, fc_dentry->fcd_op, vecs,
+						 ARRAY_SIZE(vecs));
+		if (!ret)
+			return true;
+		if (ret == -ENOSPC)
+			ext4_fc_mark_ineligible(sb,
+						EXT4_FC_REASON_BYTELOG_TLV_OVERFLOW,
+						NULL);
+		return false;
+	}
 
+	dst = ext4_fc_reserve_space(sb, EXT4_FC_TAG_BASE_LEN + sizeof(fcd) +
+				    dlen, crc);
 	if (!dst)
 		return false;
 
@@ -872,6 +945,25 @@ static int ext4_fc_write_inode(struct inode *inode, u32 *crc)
 	tl.fc_tag = cpu_to_le16(EXT4_FC_TAG_INODE);
 	tl.fc_len = cpu_to_le16(inode_len + sizeof(fc_inode.fc_ino));
 
+	if (ext4_fc_bytelog_active(EXT4_SB(inode->i_sb))) {
+		struct ext4_fc_bytelog_vec vecs[3];
+
+		vecs[0].base = &tl;
+		vecs[0].len = sizeof(tl);
+		vecs[1].base = &fc_inode.fc_ino;
+		vecs[1].len = sizeof(fc_inode.fc_ino);
+		vecs[2].base = ext4_raw_inode(&iloc);
+		vecs[2].len = inode_len;
+
+		ret = ext4_fc_bytelog_append_vec(inode->i_sb, EXT4_FC_TAG_INODE,
+						 vecs, ARRAY_SIZE(vecs));
+		if (ret == -ENOSPC)
+			ext4_fc_mark_ineligible(inode->i_sb,
+						EXT4_FC_REASON_BYTELOG_TLV_OVERFLOW,
+						NULL);
+		goto err;
+	}
+
 	ret = -ECANCELED;
 	dst = ext4_fc_reserve_space(inode->i_sb,
 		EXT4_FC_TAG_BASE_LEN + inode_len + sizeof(fc_inode.fc_ino), crc);
@@ -1147,6 +1239,8 @@ static int ext4_fc_perform_commit(journal_t *journal)
 	}
 
 	/* Step 6.2: Now write all the dentry updates. */
+	if (ext4_fc_bytelog_active(sbi))
+		ext4_fc_bytelog_begin_commit(sb);
 	ret = ext4_fc_commit_dentry_updates(journal, &crc);
 	if (ret)
 		goto out;
@@ -1164,6 +1258,22 @@ static int ext4_fc_perform_commit(journal_t *journal)
 		if (ret)
 			goto out;
 	}
+
+	if (ext4_fc_bytelog_active(sbi)) {
+		struct ext4_fc_bytelog_anchor anchor;
+
+		ret = ext4_fc_bytelog_end_commit(sb);
+		if (ret)
+			goto out;
+		if (sbi->s_fc_bytelog.seq) {
+			ext4_fc_bytelog_build_anchor(sb, &anchor,
+				sbi->s_journal->j_running_transaction->t_tid);
+			if (!ext4_fc_add_bytelog_anchor_tlv(sb, &anchor, &crc)) {
+				ret = -ENOSPC;
+				goto out;
+			}
+		}
+	}
 	/* Step 6.4: Finally write tail tag to conclude this fast commit. */
 	ret = ext4_fc_write_tail(sb, crc);
 
@@ -1262,6 +1372,12 @@ int ext4_fc_commit(journal_t *journal, tid_t commit_tid)
 	else
 		journal_ioprio = EXT4_DEF_JOURNAL_IOPRIO;
 	set_task_ioprio(current, journal_ioprio);
+
+	if (ext4_fc_bytelog_active(sbi)) {
+		journal->j_fc_off = 0;
+		sbi->s_fc_bytes = 0;
+	}
+
 	fc_bufs_before = (sbi->s_fc_bytes + bsize - 1) / bsize;
 	ret = ext4_fc_perform_commit(journal);
 	if (ret < 0) {
@@ -1367,8 +1483,9 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
 		ext4_clear_mount_flag(sb, EXT4_MF_FC_INELIGIBLE);
 	}
 
-	if (full)
+	if (full || ext4_fc_bytelog_active(sbi))
 		sbi->s_fc_bytes = 0;
+	ext4_fc_bytelog_reset(sb, full);
 	ext4_fc_unlock(sb, alloc_ctx);
 	trace_ext4_fc_stats(sb);
 }
@@ -2315,6 +2432,7 @@ static const char * const fc_ineligible_reasons[] = {
 	[EXT4_FC_REASON_FALLOC_RANGE] = "Falloc range op",
 	[EXT4_FC_REASON_INODE_JOURNAL_DATA] = "Data journalling",
 	[EXT4_FC_REASON_ENCRYPTED_FILENAME] = "Encrypted filename",
+	[EXT4_FC_REASON_BYTELOG_TLV_OVERFLOW] = "ByteLog TLV overflow",
 	[EXT4_FC_REASON_MIGRATE] = "Inode format migration",
 	[EXT4_FC_REASON_VERITY] = "fs-verity enable",
 	[EXT4_FC_REASON_MOVE_EXT] = "Move extents",
diff --git a/fs/ext4/fast_commit.h b/fs/ext4/fast_commit.h
index 2f77a37fb101..fb51e19b9778 100644
--- a/fs/ext4/fast_commit.h
+++ b/fs/ext4/fast_commit.h
@@ -18,6 +18,7 @@
 #define EXT4_FC_TAG_PAD 0x0007
 #define EXT4_FC_TAG_TAIL 0x0008
 #define EXT4_FC_TAG_HEAD 0x0009
+#define EXT4_FC_TAG_DAX_BYTELOG_ANCHOR 0x000a
 
 #define EXT4_FC_SUPPORTED_FEATURES 0x0
 
@@ -70,6 +71,15 @@ struct ext4_fc_tail {
 	__le32 fc_crc;
 };
 
+/* Value structure for tag EXT4_FC_TAG_DAX_BYTELOG_ANCHOR. */
+struct ext4_fc_bytelog_entry {
+	__le32 fc_tid;
+	__le64 fc_head;
+	__le64 fc_tail;
+	__le64 fc_seq;
+	__le32 fc_crc;
+};
+
 /* Tag base length */
 #define EXT4_FC_TAG_BASE_LEN (sizeof(struct ext4_fc_tl))
 
@@ -97,6 +107,7 @@ enum {
 	EXT4_FC_REASON_FALLOC_RANGE,
 	EXT4_FC_REASON_INODE_JOURNAL_DATA,
 	EXT4_FC_REASON_ENCRYPTED_FILENAME,
+	EXT4_FC_REASON_BYTELOG_TLV_OVERFLOW,
 	EXT4_FC_REASON_MIGRATE,
 	EXT4_FC_REASON_VERITY,
 	EXT4_FC_REASON_MOVE_EXT,
@@ -181,6 +192,8 @@ static inline const char *tag2str(__u16 tag)
 		return "TAIL";
 	case EXT4_FC_TAG_HEAD:
 		return "HEAD";
+	case EXT4_FC_TAG_DAX_BYTELOG_ANCHOR:
+		return "BYTELOG_ANCHOR";
 	default:
 		return "ERROR";
 	}
diff --git a/fs/ext4/fast_commit_bytelog.c b/fs/ext4/fast_commit_bytelog.c
index 64ba3edddbcb..77ac1d9ef031 100644
--- a/fs/ext4/fast_commit_bytelog.c
+++ b/fs/ext4/fast_commit_bytelog.c
@@ -455,6 +455,26 @@ void ext4_fc_bytelog_release(struct super_block *sb)
 	memset(&sbi->s_fc_bytelog, 0, sizeof(sbi->s_fc_bytelog));
 }
 
+void ext4_fc_bytelog_anchor_to_disk(struct ext4_fc_bytelog_entry *dst,
+				    const struct ext4_fc_bytelog_anchor *src)
+{
+	dst->fc_tid = cpu_to_le32(src->tid);
+	dst->fc_head = cpu_to_le64(src->head);
+	dst->fc_tail = cpu_to_le64(src->tail);
+	dst->fc_seq = cpu_to_le64(src->seq);
+	dst->fc_crc = cpu_to_le32(src->crc);
+}
+
+void ext4_fc_bytelog_anchor_from_disk(struct ext4_fc_bytelog_anchor *dst,
+				      const struct ext4_fc_bytelog_entry *src)
+{
+	dst->tid = le32_to_cpu(src->fc_tid);
+	dst->head = le64_to_cpu(src->fc_head);
+	dst->tail = le64_to_cpu(src->fc_tail);
+	dst->seq = le64_to_cpu(src->fc_seq);
+	dst->crc = le32_to_cpu(src->fc_crc);
+}
+
 void ext4_fc_bytelog_reset(struct super_block *sb, bool full)
 {
 	struct ext4_fc_bytelog *log = &EXT4_SB(sb)->s_fc_bytelog;
diff --git a/fs/ext4/fast_commit_bytelog.h b/fs/ext4/fast_commit_bytelog.h
index d52754890222..d3e5b734a02e 100644
--- a/fs/ext4/fast_commit_bytelog.h
+++ b/fs/ext4/fast_commit_bytelog.h
@@ -9,6 +9,7 @@
 struct super_block;
struct journal_s; struct ext4_sb_info; +struct ext4_fc_bytelog_entry; =20 #define EXT4_FC_BYTELOG_MAGIC 0x4c424346 /* "FCBL" */ #define EXT4_FC_BYTELOG_VERSION 1 @@ -109,6 +110,10 @@ int ext4_fc_bytelog_append_vec(struct super_block *sb,= u16 tag, void ext4_fc_bytelog_build_anchor(struct super_block *sb, struct ext4_fc_bytelog_anchor *anchor, u32 tid); +void ext4_fc_bytelog_anchor_to_disk(struct ext4_fc_bytelog_entry *dst, + const struct ext4_fc_bytelog_anchor *src); +void ext4_fc_bytelog_anchor_from_disk(struct ext4_fc_bytelog_anchor *dst, + const struct ext4_fc_bytelog_entry *src); =20 static inline bool ext4_fc_bytelog_record_committed(const struct ext4_fc_b= ytelog_hdr *hdr) { --=20 2.52.0 From nobody Tue Apr 7 15:27:36 2026 Received: from sender4-op-o15.zoho.com (sender4-op-o15.zoho.com [136.143.188.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 076B83A7834; Thu, 26 Feb 2026 10:18:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=pass smtp.client-ip=136.143.188.15 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772101111; cv=pass; b=fZxu3OoACjfqhp1ZcQCbGM1WMENVWP1nj3D7AdhiUzD4QdpYeTNaWpRKRZHNd0//NPBWeMW+uB6fEUOzkIqXdLX6htUE+QneY56pB8F3IasoFTo2LQBi1pDeh8rSYHsN0LtiHaa1rBTOHcX7vdqOjV08jlds0xov3GDcQsxMPN0= ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772101111; c=relaxed/simple; bh=LR1A1sVj98yRHfklQsoSfgF7AoPXlGjwhfJsflSC34I=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=S97yrcFI4ezFlfC/mXv+y+5y1f8BlbRDOLno7mynAqxV4s+agzTYjc+3H5U+b7ro0cNuJbY+YBWY8bLZF7/wx707qf+WMNcMhKQheKR9/vny+my2WyB03ZqH2avpu2OEDDccGScDw5ytcNweiYwz2eqspJk59/ts/548pJNUTMI= ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.beauty; spf=pass smtp.mailfrom=linux.beauty; dkim=pass (1024-bit key) header.d=linux.beauty 
header.i=me@linux.beauty header.b=iVY8uOmi; arc=pass smtp.client-ip=136.143.188.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.beauty Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.beauty Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.beauty header.i=me@linux.beauty header.b="iVY8uOmi" ARC-Seal: i=1; a=rsa-sha256; t=1772101089; cv=none; d=zohomail.com; s=zohoarc; b=jFVEpSO/i/5PiaHgsJnD9wwCwbu23PJe+Tb/1QI9hve0XJYc77h9v5lQA0GRj0vTQfkIo1EORzKKpo2ccoVNZbxcZMZG5cbTUQrJECySjGQYQgpqacxKQ6R8hQDalRVHIZkTcG6QSq9FFEvnDum9o85RIeAVu42LLFiXVTVMjmc= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1772101089; h=Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:MIME-Version:Message-ID:References:Subject:Subject:To:To:Message-Id:Reply-To; bh=+TO5qO2m+VCH2iEuDsLwTaboZDhB3f74KLvsUyaJqU4=; b=IW0y+wG5bS6M6SQonAZKIOO+0hQISqFnuW6cm1cbTigR604fAONPADLriaj35fdWN0f9VIQjISeCs/c9OrxdK2jjjugroDC2N/WToowMjQLFZboBWHGwQyiK/5gVR3rKXcUvR6iQZnN7IIr0dBkqJvVEgVhq741/pIaX2zEB91w= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass header.i=linux.beauty; spf=pass smtp.mailfrom=me@linux.beauty; dmarc=pass header.from= DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; t=1772101089; s=zmail; d=linux.beauty; i=me@linux.beauty; h=From:From:To:To:Cc:Cc:Subject:Subject:Date:Date:Message-ID:In-Reply-To:References:MIME-Version:Content-Transfer-Encoding:Message-Id:Reply-To; bh=+TO5qO2m+VCH2iEuDsLwTaboZDhB3f74KLvsUyaJqU4=; b=iVY8uOmi6jFlIdvngn7gIkFMNX/F5JuAKqVApt6/CijQadlD8u9zfQSyjdSeQV7C b5nPRSLAWDxmwGyq742kJ9YD4UbZbk031+Gundb5XHla+mVI6Iy3C/W/ljcxzfaXYkB q9A5nl4Cv4/56RU3Xf42QM65dUkptYPK3PV3cv/k= Received: by mx.zohomail.com with SMTPS id 1772101088242426.3470871388149; Thu, 26 Feb 2026 02:18:08 -0800 (PST) From: Li Chen To: linux-ext4@vger.kernel.org, "Theodore Ts'o" , Andreas Dilger , 
linux-kernel@vger.kernel.org Cc: Harshad Shirwadkar , Li Chen Subject: [RFC PATCH 4/4] ext4: fast_commit: replay DAX ByteLog records Date: Thu, 26 Feb 2026 18:17:32 +0800 Message-ID: <20260226101736.2271952-5-me@linux.beauty> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260226101736.2271952-1-me@linux.beauty> References: <20260226101736.2271952-1-me@linux.beauty> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ZohoMailClient: External Content-Type: text/plain; charset="utf-8" Add replay support for EXT4_FC_TAG_DAX_BYTELOG_ANCHOR. The anchor TLV describes a ByteLog window in the DAX-mapped fast commit area, which is validated and then replayed using existing TLV handlers. Signed-off-by: Li Chen --- fs/ext4/fast_commit.c | 246 ++++++++++++++++++++++++++++++++++++++++++ fs/ext4/fast_commit.h | 9 ++ 2 files changed, 255 insertions(+) diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c index 2f7b7ea29df2..6370505ecc86 100644 --- a/fs/ext4/fast_commit.c +++ b/fs/ext4/fast_commit.c @@ -12,6 +12,7 @@ #include "ext4_extents.h" #include "mballoc.h" =20 +#include #include /* * Ext4 Fast Commits @@ -2172,10 +2173,228 @@ static bool ext4_fc_value_len_isvalid(struct ext4_= sb_info *sbi, return len >=3D sizeof(struct ext4_fc_tail); case EXT4_FC_TAG_HEAD: return len =3D=3D sizeof(struct ext4_fc_head); + case EXT4_FC_TAG_DAX_BYTELOG_ANCHOR: + return len =3D=3D sizeof(struct ext4_fc_bytelog_entry); } return false; } =20 +static void ext4_fc_reset_bytelog_state(struct ext4_fc_bytelog_state *stat= e) +{ + state->cursor =3D 0; + state->next_seq =3D 0; + state->ring_crc =3D ~0U; + state->initialized =3D false; +} + +typedef int (*ext4_fc_bytelog_cb_t)(struct super_block *sb, + struct ext4_fc_tl_mem *tl, + u8 *val, void *data); + +static int ext4_fc_bytelog_iterate(struct super_block *sb, + struct ext4_fc_bytelog_state *iter, + const struct 
ext4_fc_bytelog_anchor *anchor, + ext4_fc_bytelog_cb_t fn, void *data) +{ + struct ext4_sb_info *sbi =3D EXT4_SB(sb); + struct ext4_fc_bytelog *log =3D &sbi->s_fc_bytelog; + u8 *base =3D log->kaddr; + u64 cursor, end; + int ret; + + if (!log->mapped || !base) + return -EOPNOTSUPP; + if (anchor->head > log->size_bytes) + return -EFSCORRUPTED; + + iter->cursor =3D anchor->tail; + iter->next_seq =3D 0; + iter->ring_crc =3D ~0U; + iter->initialized =3D true; + cursor =3D iter->cursor; + end =3D anchor->head; + + if (cursor < log->base_off) + return -EFSCORRUPTED; + if (cursor > end || cursor > log->size_bytes) + return -EFSCORRUPTED; + + while (cursor < end) { + struct ext4_fc_bytelog_hdr *hdr; + size_t remaining; + u32 payload_len, record_len; + u16 record_tag; + u8 *payload; + struct ext4_fc_tl_mem tl; + + if (end - cursor > SIZE_MAX) + return -E2BIG; + remaining =3D end - cursor; + if (cursor > log->size_bytes - sizeof(*hdr)) + return -EFSCORRUPTED; + + hdr =3D (struct ext4_fc_bytelog_hdr *)(base + cursor); + payload =3D (u8 *)hdr + sizeof(*hdr); + ret =3D ext4_fc_bytelog_validate_hdr(hdr, remaining, payload); + if (ret) + return ret; + if (!ext4_fc_bytelog_record_committed(hdr)) + return -EUCLEAN; + if (ext4_fc_bytelog_seq(hdr) !=3D iter->next_seq) + return -EUCLEAN; + + payload_len =3D ext4_fc_bytelog_payload_len(hdr); + if (payload_len < EXT4_FC_TAG_BASE_LEN) + return -EFSCORRUPTED; + + record_tag =3D le16_to_cpu(hdr->tag); + if (record_tag =3D=3D EXT4_FC_BYTELOG_TAG_BATCH) { + u32 pos =3D 0; + + while (pos < payload_len) { + u32 value_len; + + if (payload_len - pos < EXT4_FC_TAG_BASE_LEN) + return -EFSCORRUPTED; + + ext4_fc_get_tl(&tl, payload + pos); + value_len =3D tl.fc_len; + if (value_len > + payload_len - pos - EXT4_FC_TAG_BASE_LEN) + return -EFSCORRUPTED; + if (!ext4_fc_value_len_isvalid(sbi, tl.fc_tag, + tl.fc_len)) + return -EFSCORRUPTED; + if (fn) { + ret =3D fn(sb, &tl, + payload + pos + + EXT4_FC_TAG_BASE_LEN, + data); + if (ret) + return ret; + } + 
pos +=3D EXT4_FC_TAG_BASE_LEN + value_len; + } + } else { + u32 value_len; + + ext4_fc_get_tl(&tl, payload); + value_len =3D payload_len - EXT4_FC_TAG_BASE_LEN; + if (tl.fc_len !=3D value_len) + return -EFSCORRUPTED; + if (record_tag !=3D tl.fc_tag) + return -EFSCORRUPTED; + if (!ext4_fc_value_len_isvalid(sbi, tl.fc_tag, tl.fc_len)) + return -EFSCORRUPTED; + if (fn) { + ret =3D fn(sb, &tl, + payload + EXT4_FC_TAG_BASE_LEN, + data); + if (ret) + return ret; + } + } + + iter->ring_crc =3D crc32c(iter->ring_crc, payload, payload_len); + record_len =3D ext4_fc_bytelog_record_len(hdr); + cursor +=3D record_len; + iter->next_seq++; + } + + if (cursor !=3D end) + return -EFSCORRUPTED; + iter->cursor =3D cursor; + if (iter->next_seq !=3D anchor->seq) + return -EUCLEAN; + if (iter->ring_crc !=3D anchor->crc) + return -EFSBADCRC; + return 0; +} + +static int ext4_fc_bytelog_scan_cb(struct super_block *sb, + struct ext4_fc_tl_mem *tl, u8 *val, + void *data) +{ + struct ext4_fc_add_range ext; + struct ext4_extent *ex; + + (void)data; + switch (tl->fc_tag) { + case EXT4_FC_TAG_ADD_RANGE: + memcpy(&ext, val, sizeof(ext)); + ex =3D (struct ext4_extent *)&ext.fc_ex; + return ext4_fc_record_regions(sb, le32_to_cpu(ext.fc_ino), + le32_to_cpu(ex->ee_block), + ext4_ext_pblock(ex), + ext4_ext_get_actual_len(ex), 0); + case EXT4_FC_TAG_DEL_RANGE: + case EXT4_FC_TAG_LINK: + case EXT4_FC_TAG_UNLINK: + case EXT4_FC_TAG_CREAT: + case EXT4_FC_TAG_INODE: + return 0; + default: + return -EOPNOTSUPP; + } +} + +static int ext4_fc_bytelog_replay_cb(struct super_block *sb, + struct ext4_fc_tl_mem *tl, u8 *val, + void *data) +{ + (void)data; + switch (tl->fc_tag) { + case EXT4_FC_TAG_LINK: + return ext4_fc_replay_link(sb, tl, val); + case EXT4_FC_TAG_UNLINK: + return ext4_fc_replay_unlink(sb, tl, val); + case EXT4_FC_TAG_ADD_RANGE: + return ext4_fc_replay_add_range(sb, tl, val); + case EXT4_FC_TAG_CREAT: + return ext4_fc_replay_create(sb, tl, val); + case EXT4_FC_TAG_DEL_RANGE: + return 
ext4_fc_replay_del_range(sb, tl, val); + case EXT4_FC_TAG_INODE: + return ext4_fc_replay_inode(sb, tl, val); + default: + return -EOPNOTSUPP; + } +} + +static int ext4_fc_replay_scan_bytelog(struct super_block *sb, + struct ext4_fc_replay_state *state, + const struct ext4_fc_bytelog_anchor *anchor) +{ + int ret; + + ret =3D ext4_fc_bytelog_iterate(sb, &state->fc_bytelog_scan, anchor, + ext4_fc_bytelog_scan_cb, state); + if (ret) + return ret; + return JBD2_FC_REPLAY_CONTINUE; +} + +static int ext4_fc_replay_apply_bytelog(struct super_block *sb, + struct ext4_fc_replay_state *state, + const struct ext4_fc_bytelog_anchor *anchor) +{ + return ext4_fc_bytelog_iterate(sb, &state->fc_bytelog_replay, anchor, + ext4_fc_bytelog_replay_cb, NULL); +} + +static int ext4_fc_replay_bytelog_anchor(struct super_block *sb, + struct ext4_fc_replay_state *state, + struct ext4_fc_tl_mem *tl, u8 *val) +{ + struct ext4_fc_bytelog_entry entry; + struct ext4_fc_bytelog_anchor anchor; + + (void)tl; + memcpy(&entry, val, sizeof(entry)); + ext4_fc_bytelog_anchor_from_disk(&anchor, &entry); + return ext4_fc_replay_apply_bytelog(sb, state, &anchor); +} + /* * Recovery Scan phase handler * @@ -2206,6 +2425,8 @@ static int ext4_fc_replay_scan(journal_t *journal, struct ext4_fc_tail tail; __u8 *start, *end, *cur, *val; struct ext4_fc_head head; + struct ext4_fc_bytelog_entry entry; + struct ext4_fc_bytelog_anchor anchor; struct ext4_extent *ex; =20 state =3D &sbi->s_fc_replay_state; @@ -2220,6 +2441,8 @@ static int ext4_fc_replay_scan(journal_t *journal, state->fc_regions =3D NULL; state->fc_regions_valid =3D state->fc_regions_used =3D state->fc_regions_size =3D 0; + ext4_fc_reset_bytelog_state(&state->fc_bytelog_scan); + ext4_fc_reset_bytelog_state(&state->fc_bytelog_replay); /* Check if we can stop early */ if (le16_to_cpu(((struct ext4_fc_tl *)start)->fc_tag) !=3D EXT4_FC_TAG_HEAD) @@ -2278,6 +2501,9 @@ static int ext4_fc_replay_scan(journal_t *journal, state->fc_replay_num_tags =3D 
state->fc_cur_tag;
 				state->fc_regions_valid =
 					state->fc_regions_used;
+				if (ext4_fc_bytelog_active(sbi) ||
+				    state->fc_bytelog_scan.initialized)
+					ret = JBD2_FC_REPLAY_STOP;
 			} else {
 				ret = state->fc_replay_num_tags ?
 					JBD2_FC_REPLAY_STOP : -EFSBADCRC;
@@ -2299,6 +2525,15 @@ static int ext4_fc_replay_scan(journal_t *journal,
 			state->fc_crc = ext4_chksum(state->fc_crc, cur,
 				EXT4_FC_TAG_BASE_LEN + tl.fc_len);
 			break;
+		case EXT4_FC_TAG_DAX_BYTELOG_ANCHOR:
+			state->fc_cur_tag++;
+			state->fc_crc = ext4_chksum(state->fc_crc, cur,
+						    EXT4_FC_TAG_BASE_LEN +
+						    tl.fc_len);
+			memcpy(&entry, val, sizeof(entry));
+			ext4_fc_bytelog_anchor_from_disk(&anchor, &entry);
+			ret = ext4_fc_replay_scan_bytelog(sb, state, &anchor);
+			break;
 		default:
 			ret = state->fc_replay_num_tags ?
 				JBD2_FC_REPLAY_STOP : -ECANCELED;
@@ -2335,6 +2570,8 @@ static int ext4_fc_replay(journal_t *journal, struct buffer_head *bh,
 	if (state->fc_current_pass != pass) {
 		state->fc_current_pass = pass;
 		sbi->s_mount_state |= EXT4_FC_REPLAY;
+		if (pass == PASS_REPLAY)
+			ext4_fc_reset_bytelog_state(&state->fc_bytelog_replay);
 	}
 	if (!sbi->s_fc_replay_state.fc_replay_num_tags) {
 		ext4_debug("Replay stops\n");
@@ -2393,9 +2630,18 @@ static int ext4_fc_replay(journal_t *journal, struct buffer_head *bh,
 					     0, tl.fc_len, 0);
 			memcpy(&tail, val, sizeof(tail));
 			WARN_ON(le32_to_cpu(tail.fc_tid) != expected_tid);
+			if ((ext4_fc_bytelog_active(sbi) ||
+			     state->fc_bytelog_scan.initialized) &&
+			    state->fc_replay_num_tags == 0) {
+				ext4_fc_set_bitmaps_and_counters(sb);
+				return JBD2_FC_REPLAY_STOP;
+			}
 			break;
 		case EXT4_FC_TAG_HEAD:
 			break;
+		case EXT4_FC_TAG_DAX_BYTELOG_ANCHOR:
+			ret = ext4_fc_replay_bytelog_anchor(sb, state, &tl, val);
+			break;
 		default:
 			trace_ext4_fc_replay(sb, tl.fc_tag, 0, tl.fc_len, 0);
 			ret = -ECANCELED;
diff --git a/fs/ext4/fast_commit.h b/fs/ext4/fast_commit.h
index fb51e19b9778..224d718150c4 100644
--- a/fs/ext4/fast_commit.h
+++ b/fs/ext4/fast_commit.h
@@ -153,6 +153,13 @@ struct
ext4_fc_alloc_region {
 	int ino, len;
 };
 
+struct ext4_fc_bytelog_state {
+	u64 cursor;
+	u64 next_seq;
+	u32 ring_crc;
+	bool initialized;
+};
+
 /*
  * Fast commit replay state.
  */
@@ -166,6 +173,8 @@ struct ext4_fc_replay_state {
 	int fc_regions_size, fc_regions_used, fc_regions_valid;
 	int *fc_modified_inodes;
 	int fc_modified_inodes_used, fc_modified_inodes_size;
+	struct ext4_fc_bytelog_state fc_bytelog_scan;
+	struct ext4_fc_bytelog_state fc_bytelog_replay;
 };
 
 #define region_last(__region) (((__region)->lblk) + ((__region)->len) - 1)
-- 
2.52.0