From nobody Fri Apr 3 14:47:21 2026
From: Cen Zhang
To: clm@fb.com
Cc: dsterba@suse.com, linux-btrfs@vger.kernel.org,
 linux-kernel@vger.kernel.org, baijiaju1990@gmail.com,
 zzzccc <1539412714@qq.com>, Cen Zhang
Subject: [PATCH] btrfs: add btrfs_inode_disk_i_size() helper to prevent torn
 reads of disk_i_size
Date: Tue, 24 Mar 2026 17:01:59 +0800
Message-Id: <20260324090200.3932789-1-zzzccc427@gmail.com>
X-Mailer: git-send-email 2.34.1
X-Mailing-List: linux-kernel@vger.kernel.org

From: zzzccc <1539412714@qq.com>

btrfs_inode::disk_i_size is a u64 field updated under inode->lock by
btrfs_inode_safe_disk_i_size_write(), but several read sites access it
without holding that lock. On 64-bit platforms this is fine because
aligned u64 loads are architecturally atomic, but on 32-bit platforms a
u64 load is performed as two 32-bit loads which can tear if a concurrent
write updates both halves.

A torn read of disk_i_size is dangerous in the metadata-serialization
paths (fill_inode_item, fill_stack_inode_item) because the torn value
gets persisted to the B-tree on disk. After a crash, fsck / mount would
see a file size that never existed:

- If the torn value is too large, stale data beyond the real EOF is
  exposed (information leak).
- If the torn value is too small (e.g. zero), file data is silently
  lost.

Signed-off-by: Cen Zhang
---
 fs/btrfs/btrfs_inode.h   | 24 ++++++++++++++++++++++++
 fs/btrfs/delayed-inode.c |  2 +-
 fs/btrfs/file.c          |  2 +-
 fs/btrfs/inode.c         |  6 +++---
 4 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 55c272fe5d92..7aff326bedbb 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -418,6 +418,30 @@ static inline void btrfs_i_size_write(struct btrfs_inode *inode, u64 size)
 	inode->disk_i_size = size;
 }
 
+/*
+ * Get the on-disk file size safely without holding inode->lock.
+ *
+ * disk_i_size is protected by inode->lock when being written (see
+ * btrfs_inode_safe_disk_i_size_write()), but several read sites access
+ * it without that lock. On 64-bit platforms a plain READ_ONCE() is
+ * sufficient because aligned u64 loads are atomic. On 32-bit platforms
+ * a u64 load can tear, so we take the spinlock to guarantee a consistent
+ * snapshot.
+ */
+static inline u64 btrfs_inode_disk_i_size(struct btrfs_inode *inode)
+{
+#if BITS_PER_LONG == 32
+	u64 size;
+
+	spin_lock(&inode->lock);
+	size = inode->disk_i_size;
+	spin_unlock(&inode->lock);
+	return size;
+#else
+	return READ_ONCE(inode->disk_i_size);
+#endif
+}
+
 static inline bool btrfs_is_free_space_inode(const struct btrfs_inode *inode)
 {
 	return test_bit(BTRFS_INODE_FREE_SPACE_INODE, &inode->runtime_flags);
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 56ff8afe9a22..86be9d1bee55 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1841,7 +1841,7 @@ static void fill_stack_inode_item(struct btrfs_trans_handle *trans,
 
 	btrfs_set_stack_inode_uid(inode_item, i_uid_read(vfs_inode));
 	btrfs_set_stack_inode_gid(inode_item, i_gid_read(vfs_inode));
-	btrfs_set_stack_inode_size(inode_item, inode->disk_i_size);
+	btrfs_set_stack_inode_size(inode_item, btrfs_inode_disk_i_size(inode));
 	btrfs_set_stack_inode_mode(inode_item, vfs_inode->i_mode);
 	btrfs_set_stack_inode_nlink(inode_item, vfs_inode->i_nlink);
 	btrfs_set_stack_inode_nbytes(inode_item, inode_get_bytes(vfs_inode));
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index a4cb9d3cfc4e..dcd306f669d8 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -178,7 +178,7 @@ int btrfs_drop_extents(struct btrfs_trans_handle *trans,
 	if (args->drop_cache)
 		btrfs_drop_extent_map_range(inode, args->start, args->end - 1, false);
 
-	if (data_race(args->start >= inode->disk_i_size) && !args->replace_extent)
+	if (args->start >= btrfs_inode_disk_i_size(inode) && !args->replace_extent)
 		modify_tree = 0;
 
 	update_refs = (btrfs_root_id(root) != BTRFS_TREE_LOG_OBJECTID);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index afc5d75d2dcb..5c75c949e855 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -837,7 +837,7 @@ static inline void inode_should_defrag(struct btrfs_inode *inode,
 {
 	/* If this is a small write inside eof, kick off a defrag */
 	if (num_bytes < small_write &&
-	    (start > 0 || end + 1 < inode->disk_i_size))
+	    (start > 0 || end + 1 < btrfs_inode_disk_i_size(inode)))
 		btrfs_add_inode_defrag(inode, small_write);
 }
 
@@ -4264,7 +4264,7 @@ static void fill_inode_item(struct btrfs_trans_handle *trans,
 
 	btrfs_set_inode_uid(leaf, item, i_uid_read(inode));
 	btrfs_set_inode_gid(leaf, item, i_gid_read(inode));
-	btrfs_set_inode_size(leaf, item, BTRFS_I(inode)->disk_i_size);
+	btrfs_set_inode_size(leaf, item, btrfs_inode_disk_i_size(BTRFS_I(inode)));
 	btrfs_set_inode_mode(leaf, item, inode->i_mode);
 	btrfs_set_inode_nlink(leaf, item, inode->i_nlink);
 
@@ -5455,7 +5455,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 			ret2 = btrfs_wait_ordered_range(BTRFS_I(inode), 0, (u64)-1);
 			if (ret2)
 				return ret2;
-			i_size_write(inode, BTRFS_I(inode)->disk_i_size);
+			i_size_write(inode, btrfs_inode_disk_i_size(BTRFS_I(inode)));
 		}
 	}
 
-- 
2.34.1
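
Illustrative note (not part of the patch): the torn-read scenario from the
commit message can be replayed deterministically in userspace. The sketch
below is only a model under stated assumptions: struct split_u64,
split_load(), torn_read() and torn_read_examples() are hypothetical names
invented here, not kernel APIs, and the real store order of the two halves
on a given 32-bit architecture is not claimed. It shows how a size growing
from 4 GiB - 1 to 4 GiB can be observed as either zero or as roughly 8 GiB,
depending on which half the writer has stored when the reader loads both.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a u64 as a 32-bit CPU accesses it: two words
 * that are loaded and stored independently.
 */
struct split_u64 {
	uint32_t lo;
	uint32_t hi;
};

/* Combine the two halves the way an unlocked reader would. */
static uint64_t split_load(const struct split_u64 *v)
{
	return ((uint64_t)v->hi << 32) | v->lo;
}

/*
 * Replay one interleaving: the writer has stored exactly one half of
 * new_val over old_val at the moment the reader loads both halves.
 * low_half_first selects which half the (assumed) writer stores first.
 */
static uint64_t torn_read(uint64_t old_val, uint64_t new_val,
			  int low_half_first)
{
	struct split_u64 v = {
		.lo = (uint32_t)old_val,
		.hi = (uint32_t)(old_val >> 32),
	};

	if (low_half_first)
		v.lo = (uint32_t)new_val;
	else
		v.hi = (uint32_t)(new_val >> 32);

	return split_load(&v);
}

/* Worked examples: size grows from 4 GiB - 1 to 4 GiB. */
static void torn_read_examples(void)
{
	/* Low half stored first: reader sees a size of zero. */
	assert(torn_read(0xFFFFFFFFULL, 0x100000000ULL, 1) == 0);
	/* High half stored first: reader sees a size that never existed. */
	assert(torn_read(0xFFFFFFFFULL, 0x100000000ULL, 0) == 0x1FFFFFFFFULL);
}
```

Neither observed value was ever written by the writer, which is the case the
spin_lock() branch of btrfs_inode_disk_i_size() is meant to rule out on
32-bit builds.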