From: Rahul Sharma
To: gregkh@linuxfoundation.org, stable@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Boris Burkov, Qu Wenruo, David Sterba, Rahul Sharma
Subject: [PATCH v6.12] btrfs: fix racy bitfield write in btrfs_clear_space_info_full()
Date: Tue, 27 Jan 2026 10:52:40 +0800
Message-Id: <20260127025240.1991664-1-black.hawk@163.com>

From: Boris Burkov

[ Upstream commit 38e818718c5e04961eea0fa8feff3f100ce40408 ]

From the memory-barriers.txt document regarding memory barrier ordering
guarantees:

 (*) These guarantees do not apply to bitfields, because compilers often
     generate code to modify these using non-atomic read-modify-write
     sequences.  Do not attempt to use bitfields to synchronize parallel
     algorithms.

 (*) Even in cases where bitfields are protected by locks, all fields
     in a given bitfield must be protected by one lock.  If two fields
     in a given bitfield are protected by different locks, the compiler's
     non-atomic read-modify-write sequences can cause an update to one
     field to corrupt the value of an adjacent field.

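To make the second point concrete, here is a minimal userspace sketch (not
kernel code) that replays two interleaved read-modify-write sequences on one
shared word from a single thread, so the outcome is deterministic. The names
FIELD_LOW, FIELD_HIGH and replay_lost_update() are made up for illustration;
with a real bitfield the compiler may emit the load/modify/store steps itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-bit fields packed into the same word. */
#define FIELD_LOW	(1u << 0)
#define FIELD_HIGH	(1u << 2)

/*
 * Replay the interleaving of two non-atomic read-modify-write
 * sequences on the same word and return the final value.
 */
uint32_t replay_lost_update(uint32_t word)
{
	/* Writer A wants to clear FIELD_LOW: it loads the whole word. */
	uint32_t a_copy = word;

	/* Writer B clears FIELD_HIGH with a full load/modify/store. */
	uint32_t b_copy = word;
	b_copy &= ~FIELD_HIGH;
	word = b_copy;

	/* Writer A resumes: it modifies its stale copy and stores it. */
	a_copy &= ~FIELD_LOW;
	word = a_copy;

	return word;
}
```

Starting from a word with only FIELD_HIGH set, replay_lost_update() returns
a word with FIELD_HIGH still set: writer B's clear is overwritten by writer
A's stale store, even though A never meant to touch that field.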
btrfs_space_info has a bitfield sharing an underlying word consisting of
the fields full, chunk_alloc, and flush:

struct btrfs_space_info {
        struct btrfs_fs_info *     fs_info;              /*     0     8 */
        struct btrfs_space_info *  parent;               /*     8     8 */
        ...
        int                        clamp;                /*   172     4 */
        unsigned int               full:1;               /*   176: 0  4 */
        unsigned int               chunk_alloc:1;        /*   176: 1  4 */
        unsigned int               flush:1;              /*   176: 2  4 */
        ...

Therefore, to be safe from parallel read-modify-writes losing a write to
one of the bitfield members protected by a lock, all writes to all the
bitfields must use the lock. They almost universally do, except for
btrfs_clear_space_info_full() which iterates over the space_infos and
writes out found->full = 0 without a lock.

Imagine that we have one thread completing a transaction in which we
finished deleting a block_group and are thus calling
btrfs_clear_space_info_full() while simultaneously the data reclaim
ticket infrastructure is running do_async_reclaim_data_space():

          T1                                    T2
btrfs_commit_transaction
  btrfs_clear_space_info_full
  data_sinfo->full = 0
  READ: full:0, chunk_alloc:0, flush:1
                                        do_async_reclaim_data_space(data_sinfo)
                                        spin_lock(&space_info->lock);
                                        if(list_empty(tickets))
                                          space_info->flush = 0;
                                          READ: full: 0, chunk_alloc:0, flush:1
                                          MOD/WRITE: full: 0, chunk_alloc:0, flush:0
                                          spin_unlock(&space_info->lock);
                                          return;
  MOD/WRITE: full:0, chunk_alloc:0, flush:1

and now data_sinfo->flush is 1 but the reclaim worker has exited. This
breaks the invariant that flush is 0 iff there is no work queued or
running. Once this invariant is violated, future allocations that go
into __reserve_bytes() will add tickets to space_info->tickets but will
see space_info->flush is set to 1 and not queue the work. After this,
they will block forever on the resulting ticket, as it is now impossible
to kick the worker again.

I also confirmed by looking at the assembly of the affected kernel that
it is doing RMW operations.
For example, to set the flush (3rd) bit to 0, the assembly is:
  andb    $0xfb,0x60(%rbx)
and similarly for setting the full (1st) bit to 0:
  andb    $0xfe,-0x20(%rax)

So I think this is really a bug on practical systems.  I have observed
a number of systems in this exact state, but am currently unable to
reproduce it.

Rather than leaving this footgun lying around for the future, take
advantage of the fact that there is room in the struct anyway, and that
it is already quite large and simply change the three bitfield members to
bools. This avoids writes to space_info->full having any effect on
writes to space_info->flush, regardless of locking.

Fixes: 957780eb2788 ("Btrfs: introduce ticketed enospc infrastructure")
Reviewed-by: Qu Wenruo
Signed-off-by: Boris Burkov
Reviewed-by: David Sterba
Signed-off-by: David Sterba
[ The context change is due to the commit cc0517fe779f
("btrfs: tweak extent/chunk allocation for space_info sub-space")
in v6.16 which is irrelevant to the logic of this patch. ]
Signed-off-by: Rahul Sharma
---
 fs/btrfs/block-group.c |  6 +++---
 fs/btrfs/space-info.c  | 22 +++++++++++-----------
 fs/btrfs/space-info.h  |  6 +++---
 3 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 83a196521670..12692076f4bf 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -4195,7 +4195,7 @@ int btrfs_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
 			mutex_unlock(&fs_info->chunk_mutex);
 		} else {
 			/* Proceed with allocation */
-			space_info->chunk_alloc = 1;
+			space_info->chunk_alloc = true;
 			wait_for_alloc = false;
 			spin_unlock(&space_info->lock);
 		}
@@ -4244,7 +4244,7 @@ int btrfs_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
 	spin_lock(&space_info->lock);
 	if (ret < 0) {
 		if (ret == -ENOSPC)
-			space_info->full = 1;
+			space_info->full = true;
 		else
 			goto out;
 	} else {
@@ -4254,7 +4254,7 @@ int btrfs_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
 
 	space_info->force_alloc = CHUNK_ALLOC_NO_FORCE;
 out:
-	space_info->chunk_alloc = 0;
+	space_info->chunk_alloc = false;
 	spin_unlock(&space_info->lock);
 	mutex_unlock(&fs_info->chunk_mutex);
 
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index d5a9cd8a4fd8..5317369c5849 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -183,7 +183,7 @@ void btrfs_clear_space_info_full(struct btrfs_fs_info *info)
 	struct btrfs_space_info *found;
 
 	list_for_each_entry(found, head, list)
-		found->full = 0;
+		found->full = false;
 }
 
 /*
@@ -318,7 +318,7 @@ void btrfs_add_bg_to_space_info(struct btrfs_fs_info *info,
 	found->bytes_readonly += block_group->bytes_super;
 	btrfs_space_info_update_bytes_zone_unusable(info, found, block_group->zone_unusable);
 	if (block_group->length > 0)
-		found->full = 0;
+		found->full = false;
 	btrfs_try_granting_tickets(info, found);
 	spin_unlock(&found->lock);
 
@@ -1093,7 +1093,7 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
 	spin_lock(&space_info->lock);
 	to_reclaim = btrfs_calc_reclaim_metadata_size(fs_info, space_info);
 	if (!to_reclaim) {
-		space_info->flush = 0;
+		space_info->flush = false;
 		spin_unlock(&space_info->lock);
 		return;
 	}
@@ -1105,7 +1105,7 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
 		flush_space(fs_info, space_info, to_reclaim, flush_state, false);
 		spin_lock(&space_info->lock);
 		if (list_empty(&space_info->tickets)) {
-			space_info->flush = 0;
+			space_info->flush = false;
 			spin_unlock(&space_info->lock);
 			return;
 		}
@@ -1148,7 +1148,7 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
 					flush_state = FLUSH_DELAYED_ITEMS_NR;
 					commit_cycles--;
 				} else {
-					space_info->flush = 0;
+					space_info->flush = false;
 				}
 			} else {
 				flush_state = FLUSH_DELAYED_ITEMS_NR;
@@ -1310,7 +1310,7 @@ static void btrfs_async_reclaim_data_space(struct work_struct *work)
 
 	spin_lock(&space_info->lock);
 	if (list_empty(&space_info->tickets)) {
-		space_info->flush = 0;
+		space_info->flush = false;
 		spin_unlock(&space_info->lock);
 		return;
 	}
@@ -1321,7 +1321,7 @@ static void btrfs_async_reclaim_data_space(struct work_struct *work)
 		flush_space(fs_info, space_info, U64_MAX, ALLOC_CHUNK_FORCE, false);
 		spin_lock(&space_info->lock);
 		if (list_empty(&space_info->tickets)) {
-			space_info->flush = 0;
+			space_info->flush = false;
 			spin_unlock(&space_info->lock);
 			return;
 		}
@@ -1338,7 +1338,7 @@ static void btrfs_async_reclaim_data_space(struct work_struct *work)
 			    data_flush_states[flush_state], false);
 		spin_lock(&space_info->lock);
 		if (list_empty(&space_info->tickets)) {
-			space_info->flush = 0;
+			space_info->flush = false;
 			spin_unlock(&space_info->lock);
 			return;
 		}
@@ -1355,7 +1355,7 @@ static void btrfs_async_reclaim_data_space(struct work_struct *work)
 				if (maybe_fail_all_tickets(fs_info, space_info))
 					flush_state = 0;
 				else
-					space_info->flush = 0;
+					space_info->flush = false;
 			} else {
 				flush_state = 0;
 			}
@@ -1371,7 +1371,7 @@ static void btrfs_async_reclaim_data_space(struct work_struct *work)
 
 aborted_fs:
 	maybe_fail_all_tickets(fs_info, space_info);
-	space_info->flush = 0;
+	space_info->flush = false;
 	spin_unlock(&space_info->lock);
 }
 
@@ -1740,7 +1740,7 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
 				 */
 				maybe_clamp_preempt(fs_info, space_info);
 
-				space_info->flush = 1;
+				space_info->flush = true;
 				trace_btrfs_trigger_flush(fs_info,
 							  space_info->flags,
 							  orig_bytes, flush,
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index efbecc0c5258..ce20974088b0 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -127,11 +127,11 @@ struct btrfs_space_info {
 				   flushing. The value is >> clamp, so turns
 				   out to be a 2^clamp divisor. */
 
-	unsigned int full:1;	/* indicates that we cannot allocate any more
+	bool full;		/* indicates that we cannot allocate any more
 				   chunks for this space */
-	unsigned int chunk_alloc:1;	/* set if we are allocating a chunk */
+	bool chunk_alloc;	/* set if we are allocating a chunk */
 
-	unsigned int flush:1;		/* set if we are trying to make space */
+	bool flush;		/* set if we are trying to make space */
 
 	unsigned int force_alloc;	/* set if we need to force a chunk
 					   alloc for this space */
-- 
2.34.1
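As a quick userspace cross-check of the space-info.h hunk, the sketch below
uses a cut-down, hypothetical stand-in for the patched struct (struct
sinfo_flags is not the kernel definition) to show that the three bools land
at three distinct byte offsets. In C11 terms, separate non-bitfield members
are separate memory locations, whereas adjacent bitfield members share one,
which is why a store to one bool should not rewrite its neighbour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the tail of the patched struct. */
struct sinfo_flags {
	int clamp;
	bool full;
	bool chunk_alloc;
	bool flush;
	unsigned int force_alloc;
};

/* Returns true when the three flags sit at strictly increasing offsets. */
bool flags_have_distinct_bytes(void)
{
	size_t full = offsetof(struct sinfo_flags, full);
	size_t chunk_alloc = offsetof(struct sinfo_flags, chunk_alloc);
	size_t flush = offsetof(struct sinfo_flags, flush);

	return full < chunk_alloc && chunk_alloc < flush;
}
```

The same check cannot be written for the old layout, because offsetof()
is not allowed on bitfield members; that is itself a hint that they do not
have storage of their own.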