From: JP Kobryn
To: boris@bur.io, clm@fb.com, dsterba@suse.com
Cc: linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [RFC PATCH] btrfs: defer freeing of subpage private state to free_folio
Date: Thu, 29 Jan 2026 15:08:22 -0800
Message-ID: <20260129230822.168034-1-inwardvessel@gmail.com>
X-Mailer: git-send-email 2.52.0

During reclaim, file-mapped folio callbacks can be invoked through the
address_space_operations "aops" interface. In one specific case involving
btrfs, the release_folio() callback is made and the btrfs-specific private
data is freed. Afterward, continuing in the reclaim path, the folio will be
freed from the page cache if its refcount has reached zero. Because there is
a window between the freeing of the private data and the refcount check, it
is possible for another task to increment the refcount in parallel and the
folio will remain in the page cache with NULL private data. The other task
then acquires the folio and is forced to take precautions on accessing the
private field.

There is existing code that is aware of this. In some of these places, a
NULL check is performed on the private field and if not present it gets
recreated. There are surrounding comments referring to the race of freeing
the private data. For example:

	/*
	 * We unlock the page after the io is completed and then re-lock it
	 * above. release_folio() could have come in between that and cleared
	 * folio private, but left the page in the mapping. Set the page mapped
	 * here to make sure it's properly set for the subpage stuff.
	 */
	ret = set_folio_extent_mapped(folio);

It's worth noting in advance that the protections currently in place and
also the points in which btrfs invokes filemap_invalidate_inode() may be
sufficient. The purpose of this patch though, is to ensure the btrfs private
subpage metadata lives as long as its folio, which may help avoid the loss
of subpage metadata and improve maintainability (by preventing any possible
use after free in present/future code). Currently the private data is freed
in the btrfs-specific aops callback release_folio(), but this proposed
change instead defers the freeing until the aops free_folio() callback.

The patch also might have the advantage of being easy to backport to the LTS
trees. On that note, it's worth mentioning that we encountered a kernel
panic as a result of this sequence on a 6.16-based arm64 host (configured
with 64k pages so btrfs is in subpage mode).
On our 6.16 kernel, the race window is shown below between points A and B:

[mm] page cache reclaim path            [fs] relocation in subpage mode
shrink_folio_list()
  folio_trylock() /* lock acquired */
  filemap_release_folio()
    mapping->a_ops->release_folio()
      btrfs_release_folio()
        __btrfs_release_folio()
          clear_folio_extent_mapped()
            btrfs_detach_folio_state()
              bfs = folio_detach_private(folio)
              btrfs_free_folio_state(folio)
                kfree(bfs) /* point A */

                                        prealloc_file_extent_cluster()
                                          filemap_lock_folio()
                                            folio_try_get() /* inc refcount */
                                            folio_lock() /* wait for lock */

  __remove_mapping()
    if (!folio_ref_freeze(folio, refcount)) /* point B */
      goto cannot_free /* folio remains in cache */

  folio_unlock(folio) /* lock released */

                                        /* lock acquired */
                                        btrfs_subpage_clear_uptodate()
                                          bfs = folio->priv /* use-after-free */

This exact race during relocation should not occur in the latest upstream
code, but it's an example of a backport opportunity for this patch.

Signed-off-by: JP Kobryn
---
 fs/btrfs/extent_io.c |  6 ++++--
 fs/btrfs/inode.c     | 18 ++++++++++++++++++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3df399dc8856..d83d3f9ae3af 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -928,8 +928,10 @@ void clear_folio_extent_mapped(struct folio *folio)
 		return;
 
 	fs_info = folio_to_fs_info(folio);
-	if (btrfs_is_subpage(fs_info, folio))
-		return btrfs_detach_folio_state(fs_info, folio, BTRFS_SUBPAGE_DATA);
+	if (btrfs_is_subpage(fs_info, folio)) {
+		/* freeing of private subpage data is deferred to btrfs_free_folio */
+		return;
+	}
 
 	folio_detach_private(folio);
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b8abfe7439a3..7a832ee3b591 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7565,6 +7565,23 @@ static bool btrfs_release_folio(struct folio *folio, gfp_t gfp_flags)
 	return __btrfs_release_folio(folio, gfp_flags);
 }
 
+/* frees subpage private data if present */
+static void btrfs_free_folio(struct folio *folio)
+{
+	struct btrfs_folio_state *bfs;
+
+	if (!folio_test_private(folio))
+		return;
+
+	bfs = folio_detach_private(folio);
+	if (bfs == (void *)EXTENT_FOLIO_PRIVATE) {
+		/* extent map flag is detached in btrfs_folio_release */
+		return;
+	}
+
+	btrfs_free_folio_state(bfs);
+}
+
 #ifdef CONFIG_MIGRATION
 static int btrfs_migrate_folio(struct address_space *mapping,
 			     struct folio *dst, struct folio *src,
@@ -10651,6 +10668,7 @@ static const struct address_space_operations btrfs_aops = {
 	.invalidate_folio = btrfs_invalidate_folio,
 	.launder_folio = btrfs_launder_folio,
 	.release_folio = btrfs_release_folio,
+	.free_folio = btrfs_free_folio,
 	.migrate_folio = btrfs_migrate_folio,
 	.dirty_folio = filemap_dirty_folio,
 	.error_remove_folio = generic_error_remove_folio,
-- 
2.47.3
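
For readers who want to reason about the lifetime argument outside the
kernel, the interleaving between points A and B can be replayed in a small
single-threaded userspace model. This is only a sketch under stated
assumptions: every model_* name below is hypothetical and not a kernel or
btrfs symbol, the refcount handling is simplified to a single page cache
reference, and the "other task" is simulated by a plain increment rather
than real concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-folio subpage state. */
struct model_state {
	int uptodate_bits;
};

/* Hypothetical stand-in for a folio; not the kernel's struct folio. */
struct model_folio {
	int refcount;			/* simplified: the page cache holds one ref */
	struct model_state *private;	/* per-folio private state */
	bool in_cache;
};

/* Release step: the current behavior frees the state, the deferred one keeps it. */
static void model_release(struct model_folio *f, bool defer_free)
{
	if (defer_free)
		return;
	free(f->private);
	f->private = NULL;
}

/* Free step: only reached once the folio has really left the cache. */
static void model_free(struct model_folio *f)
{
	free(f->private);
	f->private = NULL;
}

/* Succeeds only when nobody else holds a reference, like a refcount freeze. */
static bool model_ref_freeze(struct model_folio *f, int expected)
{
	if (f->refcount != expected)
		return false;
	f->refcount = 0;
	return true;
}

/*
 * Replays: release (point A), a second task takes a reference, then
 * reclaim attempts the freeze (point B). Returns true when the folio is
 * still cached AND still has its private state attached.
 */
bool model_state_survives_race(bool defer_free)
{
	struct model_folio f = { .refcount = 1, .in_cache = true };
	bool survives;

	f.private = calloc(1, sizeof(*f.private));
	if (!f.private)
		return false;

	model_release(&f, defer_free);	/* point A */
	f.refcount++;			/* simulated second task takes a ref */

	if (model_ref_freeze(&f, 1)) {	/* point B: fails, refcount is 2 */
		f.in_cache = false;
		model_free(&f);
	}

	survives = f.in_cache && f.private != NULL;

	/* Second task drops its reference; tear the model down. */
	f.refcount--;
	model_free(&f);
	return survives;
}
```

In this model, model_state_survives_race(false) corresponds to freeing in
the release step and returns false (the folio stays cached with NULL private
state), while model_state_survives_race(true) corresponds to the deferred
variant and returns true. It says nothing about locking or about the other
btrfs call sites, which the real patch still has to account for.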