From: Daniel Vacek
To: Chris Mason, Josef Bacik, Eric Biggers, "Theodore Y.
 Ts'o", Jaegeuk Kim, Jens Axboe, David Sterba
Cc: linux-block@vger.kernel.org, Daniel Vacek, linux-fscrypt@vger.kernel.org,
 linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org, Boris Burkov
Subject: [PATCH v6 09/43] btrfs: add infrastructure for safe em freeing
Date: Fri, 6 Feb 2026 19:22:41 +0100
Message-ID: <20260206182336.1397715-10-neelx@suse.com>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20260206182336.1397715-1-neelx@suse.com>
References: <20260206182336.1397715-1-neelx@suse.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Josef Bacik

When we add fscrypt support we're going to have fscrypt objects hanging
off of extent_maps.  This includes a block key, which we may have to
unregister from the block layer if we are the last one freeing it.
Unregistering requires taking a semaphore in the block layer, which
means we can't free em's under the extent map tree lock.

Thankfully we only do this in two places: where we drop a range of
extent maps, and where we free logged extents.  Add
btrfs_free_extent_map_safe(), which adds the em to a list in the em_tree
if we dropped the last reference to it.  Currently this is
unconditional, but it will become conditional on the fscrypt object
added in a later patch.

To process these delayed objects, add btrfs_free_pending_extent_maps(),
which is called after the lock on the em_tree has been dropped.  It
walks the freed list and does the actual freeing in a safe context.

Signed-off-by: Josef Bacik
Reviewed-by: Boris Burkov
Signed-off-by: Daniel Vacek
---
v5: https://lore.kernel.org/linux-btrfs/6cf44f7860e94de68df242e69f4c5250bd061cff.1706116485.git.josef@toxicpanda.com/
 * No changes since (other than simple function renames).
---
 fs/btrfs/extent_map.c | 76 +++++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/extent_map.h | 10 ++++++
 fs/btrfs/tree-log.c   |  6 ++--
 3 files changed, 87 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 095a561d733f..58589fc11802 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -34,7 +34,9 @@ void __cold btrfs_extent_map_exit(void)
 void btrfs_extent_map_tree_init(struct extent_map_tree *tree)
 {
 	tree->root = RB_ROOT;
+	tree->flags = 0;
 	INIT_LIST_HEAD(&tree->modified_extents);
+	INIT_LIST_HEAD(&tree->freed_extents);
 	rwlock_init(&tree->lock);
 }
 
@@ -51,9 +53,15 @@ struct extent_map *btrfs_alloc_extent_map(void)
 	RB_CLEAR_NODE(&em->rb_node);
 	refcount_set(&em->refs, 1);
 	INIT_LIST_HEAD(&em->list);
+	INIT_LIST_HEAD(&em->free_list);
 	return em;
 }
 
+static void free_extent_map(struct extent_map *em)
+{
+	kmem_cache_free(extent_map_cache, em);
+}
+
 /*
  * Drop the reference out on @em by one and free the structure if the reference
  * count hits zero.
@@ -65,10 +73,69 @@ void btrfs_free_extent_map(struct extent_map *em)
 	if (refcount_dec_and_test(&em->refs)) {
 		WARN_ON(btrfs_extent_map_in_tree(em));
 		WARN_ON(!list_empty(&em->list));
-		kmem_cache_free(extent_map_cache, em);
+		free_extent_map(em);
+	}
+}
+
+/*
+ * Drop a ref for the extent map in the given tree.
+ *
+ * @tree:	tree that the em is a part of.
+ * @em:		the em to drop the reference to.
+ *
+ * Drop the reference count on @em by one, if the reference count hits 0 and
+ * there is an object on the em that can't be safely freed in the current
+ * context (if we are holding the extent_map_tree->lock for example), then add
+ * it to the freed_extents list on the extent_map_tree for later processing.
+ *
+ * This must be followed by a btrfs_free_pending_extent_maps() to clear
+ * the pending frees.
+ */
+void btrfs_free_extent_map_safe(struct extent_map_tree *tree,
+				struct extent_map *em)
+{
+	lockdep_assert_held_write(&tree->lock);
+
+	if (!em)
+		return;
+
+	if (refcount_dec_and_test(&em->refs)) {
+		WARN_ON(btrfs_extent_map_in_tree(em));
+		WARN_ON(!list_empty(&em->list));
+		list_add_tail(&em->free_list, &tree->freed_extents);
+		set_bit(EXTENT_MAP_TREE_PENDING_FREES, &tree->flags);
 	}
 }
 
+/*
+ * Free the em objects that exist on the em tree
+ *
+ * @tree:	the tree to free the objects from.
+ *
+ * If there are any objects on the tree->freed_extents list go ahead and
+ * free them here in a safe way.  This is to be coupled with any uses of
+ * btrfs_free_extent_map_safe().
+ */
+void btrfs_free_pending_extent_maps(struct extent_map_tree *tree)
+{
+	struct extent_map *em;
+
+	/* Avoid taking the write lock if we don't have any pending frees. */
+	if (!test_and_clear_bit(EXTENT_MAP_TREE_PENDING_FREES, &tree->flags))
+		return;
+
+	write_lock(&tree->lock);
+	while ((em = list_first_entry_or_null(&tree->freed_extents,
+					      struct extent_map, free_list))) {
+		list_del_init(&em->free_list);
+		write_unlock(&tree->lock);
+		free_extent_map(em);
+		cond_resched();
+		write_lock(&tree->lock);
+	}
+	write_unlock(&tree->lock);
+}
+
 /* Do the math around the end of an extent, handling wrapping. */
 static u64 range_end(u64 start, u64 len)
 {
@@ -784,7 +851,7 @@ static void drop_all_extent_maps_fast(struct btrfs_inode *inode)
 		em = rb_entry(node, struct extent_map, rb_node);
 		em->flags &= ~(EXTENT_FLAG_PINNED | EXTENT_FLAG_LOGGING);
 		btrfs_remove_extent_mapping(inode, em);
-		btrfs_free_extent_map(em);
+		btrfs_free_extent_map_safe(tree, em);
 
 		if (cond_resched_rwlock_write(&tree->lock))
 			node = rb_first(&tree->root);
@@ -792,6 +859,8 @@ static void drop_all_extent_maps_fast(struct btrfs_inode *inode)
 			node = next;
 	}
 	write_unlock(&tree->lock);
+
+	btrfs_free_pending_extent_maps(tree);
 }
 
 /*
@@ -986,13 +1055,14 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
 			btrfs_free_extent_map(em);
 next:
 		/* Once for us (for our lookup reference). */
-		btrfs_free_extent_map(em);
+		btrfs_free_extent_map_safe(em_tree, em);
 
 		em = next_em;
 	}
 
 	write_unlock(&em_tree->lock);
 
+	btrfs_free_pending_extent_maps(em_tree);
 	btrfs_free_extent_map(split);
 	btrfs_free_extent_map(split2);
 }
diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
index 6f685f3c9327..a962012be1c3 100644
--- a/fs/btrfs/extent_map.h
+++ b/fs/btrfs/extent_map.h
@@ -97,11 +97,18 @@ struct extent_map {
 	u32 flags;
 	refcount_t refs;
 	struct list_head list;
+	struct list_head free_list;
+};
+
+enum extent_map_flags {
+	EXTENT_MAP_TREE_PENDING_FREES,
 };
 
 struct extent_map_tree {
 	struct rb_root root;
+	unsigned long flags;
 	struct list_head modified_extents;
+	struct list_head freed_extents;
 	rwlock_t lock;
 };
 
@@ -175,6 +182,9 @@ int btrfs_split_extent_map(struct btrfs_inode *inode, u64 start, u64 len, u64 pr
 
 struct extent_map *btrfs_alloc_extent_map(void);
 void btrfs_free_extent_map(struct extent_map *em);
+void btrfs_free_extent_map_safe(struct extent_map_tree *tree,
+				struct extent_map *em);
+void btrfs_free_pending_extent_maps(struct extent_map_tree *tree);
 int __init btrfs_extent_map_init(void);
 void __cold btrfs_extent_map_exit(void);
 int btrfs_unpin_extent_cache(struct btrfs_inode *inode, u64 start, u64 len, u64 gen);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index e1bd03ebfd98..4034c04d4d63 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -5383,7 +5383,7 @@ static int btrfs_log_changed_extents(struct btrfs_trans_handle *trans,
 		 */
 		if (ret) {
 			btrfs_clear_em_logging(inode, em);
-			btrfs_free_extent_map(em);
+			btrfs_free_extent_map_safe(tree, em);
 			continue;
 		}
 
@@ -5392,11 +5392,13 @@ static int btrfs_log_changed_extents(struct btrfs_trans_handle *trans,
 		ret = log_one_extent(trans, inode, em, path, ctx);
 		write_lock(&tree->lock);
 		btrfs_clear_em_logging(inode, em);
-		btrfs_free_extent_map(em);
+		btrfs_free_extent_map_safe(tree, em);
 	}
 	WARN_ON(!list_empty(&extents));
 	write_unlock(&tree->lock);
 
+	btrfs_free_pending_extent_maps(tree);
+
 	if (!ret)
 		ret = btrfs_log_prealloc_extents(trans, inode, path, ctx);
 	if (ret)
-- 
2.51.0
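
For anyone reading along who wants the locking argument in isolation: the
pattern the commit message describes (drop the last reference under the
lock, queue the object, do the real free once the lock is gone) can be
sketched in userspace C roughly as below.  This is only an illustration and
not the btrfs code.  struct obj, struct tree, obj_put_locked(),
free_pending() and demo_deferred_free() are hypothetical names, a pthread
mutex stands in for the rwlock, the lock_held flag is a single-threaded
self-check, and the pending-frees flag that lets the patch skip the lock is
left out for brevity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct extent_map. */
struct obj {
	int refs;
	struct obj *next_free;	/* link on the owner's pending-free list */
};

/* Hypothetical stand-in for struct extent_map_tree. */
struct tree {
	pthread_mutex_t lock;
	bool lock_held;		/* single-threaded self-check only */
	struct obj *freed;	/* objects whose last reference was dropped */
	int nr_freed;		/* completed frees, counted for the demo */
};

static void tree_lock(struct tree *t)
{
	pthread_mutex_lock(&t->lock);
	t->lock_held = true;
}

static void tree_unlock(struct tree *t)
{
	t->lock_held = false;
	pthread_mutex_unlock(&t->lock);
}

/* The "expensive" free: pretend it may sleep, so never call it locked. */
static void obj_free(struct tree *t, struct obj *o)
{
	assert(!t->lock_held);
	free(o);
	t->nr_freed++;
}

/* Drop a reference with t->lock held; queue the object if it was the last. */
static void obj_put_locked(struct tree *t, struct obj *o)
{
	assert(t->lock_held);
	if (!o || --o->refs > 0)
		return;
	o->next_free = t->freed;	/* LIFO here; the patch appends at the tail */
	t->freed = o;
}

/* Call once t->lock has been released: drain the pending list. */
static void free_pending(struct tree *t)
{
	struct obj *o;

	tree_lock(t);
	while ((o = t->freed) != NULL) {
		t->freed = o->next_free;
		tree_unlock(t);		/* the free itself runs unlocked */
		obj_free(t, o);
		tree_lock(t);
	}
	tree_unlock(t);
}

/* Demo: drop three objects under the lock, free them afterwards. */
int demo_deferred_free(void)
{
	struct tree t = { 0 };
	int freed;

	pthread_mutex_init(&t.lock, NULL);

	tree_lock(&t);
	for (int i = 0; i < 3; i++) {
		struct obj *o = calloc(1, sizeof(*o));

		assert(o);
		o->refs = 1;
		obj_put_locked(&t, o);	/* last ref: queued, not freed */
	}
	assert(t.nr_freed == 0);	/* nothing was freed under the lock */
	tree_unlock(&t);

	free_pending(&t);
	freed = t.nr_freed;
	pthread_mutex_destroy(&t.lock);
	return freed;
}
```

Calling demo_deferred_free() returns 3: all three objects are queued while
the lock is held and only released by free_pending().  As in the patch, the
lock is dropped and retaken around each individual free so that other
lockers are not starved while a long list drains.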