From nobody Thu Dec 18 11:54:50 2025 Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ED48D28F93F for ; Thu, 10 Apr 2025 19:15:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.177.32 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744312507; cv=none; b=gdE/x9eX2ZSrhrxZ+kTZH3B2xALun+xIjmcB82oENMhN1vnEizXoEz0tRTttN+n5q0a3orTaxv3Jkamvw8GnEJj+fpvN5eZx6e6nOTKB0nZWV1nGng4M3VR7t7k5H3LZ99IP5yREmuPQX5CdFOxB3QiC3RiiTKNHc26BorVKH9c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744312507; c=relaxed/simple; bh=TKstUrLqXP5gAXXhXRa22rZF7ZrpYN0ErtNqmn8L6Pg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Myhkmuizf8rEdPVc+IaNMrBEqEbA26RU94g0nTrV64crwwfVQFUsjO+A0HPL1uwVs9xujKwQBEzoGegv4OpsIBt70uNoFHTasxGa3wK3otqFceLihA2KPJIuuuiD9IGYMdZ/vaqCnyqF1iLsvOrtZ44FrFmVGx+qtB7O28H2krA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=CMQZDzdw; arc=none smtp.client-ip=205.220.177.32 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="CMQZDzdw" Received: from pps.filterd (m0246631.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 53AJ7ApT023189; Thu, 10 Apr 2025 19:14:56 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=corp-2023-11-20; bh=HZV89 /87me5vO2rrtx5oiznojRMUfJzE0drJ0U7p2nE=; b=CMQZDzdwJQA4DFqAHaJJS fOzQ9tJtLbtaLSDr7E2chz+9sxm2dVbqSijnZU1cEV+rqkyHzjNyTdmrtlW/J+jN VRaqVo8GjByo14VxE6i7uzsftMBX9lrJXzt/Ete3X/h5lKe27pgePZpMfA461O2x 600eTGjTHMb4/oUeHe2+C2Y+Ie8JhzAO9w63FiRb7icdUYAc4NavJA6gVCgd7UXw LaknZuHLMZ466i7k5jNuKIKzT8NitZ+kEV7iZ1DVcT88ovrwVVWHI+R/gPfk7Lej EODaL/zD73aYHZKyhlbDZRlllt9X6dJYJeBwtmuwWBU6sOcoC1CqGwxMV1ff+zid g== Received: from iadpaimrmta01.imrmtpd1.prodappiadaev1.oraclevcn.com (iadpaimrmta01.appoci.oracle.com [130.35.100.223]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 45xkuy00g0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 10 Apr 2025 19:14:55 +0000 (GMT) Received: from pps.filterd (iadpaimrmta01.imrmtpd1.prodappiadaev1.oraclevcn.com [127.0.0.1]) by iadpaimrmta01.imrmtpd1.prodappiadaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 53AIwiO7023934; Thu, 10 Apr 2025 19:14:55 GMT Received: from pps.reinject (localhost [127.0.0.1]) by iadpaimrmta01.imrmtpd1.prodappiadaev1.oraclevcn.com (PPS) with ESMTPS id 45ttyk1gjp-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 10 Apr 2025 19:14:55 +0000 Received: from iadpaimrmta01.imrmtpd1.prodappiadaev1.oraclevcn.com (iadpaimrmta01.imrmtpd1.prodappiadaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 53AJEn92040606; Thu, 10 Apr 2025 19:14:54 GMT Received: from sidhakum-ubuntu.osdevelopmeniad.oraclevcn.com (sidhakum-ubuntu.allregionaliads.osdevelopmeniad.oraclevcn.com [100.100.250.108]) by iadpaimrmta01.imrmtpd1.prodappiadaev1.oraclevcn.com (PPS) with ESMTP id 45ttyk1gg6-4; Thu, 10 Apr 2025 19:14:54 +0000 From: Sidhartha Kumar To: linux-kernel@vger.kernel.org, maple-tree@lists.infradead.org Cc: linux-mm@kvack.org, akpm@linux-foundation.org, liam.howlett@oracle.com, willy@infradead.org, Sidhartha Kumar , "Liam R . Howlett" Subject: [PATCH v5 3/6] maple_tree: use vacant nodes to reduce worst case allocations Date: Thu, 10 Apr 2025 19:14:43 +0000 Message-ID: <20250410191446.2474640-4-sidhartha.kumar@oracle.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250410191446.2474640-1-sidhartha.kumar@oracle.com> References: <20250410191446.2474640-1-sidhartha.kumar@oracle.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1095,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2025-04-10_05,2025-04-10_01,2024-11-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 malwarescore=0 bulkscore=0 mlxscore=0 mlxlogscore=999 suspectscore=0 adultscore=0 spamscore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2502280000 definitions=main-2504100139 X-Proofpoint-GUID: HKoDVbyOLI-CV0R6je0n84BL7VEbT_My X-Proofpoint-ORIG-GUID: HKoDVbyOLI-CV0R6je0n84BL7VEbT_My Content-Type: text/plain; charset="utf-8" In order to determine the store type for a maple tree operation, a walk of the tree is done through mas_wr_walk(). This function descends the tree until a spanning write is detected or we reach a leaf node. While descending, keep track of the height at which we encounter a node with available space. This is done by checking if mas->end is less than the number of slots a given node type can fit. Now that the height of the vacant node is tracked, we can use the difference between the height of the tree and the height of the vacant node to know how many levels we will have to propagate creating new nodes. Update mas_prealloc_calc() to consider the vacant height and reduce the number of worst-case allocations. Rebalancing and spanning stores are not supported and fall back to using the full height of the tree for allocations. Update preallocation testing assertions to take into account vacant height. Reviewed-by: Liam R. Howlett Signed-off-by: Sidhartha Kumar --- include/linux/maple_tree.h | 2 + lib/maple_tree.c | 13 ++++-- tools/testing/radix-tree/maple.c | 79 ++++++++++++++++++++++++++++---- 3 files changed, 82 insertions(+), 12 deletions(-) diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h index cbbcd18d4186..657adb33e61e 100644 --- a/include/linux/maple_tree.h +++ b/include/linux/maple_tree.h @@ -463,6 +463,7 @@ struct ma_wr_state { void __rcu **slots; /* mas->node->slots pointer */ void *entry; /* The entry to write */ void *content; /* The existing entry that is being overwritten */ + unsigned char vacant_height; /* Height of lowest node with free space */ }; =20 #define mas_lock(mas) spin_lock(&((mas)->tree->ma_lock)) @@ -498,6 +499,7 @@ struct ma_wr_state { .mas =3D ma_state, \ .content =3D NULL, \ .entry =3D wr_entry, \ + .vacant_height =3D 0 \ } =20 #define MA_TOPIARY(name, tree) \ diff --git a/lib/maple_tree.c b/lib/maple_tree.c index 195b19505b39..3f794ef072f4 100644 --- a/lib/maple_tree.c +++ b/lib/maple_tree.c @@ -3537,6 +3537,9 @@ static bool mas_wr_walk(struct ma_wr_state *wr_mas) if (ma_is_leaf(wr_mas->type)) return true; =20 + if (mas->end < mt_slots[wr_mas->type] - 1) + wr_mas->vacant_height =3D mas->depth + 1; + mas_wr_walk_traverse(wr_mas); } =20 @@ -4152,7 +4155,9 @@ static inline void mas_wr_prealloc_setup(struct ma_wr= _state *wr_mas) static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entr= y) { struct ma_state *mas =3D wr_mas->mas; - int ret =3D mas_mt_height(mas) * 3 + 1; + unsigned char height =3D mas_mt_height(mas); + int ret =3D height * 3 + 1; + unsigned char delta =3D height - wr_mas->vacant_height; =20 switch (mas->store_type) { case wr_invalid: @@ -4170,13 +4175,13 @@ static inline int mas_prealloc_calc(struct ma_wr_st= ate *wr_mas, void *entry) ret =3D 0; break; case wr_spanning_store: - ret =3D mas_mt_height(mas) * 3 + 1; + WARN_ON_ONCE(ret !=3D height * 3 + 1); break; case wr_split_store: - ret =3D mas_mt_height(mas) * 2 + 1; + ret =3D delta * 2 + 1; break; case wr_rebalance: - ret =3D mas_mt_height(mas) * 2 - 1; + ret =3D height * 2 + 1; break; case wr_node_store: ret =3D mt_in_rcu(mas->tree) ? 1 : 0; diff --git a/tools/testing/radix-tree/maple.c b/tools/testing/radix-tree/ma= ple.c index e0f8fabe8821..e37a3ab2e921 100644 --- a/tools/testing/radix-tree/maple.c +++ b/tools/testing/radix-tree/maple.c @@ -35475,15 +35475,65 @@ static void check_dfs_preorder(struct maple_tree = *mt) } /* End of depth first search tests */ =20 +/* get height of the lowest non-leaf node with free space */ +static unsigned char get_vacant_height(struct ma_wr_state *wr_mas, void *e= ntry) +{ + struct ma_state *mas =3D wr_mas->mas; + char vacant_height =3D 0; + enum maple_type type; + unsigned long *pivots; + unsigned long min =3D 0; + unsigned long max =3D ULONG_MAX; + unsigned char offset; + + /* start traversal */ + mas_reset(mas); + mas_start(mas); + if (!xa_is_node(mas_root(mas))) + return 0; + + type =3D mte_node_type(mas->node); + wr_mas->type =3D type; + while (!ma_is_leaf(type)) { + mas_node_walk(mas, mte_to_node(mas->node), type, &min, &max); + offset =3D mas->offset; + mas->end =3D mas_data_end(mas); + pivots =3D ma_pivots(mte_to_node(mas->node), type); + + if (pivots) { + if (offset) + min =3D pivots[mas->offset - 1]; + if (offset < mas->end) + max =3D pivots[mas->offset]; + } + wr_mas->r_max =3D offset < mas->end ? pivots[offset] : mas->max; + + /* detect spanning write */ + if (mas_is_span_wr(wr_mas)) + break; + + if (mas->end < mt_slot_count(mas->node) - 1) + vacant_height =3D mas->depth + 1; + + mas_descend(mas); + type =3D mte_node_type(mas->node); + mas->depth++; + } + + return vacant_height; +} + /* Preallocation testing */ static noinline void __init check_prealloc(struct maple_tree *mt) { unsigned long i, max =3D 100; unsigned long allocated; unsigned char height; + unsigned char vacant_height; struct maple_node *mn; void *ptr =3D check_prealloc; MA_STATE(mas, mt, 10, 20); + MA_WR_STATE(wr_mas, &mas, ptr); =20 mt_set_non_kernel(1000); for (i =3D 0; i <=3D max; i++) @@ -35494,8 +35544,9 @@ static noinline void __init check_prealloc(struct m= aple_tree *mt) MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) !=3D 0); allocated =3D mas_allocated(&mas); height =3D mas_mt_height(&mas); + vacant_height =3D get_vacant_height(&wr_mas, ptr); MT_BUG_ON(mt, allocated =3D=3D 0); - MT_BUG_ON(mt, allocated !=3D 1 + height * 3); + MT_BUG_ON(mt, allocated !=3D 1 + (height - vacant_height) * 3); mas_destroy(&mas); allocated =3D mas_allocated(&mas); MT_BUG_ON(mt, allocated !=3D 0); @@ -35503,8 +35554,9 @@ static noinline void __init check_prealloc(struct m= aple_tree *mt) MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) !=3D 0); allocated =3D mas_allocated(&mas); height =3D mas_mt_height(&mas); + vacant_height =3D get_vacant_height(&wr_mas, ptr); MT_BUG_ON(mt, allocated =3D=3D 0); - MT_BUG_ON(mt, allocated !=3D 1 + height * 3); + MT_BUG_ON(mt, allocated !=3D 1 + (height - vacant_height) * 3); MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) !=3D 0); mas_destroy(&mas); allocated =3D mas_allocated(&mas); @@ -35514,7 +35566,8 @@ static noinline void __init check_prealloc(struct m= aple_tree *mt) MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) !=3D 0); allocated =3D mas_allocated(&mas); height =3D mas_mt_height(&mas); - MT_BUG_ON(mt, allocated !=3D 1 + height * 3); + vacant_height =3D get_vacant_height(&wr_mas, ptr); + MT_BUG_ON(mt, allocated !=3D 1 + (height - vacant_height) * 3); mn =3D mas_pop_node(&mas); MT_BUG_ON(mt, mas_allocated(&mas) !=3D allocated - 1); mn->parent =3D ma_parent_ptr(mn); @@ -35527,7 +35580,8 @@ static noinline void __init check_prealloc(struct m= aple_tree *mt) MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) !=3D 0); allocated =3D mas_allocated(&mas); height =3D mas_mt_height(&mas); - MT_BUG_ON(mt, allocated !=3D 1 + height * 3); + vacant_height =3D get_vacant_height(&wr_mas, ptr); + MT_BUG_ON(mt, allocated !=3D 1 + (height - vacant_height) * 3); mn =3D mas_pop_node(&mas); MT_BUG_ON(mt, mas_allocated(&mas) !=3D allocated - 1); MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) !=3D 0); @@ -35540,7 +35594,8 @@ static noinline void __init check_prealloc(struct m= aple_tree *mt) MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) !=3D 0); allocated =3D mas_allocated(&mas); height =3D mas_mt_height(&mas); - MT_BUG_ON(mt, allocated !=3D 1 + height * 3); + vacant_height =3D get_vacant_height(&wr_mas, ptr); + MT_BUG_ON(mt, allocated !=3D 1 + (height - vacant_height) * 3); mn =3D mas_pop_node(&mas); MT_BUG_ON(mt, mas_allocated(&mas) !=3D allocated - 1); mas_push_node(&mas, mn); @@ -35553,7 +35608,8 @@ static noinline void __init check_prealloc(struct m= aple_tree *mt) MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) !=3D 0); allocated =3D mas_allocated(&mas); height =3D mas_mt_height(&mas); - MT_BUG_ON(mt, allocated !=3D 1 + height * 3); + vacant_height =3D get_vacant_height(&wr_mas, ptr); + MT_BUG_ON(mt, allocated !=3D 1 + (height - vacant_height) * 3); mas_store_prealloc(&mas, ptr); MT_BUG_ON(mt, mas_allocated(&mas) !=3D 0); =20 @@ -35578,7 +35634,8 @@ static noinline void __init check_prealloc(struct m= aple_tree *mt) MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) !=3D 0); allocated =3D mas_allocated(&mas); height =3D mas_mt_height(&mas); - MT_BUG_ON(mt, allocated !=3D 1 + height * 2); + vacant_height =3D get_vacant_height(&wr_mas, ptr); + MT_BUG_ON(mt, allocated !=3D 1 + (height - vacant_height) * 2); mas_store_prealloc(&mas, ptr); MT_BUG_ON(mt, mas_allocated(&mas) !=3D 0); mt_set_non_kernel(1); @@ -35595,8 +35652,14 @@ static noinline void __init check_prealloc(struct = maple_tree *mt) MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) !=3D 0); allocated =3D mas_allocated(&mas); height =3D mas_mt_height(&mas); + vacant_height =3D get_vacant_height(&wr_mas, ptr); MT_BUG_ON(mt, allocated =3D=3D 0); - MT_BUG_ON(mt, allocated !=3D 1 + height * 3); + /* + * vacant height cannot be used to compute the number of nodes needed + * as the root contains two entries which means it is on the verge of + * insufficiency. The worst case full height of the tree is needed. + */ + MT_BUG_ON(mt, allocated !=3D height * 3 + 1); mas_store_prealloc(&mas, ptr); MT_BUG_ON(mt, mas_allocated(&mas) !=3D 0); mas_set_range(&mas, 0, 200); --=20 2.43.0