From nobody Sat Jun 13 23:22:38 2026
From: Pranjal Arya
Date: Sat, 13 Jun 2026 22:49:43 +0530
Subject: [PATCH RFC 01/12] mm/vmalloc: introduce maple_tree-based indexing for vmap_area
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260613-vmalloc_maple-v1-1-0aa740bb944b@oss.qualcomm.com>
References: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com>
In-Reply-To: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com>
To: Andrew Morton, Uladzislau Rezki, "Liam R. Howlett", Alice Ryhl,
 Andrew Ballance
Cc: linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, maple-tree@lists.infradead.org,
 Lorenzo Stoakes, Pranjal Shrivastava, Will Deacon, Suzuki K Poulose,
 Neil Armstrong, Mostafa Saleh, Balbir Singh, Suren Baghdasaryan,
 Marco Elver, Dmitry Vyukov, Alexander Potapenko, Shuah Khan, Dev Jain,
 Brendan Jackman, Puranjay Mohan, Santosh Shukla, Wyes Karny,
 Pranjal Arya, Sudeep Holla

Add the maple_tree data structures, helper API, and runtime readiness
plumbing that this series uses to retire the augmented rb_tree indexing
of vmalloc free and busy ranges.

Two new tree handles are added alongside the existing per-node lazy
index:

  - free_vmap_area_mt   address-keyed gap query for the global
                        free-area allocator
  - vn->busy.mt         per-node address-keyed lookup for find/free

Helpers follow a try_init_*_locked / *_store_*_locked /
*_erase_*_locked naming convention so that the conversion call sites
read uniformly. The try_init_* helpers fold one-shot allocation of the
maple-tree backing state into the first hot-path access; this keeps
vmalloc_init() free of the per-tree GFP_NOWAIT paths and lets the tree
machinery start cold.

No external vmalloc behaviour change in this step. free/busy/lazy
operations still go through the rb_tree and per-node lazy.mt; the new
helpers and globals are wired up by the conversion patches that follow.

Signed-off-by: Pranjal Arya
---
 mm/vmalloc.c | 433 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 402 insertions(+), 31 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 1afca3568b9b..67f753d04c96 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -24,6 +24,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -880,22 +881,22 @@ static bool vmap_initialized __read_mostly;
 static struct kmem_cache *vmap_area_cachep;
 
 /*
- * This linked list is used in pair with free_vmap_area_root.
- * It gives O(1) access to prev/next to perform fast coalescing.
+ * This linked list stores free areas sorted by start address.
+ * It gives O(1) access to neighbors for fast coalescing.
  */
 static LIST_HEAD(free_vmap_area_list);
+/* Next-fit hint to avoid scanning from list head on each allocation. */
+static unsigned long free_vmap_alloc_hint __maybe_unused = 1;
 
 /*
- * This augment red-black tree represents the free vmap space.
- * All vmap_area objects in this tree are sorted by va->va_start
- * address. It is used for allocation and merging when a vmap
- * object is released.
- *
- * Each vmap_area node contains a maximum available free block
- * of its sub-tree, right or left. Therefore it is possible to
- * find a lowest match of free area.
+ * Maple tree shadow of free_vmap_area_list. It is used for
+ * address lookups and first-fit scans.
  */
 static struct rb_root free_vmap_area_root = RB_ROOT;
+static struct maple_tree free_vmap_area_mt __maybe_unused =
+	MTREE_INIT_EXT(free_vmap_area_mt, MT_FLAGS_LOCK_EXTERN, free_vmap_area_lock);
+static bool free_vmap_area_mt_enabled __maybe_unused;
+static bool free_vmap_area_mt_init_tried __maybe_unused;
 
 /*
  * Preload a CPU with one object for "no edge" split case. The
@@ -906,14 +907,17 @@ static DEFINE_PER_CPU(struct vmap_area *, ne_fit_preload_node);
 
 /*
  * This structure defines a single, solid model where a list and
- * rb-tree are part of one entity protected by the lock. Nodes are
+ * maple tree are part of one entity protected by the lock. Nodes are
  * sorted in ascending order, thus for O(1) access to left/right
  * neighbors a list is used as well as for sequential traversal.
  */
-struct rb_list {
+struct mt_list {
 	struct rb_root root;
+	struct maple_tree mt;
 	struct list_head head;
 	spinlock_t lock;
+	bool mt_enabled;
+	bool mt_init_tried;
 };
 
 /*
@@ -940,8 +944,8 @@ static struct vmap_node {
 	bool skip_populate;
 
 	/* Bookkeeping data of this node. */
-	struct rb_list busy;
-	struct rb_list lazy;
+	struct mt_list busy;
+	struct mt_list lazy;
 
 	/*
 	 * Ready-to-free areas.
@@ -1051,6 +1055,10 @@ va_size(struct vmap_area *va)
 	return (va->va_end - va->va_start);
 }
 
+/*
+ * Transitional rb compatibility retained until all rb-only users are moved.
+ * Follow-up patches in this RFC series remove these helpers.
+ */
 static __always_inline unsigned long
 get_subtree_max_size(struct rb_node *node)
 {
@@ -1070,6 +1078,130 @@ static DECLARE_WORK(drain_vmap_work, drain_vmap_area_work);
 
 static __cacheline_aligned_in_smp atomic_long_t vmap_lazy_nr;
 
+/*
+ * maple nodes are allocated from slab; defer tree population until
+ * slab allocator is up to avoid early-boot failures.
+ */
+static __always_inline bool vmap_mt_runtime_ready(void)
+{
+	return READ_ONCE(vmap_initialized) && slab_is_available();
+}
+
+static __always_inline bool free_mt_supported(void)
+{
+	return free_vmap_area_mt_enabled;
+}
+
+static __always_inline void disable_free_mt_locked(void)
+{
+	lockdep_assert_held(&free_vmap_area_lock);
+
+	if (free_vmap_area_mt_enabled) {
+		__mt_destroy(&free_vmap_area_mt);
+		free_vmap_area_mt_enabled = false;
+	}
+}
+
+static __always_inline void free_mt_store_va_locked(struct vmap_area *va)
+{
+	int err;
+
+	lockdep_assert_held(&free_vmap_area_lock);
+
+	MA_STATE(mas, &free_vmap_area_mt, va->va_start, va->va_end - 1);
+
+	err = mas_store_gfp(&mas, va, GFP_ATOMIC | __GFP_NOWARN);
+	if (WARN_ON_ONCE(err))
+		disable_free_mt_locked();
+}
+
+static __always_inline void free_mt_erase_va_locked(struct vmap_area *va)
+{
+	int err;
+
+	lockdep_assert_held(&free_vmap_area_lock);
+
+	MA_STATE(mas, &free_vmap_area_mt, va->va_start, va->va_end - 1);
+
+	err = mas_store_gfp(&mas, NULL, GFP_ATOMIC | __GFP_NOWARN);
+	if (WARN_ON_ONCE(err))
+		disable_free_mt_locked();
+}
+
+static __always_inline void
+free_mt_update_va_locked(struct vmap_area *va, unsigned long old_start,
+			 unsigned long old_end)
+{
+	int err;
+
+	lockdep_assert_held(&free_vmap_area_lock);
+
+	MA_STATE(mas_erase, &free_vmap_area_mt, old_start, old_end - 1);
+	MA_STATE(mas_store, &free_vmap_area_mt, va->va_start, va->va_end - 1);
+
+	err = mas_store_gfp(&mas_erase, NULL, GFP_ATOMIC | __GFP_NOWARN);
+	if (WARN_ON_ONCE(err)) {
+		disable_free_mt_locked();
+		return;
+	}
+
+	err = mas_store_gfp(&mas_store, va, GFP_ATOMIC | __GFP_NOWARN);
+	if (WARN_ON_ONCE(err))
+		disable_free_mt_locked();
+}
+
+static void free_mt_rebuild_locked(void)
+{
+	struct vmap_area *va;
+	int err;
+
+	lockdep_assert_held(&free_vmap_area_lock);
+
+	__mt_destroy(&free_vmap_area_mt);
+	free_vmap_area_mt_enabled = true;
+
+	list_for_each_entry(va, &free_vmap_area_list, list) {
+		MA_STATE(mas, &free_vmap_area_mt, va->va_start, va->va_end - 1);
+
+		err = mas_store_gfp(&mas, va, GFP_ATOMIC | __GFP_NOWARN);
+		if (WARN_ON_ONCE(err)) {
+			disable_free_mt_locked();
+			return;
+		}
+	}
+}
+
+static __always_inline void try_init_free_mt_locked(void)
+{
+	lockdep_assert_held(&free_vmap_area_lock);
+
+	if (free_vmap_area_mt_init_tried)
+		return;
+
+	if (!vmap_mt_runtime_ready())
+		return;
+
+	free_vmap_area_mt_init_tried = true;
+	free_mt_rebuild_locked();
+}
+
+static __always_inline struct vmap_area *
+__find_vmap_area_list(unsigned long addr, struct list_head *head)
+{
+	struct vmap_area *va;
+
+	addr = (unsigned long)kasan_reset_tag((void *)addr);
+
+	list_for_each_entry(va, head, list) {
+		if (addr < va->va_start)
+			break;
+		if (addr < va->va_end)
+			return va;
+	}
+
+	return NULL;
+}
+
 static struct vmap_area *__find_vmap_area(unsigned long addr, struct rb_root *root)
 {
 	struct rb_node *n = root->rb_node;
@@ -1092,29 +1224,268 @@ static struct vmap_area *__find_vmap_area(unsigned long addr, struct rb_root *ro
 }
 
 /* Look up the first VA which satisfies addr < va_end, NULL if none. */
-static struct vmap_area *
-__find_vmap_area_exceed_addr(unsigned long addr, struct rb_root *root)
+static __always_inline struct vmap_area *
+__find_vmap_area_exceed_addr_list(unsigned long addr, struct list_head *head)
 {
-	struct vmap_area *va = NULL;
-	struct rb_node *n = root->rb_node;
+	struct vmap_area *va;
 
 	addr = (unsigned long)kasan_reset_tag((void *)addr);
 
-	while (n) {
-		struct vmap_area *tmp;
+	list_for_each_entry(va, head, list) {
+		if (va->va_end > addr)
+			return va;
+	}
 
-		tmp = rb_entry(n, struct vmap_area, rb_node);
-		if (tmp->va_end > addr) {
-			va = tmp;
-			if (tmp->va_start <= addr)
-				break;
+	return NULL;
+}
 
-			n = n->rb_left;
-		} else
-			n = n->rb_right;
+static __always_inline struct list_head *
+find_vmap_area_insert_point_list(struct vmap_area *va, struct list_head *head)
+{
+	struct vmap_area *tmp;
+	struct list_head *next = head;
+
+	list_for_each_entry(tmp, head, list) {
+		if (tmp->va_start > va->va_start) {
+			next = &tmp->list;
+			break;
+		}
 	}
 
-	return va;
+	if (next != head) {
+		tmp = list_entry(next, struct vmap_area, list);
+		if (WARN_ON_ONCE(va->va_end > tmp->va_start))
+			return NULL;
+	}
+
+	if (next->prev != head) {
+		tmp = list_entry(next->prev, struct vmap_area, list);
+		if (WARN_ON_ONCE(va->va_start < tmp->va_end))
+			return NULL;
+	}
+
+	return next;
+}
+
+/*
+ * Use maple-tree neighbour lookup to locate insertion point in O(log n),
+ * while preserving sorted-list neighbour traversal.
+ */
+static __always_inline struct list_head *
+find_vmap_area_insert_point_mt(struct vmap_area *va, struct maple_tree *tree,
+			       struct list_head *head)
+{
+	struct vmap_area *prev, *next;
+	struct list_head *next_link;
+
+	MA_STATE(mas, tree, va->va_start, va->va_start);
+
+	mas_set(&mas, va->va_start);
+	next = mas_find(&mas, ULONG_MAX);
+
+	if (next) {
+		if (WARN_ON_ONCE(next->va_start <= va->va_start))
+			return NULL;
+		if (WARN_ON_ONCE(va->va_end > next->va_start))
+			return NULL;
+		next_link = &next->list;
+	} else {
+		next_link = head;
+	}
+
+	if (next_link->prev != head) {
+		prev = list_entry(next_link->prev, struct vmap_area, list);
+		if (WARN_ON_ONCE(va->va_start < prev->va_end))
+			return NULL;
+	}
+
+	return next_link;
+}
+
+static __always_inline bool
+insert_vmap_area_list_sorted(struct vmap_area *va, struct list_head *head)
+{
+	struct list_head *next;
+
+	next = find_vmap_area_insert_point_list(va, head);
+	if (!next)
+		return false;
+
+	list_add_tail(&va->list, next);
+	return true;
+}
+
+static __always_inline bool
+insert_vmap_area_list_sorted_mt(struct vmap_area *va, struct maple_tree *tree,
+				struct list_head *head)
+{
+	struct list_head *next;
+
+	next = find_vmap_area_insert_point_mt(va, tree, head);
+	if (!next)
+		return false;
+
+	list_add_tail(&va->list, next);
+	return true;
+}
+
+static __always_inline void
+disable_busy_mt_locked(struct vmap_node *vn)
+{
+	lockdep_assert_held(&vn->busy.lock);
+
+	if (vn->busy.mt_enabled) {
+		__mt_destroy(&vn->busy.mt);
+		vn->busy.mt_enabled = false;
+	}
+
+	vn->busy.mt_init_tried = true;
+}
+
+static __always_inline void
+disable_lazy_mt_locked(struct vmap_node *vn)
+{
+	lockdep_assert_held(&vn->lazy.lock);
+
+	if (vn->lazy.mt_enabled) {
+		__mt_destroy(&vn->lazy.mt);
+		vn->lazy.mt_enabled = false;
+	}
+
+	vn->lazy.mt_init_tried = true;
+}
+
+static void
+busy_mt_rebuild_locked(struct vmap_node *vn)
+{
+	struct vmap_area *va;
+	int err;
+
+	lockdep_assert_held(&vn->busy.lock);
+
+	__mt_destroy(&vn->busy.mt);
+	vn->busy.mt_enabled = true;
+
+	list_for_each_entry(va, &vn->busy.head, list) {
+		MA_STATE(mas, &vn->busy.mt, va->va_start, va->va_end - 1);
+
+		err = mas_store_gfp(&mas, va, GFP_ATOMIC | __GFP_NOWARN);
+		if (WARN_ON_ONCE(err)) {
+			disable_busy_mt_locked(vn);
+			return;
+		}
+	}
+}
+
+static __always_inline void
+try_init_busy_mt_locked(struct vmap_node *vn)
+{
+	lockdep_assert_held(&vn->busy.lock);
+
+	if (vn->busy.mt_init_tried)
+		return;
+
+	if (!vmap_mt_runtime_ready())
+		return;
+
+	vn->busy.mt_init_tried = true;
+	busy_mt_rebuild_locked(vn);
+}
+
+static void
+lazy_mt_rebuild_locked(struct vmap_node *vn)
+{
+	struct vmap_area *va;
+	int err;
+
+	lockdep_assert_held(&vn->lazy.lock);
+
+	__mt_destroy(&vn->lazy.mt);
+	vn->lazy.mt_enabled = true;
+
+	list_for_each_entry(va, &vn->lazy.head, list) {
+		MA_STATE(mas, &vn->lazy.mt, va->va_start, va->va_end - 1);
+
+		err = mas_store_gfp(&mas, va, GFP_ATOMIC | __GFP_NOWARN);
+		if (WARN_ON_ONCE(err)) {
+			disable_lazy_mt_locked(vn);
+			return;
+		}
+	}
+}
+
+static __always_inline void
+try_init_lazy_mt_locked(struct vmap_node *vn)
+{
+	lockdep_assert_held(&vn->lazy.lock);
+
+	if (vn->lazy.mt_init_tried)
+		return;
+
+	if (!vmap_mt_runtime_ready())
+		return;
+
+	vn->lazy.mt_init_tried = true;
+	lazy_mt_rebuild_locked(vn);
+}
+
+static __always_inline struct vmap_area *
+__find_vmap_area_mt(unsigned long addr, struct maple_tree *tree)
+{
+	MA_STATE(mas, tree, addr, addr);
+
+	addr = (unsigned long)kasan_reset_tag((void *)addr);
+	mas_set(&mas, addr);
+
+	return mas_walk(&mas);
+}
+
+static __always_inline struct vmap_area *
+__find_vmap_area_exceed_addr_mt(unsigned long addr, struct maple_tree *tree)
+{
+	MA_STATE(mas, tree, addr, addr);
+
+	addr = (unsigned long)kasan_reset_tag((void *)addr);
+	mas_set(&mas, addr);
+
+	return mas_find(&mas, ULONG_MAX);
+}
+
+static __always_inline struct vmap_area *
+__find_vmap_area_enclose_addr_mt(unsigned long addr, struct maple_tree *tree)
+{
+	MA_STATE(mas, tree, addr, addr);
+
+	addr = (unsigned long)kasan_reset_tag((void *)addr);
+	mas_set(&mas, addr);
+
+	return mas_find_rev(&mas, 0);
+}
+
+static __always_inline struct vmap_area *
+find_vmap_area_busy_locked(unsigned long addr, struct vmap_node *vn)
+{
+	lockdep_assert_held(&vn->busy.lock);
+
+	try_init_busy_mt_locked(vn);
+
+	if (likely(vn->busy.mt_enabled))
+		return __find_vmap_area_mt(addr, &vn->busy.mt);
+
+	return __find_vmap_area_list(addr, &vn->busy.head);
+}
+
+static __always_inline struct vmap_area *
+find_vmap_area_exceed_addr_busy_locked(unsigned long addr, struct vmap_node *vn)
+{
+	lockdep_assert_held(&vn->busy.lock);
+
+	try_init_busy_mt_locked(vn);
+
+	if (likely(vn->busy.mt_enabled))
+		return __find_vmap_area_exceed_addr_mt(addr, &vn->busy.mt);
+
+	return __find_vmap_area_exceed_addr_list(addr, &vn->busy.head);
 }
 
 /*
@@ -1135,7 +1506,7 @@ find_vmap_area_exceed_addr_lock(unsigned long addr, struct vmap_area **va)
 
 	for_each_vmap_node(vn) {
 		spin_lock(&vn->busy.lock);
-		*va = __find_vmap_area_exceed_addr(addr, &vn->busy.root);
+		*va = find_vmap_area_exceed_addr_busy_locked(addr, vn);
 
 		if (*va)
 			if (!va_start_lowest || (*va)->va_start < va_start_lowest)
@@ -1152,7 +1523,7 @@ find_vmap_area_exceed_addr_lock(unsigned long addr, struct vmap_area **va)
 		vn = addr_to_node(va_start_lowest);
 
 		spin_lock(&vn->busy.lock);
-		*va = __find_vmap_area(va_start_lowest, &vn->busy.root);
+		*va = find_vmap_area_busy_locked(va_start_lowest, vn);
 
 		if (*va)
 			return vn;
-- 
2.34.1

From nobody Sat Jun 13 23:22:38 2026
From: Pranjal Arya
Date: Sat, 13 Jun 2026 22:49:44 +0530
Subject: [PATCH RFC 02/12] mm/vmalloc: convert allocation-side gap finding and insertion to maple_tree
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260613-vmalloc_maple-v1-2-0aa740bb944b@oss.qualcomm.com>
References: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com>
In-Reply-To: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com>
To: Andrew Morton, Uladzislau Rezki, "Liam R. Howlett", Alice Ryhl,
 Andrew Ballance
Cc: linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, maple-tree@lists.infradead.org,
 Lorenzo Stoakes, Pranjal Shrivastava, Will Deacon, Suzuki K Poulose,
 Neil Armstrong, Mostafa Saleh, Balbir Singh, Suren Baghdasaryan,
 Marco Elver, Dmitry Vyukov, Alexander Potapenko, Shuah Khan, Dev Jain,
 Brendan Jackman, Puranjay Mohan, Santosh Shukla, Wyes Karny,
 Pranjal Arya, Sudeep Holla

Switch __alloc_vmap_area, va_clip, classify_va_fit_type, and the
insertion / merge helpers to drive the global free-area state through
the maple_tree-backed helpers. The augmented rb_tree walk
(find_vmap_lowest_match / find_va_links / augment_tree_propagate_*)
becomes unreachable on the alloc path and is removed.

The alloc path retains a list-based next-fit walk over
free_vmap_area_list driven by free_vmap_alloc_hint, but its
insertion-point lookup, neighbour validation and free-area indexing run
through the maple_tree helpers (find_vmap_area_insert_point_mt,
free_mt_store_va_locked, free_mt_update_va_locked).

va_clip handles the LE/RE/NE fit types via the new free_mt_*_locked
helpers. The pcpu and free paths still drive the rb_tree.

Signed-off-by: Pranjal Arya --- mm/vmalloc.c | 653 +++++++++++++++++++++++++++++--------------------------= ---- 1 file changed, 323 insertions(+), 330 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 67f753d04c96..c5f509f033e6 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -1535,261 +1535,155 @@ find_vmap_area_exceed_addr_lock(unsigned long add= r, struct vmap_area **va) return NULL; } =20 -/* - * This function returns back addresses of parent node - * and its left or right link for further processing. - * - * Otherwise NULL is returned. In that case all further - * steps regarding inserting of conflicting overlap range - * have to be declined and actually considered as a bug. - */ -static __always_inline struct rb_node ** -find_va_links(struct vmap_area *va, - struct rb_root *root, struct rb_node *from, - struct rb_node **parent) -{ - struct vmap_area *tmp_va; - struct rb_node **link; - - if (root) { - link =3D &root->rb_node; - if (unlikely(!*link)) { - *parent =3D NULL; - return link; - } - } else { - link =3D &from; - } +static __always_inline void +insert_vmap_area_busy_locked(struct vmap_area *va, struct vmap_node *vn) +{ + int err; =20 - /* - * Go to the bottom of the tree. When we hit the last point - * we end up with parent rb_node and correct direction, i name - * it link, where the new va->rb_node will be attached to. - */ - do { - tmp_va =3D rb_entry(*link, struct vmap_area, rb_node); + lockdep_assert_held(&vn->busy.lock); =20 - /* - * During the traversal we also do some sanity check. - * Trigger the BUG() if there are sides(left/right) - * or full overlaps. 
- */ - if (va->va_end <=3D tmp_va->va_start) - link =3D &(*link)->rb_left; - else if (va->va_start >=3D tmp_va->va_end) - link =3D &(*link)->rb_right; - else { - WARN(1, "vmalloc bug: 0x%lx-0x%lx overlaps with 0x%lx-0x%lx\n", - va->va_start, va->va_end, tmp_va->va_start, tmp_va->va_end); + try_init_busy_mt_locked(vn); =20 - return NULL; - } - } while (*link); + if (likely(vn->busy.mt_enabled)) { + MA_STATE(mas, &vn->busy.mt, va->va_start, va->va_end - 1); =20 - *parent =3D &tmp_va->rb_node; - return link; -} + if (!insert_vmap_area_list_sorted_mt(va, &vn->busy.mt, + &vn->busy.head)) + return; =20 -static __always_inline struct list_head * -get_va_next_sibling(struct rb_node *parent, struct rb_node **link) -{ - struct list_head *list; + err =3D mas_store_gfp(&mas, va, GFP_ATOMIC | __GFP_NOWARN); + if (WARN_ON_ONCE(err)) + disable_busy_mt_locked(vn); =20 - if (unlikely(!parent)) - /* - * The red-black tree where we try to find VA neighbors - * before merging or inserting is empty, i.e. it means - * there is no free vmap space. Normally it does not - * happen but we handle this case anyway. - */ - return NULL; + return; + } =20 - list =3D &rb_entry(parent, struct vmap_area, rb_node)->list; - return (&parent->rb_right =3D=3D link ? list->next : list); + if (!insert_vmap_area_list_sorted(va, &vn->busy.head)) + return; } =20 static __always_inline void -__link_va(struct vmap_area *va, struct rb_root *root, - struct rb_node *parent, struct rb_node **link, - struct list_head *head, bool augment) +unlink_vmap_area_busy_locked(struct vmap_area *va, struct vmap_node *vn) { - /* - * VA is still not in the list, but we can - * identify its future previous list_head node. - */ - if (likely(parent)) { - head =3D &rb_entry(parent, struct vmap_area, rb_node)->list; - if (&parent->rb_right !=3D link) - head =3D head->prev; - } + int err; =20 - /* Insert to the rb-tree */ - rb_link_node(&va->rb_node, parent, link); - if (augment) { - /* - * Some explanation here. 
Just perform simple insertion - * to the tree. We do not set va->subtree_max_size to - * its current size before calling rb_insert_augmented(). - * It is because we populate the tree from the bottom - * to parent levels when the node _is_ in the tree. - * - * Therefore we set subtree_max_size to zero after insertion, - * to let __augment_tree_propagate_from() puts everything to - * the correct order later on. - */ - rb_insert_augmented(&va->rb_node, - root, &free_vmap_area_rb_augment_cb); - va->subtree_max_size =3D 0; - } else { - rb_insert_color(&va->rb_node, root); - } + lockdep_assert_held(&vn->busy.lock); =20 - /* Address-sort this list */ - list_add(&va->list, head); -} + MA_STATE(mas, &vn->busy.mt, va->va_start, va->va_end - 1); =20 -static __always_inline void -link_va(struct vmap_area *va, struct rb_root *root, - struct rb_node *parent, struct rb_node **link, - struct list_head *head) -{ - __link_va(va, root, parent, link, head, false); -} + list_del_init(&va->list); =20 -static __always_inline void -link_va_augment(struct vmap_area *va, struct rb_root *root, - struct rb_node *parent, struct rb_node **link, - struct list_head *head) -{ - __link_va(va, root, parent, link, head, true); -} + try_init_busy_mt_locked(vn); =20 -static __always_inline void -__unlink_va(struct vmap_area *va, struct rb_root *root, bool augment) -{ - if (WARN_ON(RB_EMPTY_NODE(&va->rb_node))) + if (unlikely(!vn->busy.mt_enabled)) return; =20 - if (augment) - rb_erase_augmented(&va->rb_node, - root, &free_vmap_area_rb_augment_cb); - else - rb_erase(&va->rb_node, root); - - list_del_init(&va->list); - RB_CLEAR_NODE(&va->rb_node); + err =3D mas_store_gfp(&mas, NULL, GFP_ATOMIC | __GFP_NOWARN); + if (WARN_ON_ONCE(err)) + disable_busy_mt_locked(vn); } =20 static __always_inline void -unlink_va(struct vmap_area *va, struct rb_root *root) +insert_vmap_area_lazy_locked(struct vmap_area *va, struct vmap_node *vn) { - __unlink_va(va, root, false); -} + int err; =20 -static __always_inline void 
-unlink_va_augment(struct vmap_area *va, struct rb_root *root) -{ - __unlink_va(va, root, true); + lockdep_assert_held(&vn->lazy.lock); + + try_init_lazy_mt_locked(vn); + + if (likely(vn->lazy.mt_enabled)) { + MA_STATE(mas, &vn->lazy.mt, va->va_start, va->va_end - 1); + + if (!insert_vmap_area_list_sorted_mt(va, &vn->lazy.mt, + &vn->lazy.head)) + return; + + err =3D mas_store_gfp(&mas, va, GFP_ATOMIC | __GFP_NOWARN); + if (WARN_ON_ONCE(err)) + disable_lazy_mt_locked(vn); + + return; + } + + if (!insert_vmap_area_list_sorted(va, &vn->lazy.head)) + return; } =20 -#if DEBUG_AUGMENT_PROPAGATE_CHECK -/* - * Gets called when remove the node and rotate. - */ -static __always_inline unsigned long -compute_subtree_max_size(struct vmap_area *va) +static __always_inline bool +lazy_vmap_areas_empty_locked(struct vmap_node *vn) { - return max3(va_size(va), - get_subtree_max_size(va->rb_node.rb_left), - get_subtree_max_size(va->rb_node.rb_right)); + lockdep_assert_held(&vn->lazy.lock); + + try_init_lazy_mt_locked(vn); + + if (likely(vn->lazy.mt_enabled)) + return mtree_empty(&vn->lazy.mt); + + return list_empty(&vn->lazy.head); } =20 -static void -augment_tree_propagate_check(void) +static __always_inline void +move_lazy_vmap_areas_to_purge_locked(struct vmap_node *vn) { struct vmap_area *va; - unsigned long computed_size; + int err; =20 - list_for_each_entry(va, &free_vmap_area_list, list) { - computed_size =3D compute_subtree_max_size(va); - if (computed_size !=3D va->subtree_max_size) - pr_emerg("tree is corrupted: %lu, %lu\n", - va_size(va), va->subtree_max_size); + lockdep_assert_held(&vn->lazy.lock); + + try_init_lazy_mt_locked(vn); + + if (likely(vn->lazy.mt_enabled)) { + list_for_each_entry(va, &vn->lazy.head, list) { + MA_STATE(mas, &vn->lazy.mt, va->va_start, va->va_end - 1); + + err =3D mas_store_gfp(&mas, NULL, GFP_ATOMIC | __GFP_NOWARN); + if (WARN_ON_ONCE(err)) { + disable_lazy_mt_locked(vn); + break; + } + } + + if (vn->lazy.mt_enabled && 
WARN_ON_ONCE(!mtree_empty(&vn->lazy.mt))) + disable_lazy_mt_locked(vn); } + + list_replace_init(&vn->lazy.head, &vn->purge_list); } -#endif =20 -/* - * This function populates subtree_max_size from bottom to upper - * levels starting from VA point. The propagation must be done - * when VA size is modified by changing its va_start/va_end. Or - * in case of newly inserting of VA to the tree. - * - * It means that __augment_tree_propagate_from() must be called: - * - After VA has been inserted to the tree(free path); - * - After VA has been shrunk(allocation path); - * - After VA has been increased(merging path). - * - * Please note that, it does not mean that upper parent nodes - * and their subtree_max_size are recalculated all the time up - * to the root node. - * - * 4--8 - * /\ - * / \ - * / \ - * 2--2 8--8 - * - * For example if we modify the node 4, shrinking it to 2, then - * no any modification is required. If we shrink the node 2 to 1 - * its subtree_max_size is updated only, and set to 1. If we shrink - * the node 8 to 6, then its subtree_max_size is set to 6 and parent - * node becomes 4--6. - */ -static __always_inline void -augment_tree_propagate_from(struct vmap_area *va) +static __always_inline bool +insert_vmap_area_free_locked(struct vmap_area *va) { - /* - * Populate the tree from bottom towards the root until - * the calculated maximum available size of checked node - * is equal to its current one. 
- */ - free_vmap_area_rb_augment_cb_propagate(&va->rb_node, NULL); + lockdep_assert_held(&free_vmap_area_lock); =20 -#if DEBUG_AUGMENT_PROPAGATE_CHECK - augment_tree_propagate_check(); -#endif -} + try_init_free_mt_locked(); =20 -static void -insert_vmap_area(struct vmap_area *va, - struct rb_root *root, struct list_head *head) -{ - struct rb_node **link; - struct rb_node *parent; + if (likely(free_mt_supported())) { + if (!insert_vmap_area_list_sorted_mt(va, &free_vmap_area_mt, + &free_vmap_area_list)) + return false; =20 - link =3D find_va_links(va, root, NULL, &parent); - if (link) - link_va(va, root, parent, link, head); + free_mt_store_va_locked(va); + } else { + if (!insert_vmap_area_list_sorted(va, &free_vmap_area_list)) + return false; + } + + return true; } =20 -static void -insert_vmap_area_augment(struct vmap_area *va, - struct rb_node *from, struct rb_root *root, - struct list_head *head) +static __always_inline void +unlink_vmap_area_free_locked(struct vmap_area *va) { - struct rb_node **link; - struct rb_node *parent; + lockdep_assert_held(&free_vmap_area_lock); =20 - if (from) - link =3D find_va_links(va, NULL, from, &parent); - else - link =3D find_va_links(va, root, NULL, &parent); + if (WARN_ON_ONCE(list_empty(&va->list))) + return; =20 - if (link) { - link_va_augment(va, root, parent, link, head); - augment_tree_propagate_from(va); - } + if (likely(free_mt_supported())) + free_mt_erase_va_locked(va); + + list_del_init(&va->list); } =20 /* @@ -1804,29 +1698,20 @@ insert_vmap_area_augment(struct vmap_area *va, * ongoing. 
*/ static __always_inline struct vmap_area * -__merge_or_add_vmap_area(struct vmap_area *va, - struct rb_root *root, struct list_head *head, bool augment) +__merge_or_add_vmap_area(struct vmap_area *va, struct list_head *head, boo= l update_mt) { struct vmap_area *sibling; struct list_head *next; - struct rb_node **link; - struct rb_node *parent; + unsigned long old_start, old_end; bool merged =3D false; =20 - /* - * Find a place in the tree where VA potentially will be - * inserted, unless it is merged with its sibling/siblings. - */ - link =3D find_va_links(va, root, NULL, &parent); - if (!link) - return NULL; + if (update_mt && free_mt_supported()) + next =3D find_vmap_area_insert_point_mt(va, &free_vmap_area_mt, head); + else + next =3D find_vmap_area_insert_point_list(va, head); =20 - /* - * Get next node of VA to check if merging can be done. - */ - next =3D get_va_next_sibling(parent, link); - if (unlikely(next =3D=3D NULL)) - goto insert; + if (!next) + return NULL; =20 /* * start end @@ -1838,7 +1723,11 @@ __merge_or_add_vmap_area(struct vmap_area *va, if (next !=3D head) { sibling =3D list_entry(next, struct vmap_area, list); if (sibling->va_start =3D=3D va->va_end) { + old_start =3D sibling->va_start; + old_end =3D sibling->va_end; sibling->va_start =3D va->va_start; + if (update_mt && free_mt_supported()) + free_mt_update_va_locked(sibling, old_start, old_end); =20 /* Free vmap_area object. */ kmem_cache_free(vmap_area_cachep, va); @@ -1862,14 +1751,20 @@ __merge_or_add_vmap_area(struct vmap_area *va, /* * If both neighbors are coalesced, it is important * to unlink the "next" node first, followed by merging - * with "previous" one. Otherwise the tree might not be - * fully populated if a sibling's augmented value is - * "normalized" because of rotation operations. + * with "previous" one. 
*/ - if (merged) - __unlink_va(va, root, augment); + if (merged) { + if (update_mt) + unlink_vmap_area_free_locked(va); + else + list_del_init(&va->list); + } =20 + old_start =3D sibling->va_start; + old_end =3D sibling->va_end; sibling->va_end =3D va->va_end; + if (update_mt && free_mt_supported()) + free_mt_update_va_locked(sibling, old_start, old_end); =20 /* Free vmap_area object. */ kmem_cache_free(vmap_area_cachep, va); @@ -1880,31 +1775,97 @@ __merge_or_add_vmap_area(struct vmap_area *va, } } =20 -insert: - if (!merged) - __link_va(va, root, parent, link, head, augment); + if (!merged) { + if (update_mt) + insert_vmap_area_free_locked(va); + else + list_add_tail(&va->list, next); + } =20 return va; } =20 static __always_inline struct vmap_area * merge_or_add_vmap_area(struct vmap_area *va, - struct rb_root *root, struct list_head *head) + struct list_head *head) { - return __merge_or_add_vmap_area(va, root, head, false); + return __merge_or_add_vmap_area(va, head, false); } =20 static __always_inline struct vmap_area * -merge_or_add_vmap_area_augment(struct vmap_area *va, - struct rb_root *root, struct list_head *head) +merge_or_add_vmap_area_free_locked(struct vmap_area *va) { - va =3D __merge_or_add_vmap_area(va, root, head, true); - if (va) - augment_tree_propagate_from(va); + lockdep_assert_held(&free_vmap_area_lock); + + va =3D __merge_or_add_vmap_area(va, &free_vmap_area_list, true); + if (va && va->va_start < free_vmap_alloc_hint) + free_vmap_alloc_hint =3D va->va_start; =20 return va; } =20 +/* + * Transitional wrappers retained until all legacy rb call sites are switc= hed. + * Follow-up patches in this series remove these wrappers. 
+ */ +static __always_inline void +insert_vmap_area(struct vmap_area *va, struct rb_root *root, + struct list_head *head) +{ + struct vmap_node *vn =3D addr_to_node(va->va_start); + + if (head =3D=3D &free_vmap_area_list) { + insert_vmap_area_free_locked(va); + return; + } + + if (head =3D=3D &vn->lazy.head) { + insert_vmap_area_lazy_locked(va, vn); + return; + } + + insert_vmap_area_busy_locked(va, vn); +} + +static __always_inline void +insert_vmap_area_augment(struct vmap_area *va, struct rb_node *from, + struct rb_root *root, struct list_head *head) +{ + insert_vmap_area(va, root, head); +} + +static __always_inline void unlink_va(struct vmap_area *va, struct rb_root= *root) +{ + struct vmap_node *vn =3D addr_to_node(va->va_start); + + if (root =3D=3D &free_vmap_area_root) { + unlink_vmap_area_free_locked(va); + return; + } + + unlink_vmap_area_busy_locked(va, vn); +} + +static __always_inline void +unlink_va_augment(struct vmap_area *va, struct rb_root *root) +{ + unlink_va(va, root); +} + +static __always_inline void augment_tree_propagate_from(struct vmap_area *= va) +{ +} + +static __always_inline struct vmap_area * +merge_or_add_vmap_area_augment(struct vmap_area *va, struct rb_root *root, + struct list_head *head) +{ + if (head =3D=3D &free_vmap_area_list) + return merge_or_add_vmap_area_free_locked(va); + + return merge_or_add_vmap_area(va, head); +} + static __always_inline bool is_within_this_va(struct vmap_area *va, unsigned long size, unsigned long align, unsigned long vstart) @@ -1924,74 +1885,103 @@ is_within_this_va(struct vmap_area *va, unsigned l= ong size, return (nva_start_addr + size <=3D va->va_end); } =20 -/* - * Find the first free block(lowest start address) in the tree, - * that will accomplish the request corresponding to passing - * parameters. Please note, with an alignment bigger than PAGE_SIZE, - * a search length is adjusted to account for worst case alignment - * overhead. 
- */ static __always_inline struct vmap_area * -find_vmap_lowest_match(struct rb_root *root, unsigned long size, - unsigned long align, unsigned long vstart, bool adjust_search_size) +find_vmap_lowest_match_list(struct list_head *head, unsigned long size, + unsigned long align, unsigned long vstart) { struct vmap_area *va; - struct rb_node *node; - unsigned long length; =20 - /* Start from the root. */ - node =3D root->rb_node; + list_for_each_entry(va, head, list) { + if (!is_within_this_va(va, size, align, vstart)) + continue; =20 - /* Adjust the search size for alignment overhead. */ - length =3D adjust_search_size ? size + align - 1 : size; + return va; + } =20 - while (node) { - va =3D rb_entry(node, struct vmap_area, rb_node); + return NULL; +} =20 - if (get_subtree_max_size(node->rb_left) >=3D length && - vstart < va->va_start) { - node =3D node->rb_left; - } else { - if (is_within_this_va(va, size, align, vstart)) - return va; +static __always_inline unsigned long +clamp_vmap_alloc_hint(unsigned long hint, unsigned long vstart, + unsigned long vend) +{ + if (hint < vstart || hint >=3D vend) + return vstart; =20 - /* - * Does not make sense to go deeper towards the right - * sub-tree if it does not have a free block that is - * equal or bigger to the requested search length. - */ - if (get_subtree_max_size(node->rb_right) >=3D length) { - node =3D node->rb_right; - continue; - } + return hint; +} =20 - /* - * OK. We roll back and find the first right sub-tree, - * that will satisfy the search criteria. It can happen - * due to "vstart" restriction or an alignment overhead - * that is bigger then PAGE_SIZE. - */ - while ((node =3D rb_parent(node))) { - va =3D rb_entry(node, struct vmap_area, rb_node); - if (is_within_this_va(va, size, align, vstart)) +/* + * Next-fit scan with wrap-around. 
Use maple to jump to the first candidate + * around the hint in O(log n), then continue on the ordered list for cheap + * neighbour traversal and deterministic coalescing behaviour. + */ +static __always_inline struct vmap_area * +find_vmap_match_list_next_fit(struct list_head *head, unsigned long size, + unsigned long align, unsigned long vstart, + unsigned long vend) +{ + struct vmap_area *va, *start =3D NULL; + unsigned long hint; + bool wrapped; + + hint =3D clamp_vmap_alloc_hint(free_vmap_alloc_hint, vstart, vend); + + if (hint !=3D vstart) { + if (free_mt_supported()) + start =3D __find_vmap_area_exceed_addr_mt(hint, + &free_vmap_area_mt); + + if (start) { + va =3D start; + list_for_each_entry_from(va, head, list) { + if (is_within_this_va(va, size, align, hint)) return va; + } + } else { + list_for_each_entry(va, head, list) { + if (va->va_end <=3D hint) + continue; =20 - if (get_subtree_max_size(node->rb_right) >=3D length && - vstart <=3D va->va_start) { - /* - * Shift the vstart forward. Please note, we update it with - * parent's start address adding "1" because we do not want - * to enter same sub-tree after it has already been checked - * and no suitable free block found there. 
- */ - vstart =3D va->va_start + 1; - node =3D node->rb_right; - break; - } + if (is_within_this_va(va, size, align, hint)) + return va; } } } =20 + wrapped =3D (hint !=3D vstart); + list_for_each_entry(va, head, list) { + if (wrapped) { + if (start && va =3D=3D start) + break; + if (!start && va->va_start >=3D hint) + break; + } + + if (is_within_this_va(va, size, align, vstart)) + return va; + } + + return NULL; +} + +static __always_inline struct vmap_area * +find_vmap_lowest_match_mt(struct maple_tree *tree, unsigned long size, + unsigned long align, unsigned long vstart) +{ + MA_STATE(mas, tree, vstart, vstart); + struct vmap_area *va; + + mas_set(&mas, vstart); + va =3D mas_find(&mas, ULONG_MAX); + + while (va) { + if (is_within_this_va(va, size, align, vstart)) + return va; + + va =3D mas_next(&mas, ULONG_MAX); + } + return NULL; } =20 @@ -2015,8 +2005,8 @@ find_vmap_lowest_linear_match(struct list_head *head,= unsigned long size, } =20 static void -find_vmap_lowest_match_check(struct rb_root *root, struct list_head *head, - unsigned long size, unsigned long align) +find_vmap_lowest_match_check(struct list_head *head, unsigned long size, + unsigned long align) { struct vmap_area *va_1, *va_2; unsigned long vstart; @@ -2025,7 +2015,10 @@ find_vmap_lowest_match_check(struct rb_root *root, s= truct list_head *head, get_random_bytes(&rnd, sizeof(rnd)); vstart =3D VMALLOC_START + rnd; =20 - va_1 =3D find_vmap_lowest_match(root, size, align, vstart, false); + if (free_mt_supported()) + va_1 =3D find_vmap_lowest_match_mt(&free_vmap_area_mt, size, align, vsta= rt); + else + va_1 =3D find_vmap_lowest_linear_match(head, size, align, vstart); va_2 =3D find_vmap_lowest_linear_match(head, size, align, vstart); =20 if (va_1 !=3D va_2) @@ -2069,11 +2062,12 @@ classify_va_fit_type(struct vmap_area *va, } =20 static __always_inline int -va_clip(struct rb_root *root, struct list_head *head, - struct vmap_area *va, unsigned long nva_start_addr, - unsigned long size) 
+va_clip(struct vmap_area *va, unsigned long nva_start_addr, + unsigned long size) { struct vmap_area *lva =3D NULL; + unsigned long old_start =3D va->va_start; + unsigned long old_end =3D va->va_end; enum fit_type type =3D classify_va_fit_type(va, nva_start_addr, size); =20 if (type =3D=3D FL_FIT_TYPE) { @@ -2084,7 +2078,7 @@ va_clip(struct rb_root *root, struct list_head *head, * V NVA V * |---------------| */ - unlink_va_augment(va, root); + unlink_vmap_area_free_locked(va); kmem_cache_free(vmap_area_cachep, va); } else if (type =3D=3D LE_FIT_TYPE) { /* @@ -2159,10 +2153,11 @@ va_clip(struct rb_root *root, struct list_head *hea= d, } =20 if (type !=3D FL_FIT_TYPE) { - augment_tree_propagate_from(va); + if (free_mt_supported()) + free_mt_update_va_locked(va, old_start, old_end); =20 if (lva) /* type =3D=3D NE_FIT_TYPE */ - insert_vmap_area_augment(lva, &va->rb_node, root, head); + insert_vmap_area_free_locked(lva); } =20 return 0; @@ -2170,7 +2165,6 @@ va_clip(struct rb_root *root, struct list_head *head, =20 static unsigned long va_alloc(struct vmap_area *va, - struct rb_root *root, struct list_head *head, unsigned long size, unsigned long align, unsigned long vstart, unsigned long vend) { @@ -2187,7 +2181,7 @@ va_alloc(struct vmap_area *va, return -ERANGE; =20 /* Update the free vmap_area. */ - ret =3D va_clip(root, head, va, nva_start_addr, size); + ret =3D va_clip(va, nva_start_addr, size); if (WARN_ON_ONCE(ret)) return ret; =20 @@ -2199,35 +2193,37 @@ va_alloc(struct vmap_area *va, * Otherwise an error value is returned that indicates failure. 
*/ static __always_inline unsigned long -__alloc_vmap_area(struct rb_root *root, struct list_head *head, - unsigned long size, unsigned long align, - unsigned long vstart, unsigned long vend) +__alloc_vmap_area(unsigned long size, unsigned long align, + unsigned long vstart, unsigned long vend) { - bool adjust_search_size =3D true; unsigned long nva_start_addr; struct vmap_area *va; =20 + lockdep_assert_held(&free_vmap_area_lock); + /* - * Do not adjust when: - * a) align <=3D PAGE_SIZE, because it does not make any sense. - * All blocks(their start addresses) are at least PAGE_SIZE - * aligned anyway; - * b) a short range where a requested size corresponds to exactly - * specified [vstart:vend] interval and an alignment > PAGE_SIZE. - * With adjusted search length an allocation would not succeed. + * Next-fit with wrap-around lowers repeated list-head scans in + * high-churn workloads. */ - if (align <=3D PAGE_SIZE || (align > PAGE_SIZE && (vend - vstart) =3D=3D = size)) - adjust_search_size =3D false; + va =3D find_vmap_match_list_next_fit(&free_vmap_area_list, size, align, + vstart, vend); =20 - va =3D find_vmap_lowest_match(root, size, align, vstart, adjust_search_si= ze); if (unlikely(!va)) return -ENOENT; =20 - nva_start_addr =3D va_alloc(va, root, head, size, align, vstart, vend); + nva_start_addr =3D va_alloc(va, size, align, vstart, vend); + if (!IS_ERR_VALUE(nva_start_addr)) { + unsigned long next_hint; + + if (check_add_overflow(nva_start_addr, size, &next_hint)) + free_vmap_alloc_hint =3D vstart; + else + free_vmap_alloc_hint =3D next_hint; + } =20 #if DEBUG_AUGMENT_LOWEST_MATCH_CHECK if (!IS_ERR_VALUE(nva_start_addr)) - find_vmap_lowest_match_check(root, head, size, align); + find_vmap_lowest_match_check(&free_vmap_area_list, size, align); #endif =20 return nva_start_addr; @@ -2441,8 +2437,7 @@ static struct vmap_area *alloc_vmap_area(unsigned lon= g size, retry: if (IS_ERR_VALUE(addr)) { preload_this_cpu_lock(&free_vmap_area_lock, gfp_mask, node); - 
addr =3D __alloc_vmap_area(&free_vmap_area_root, &free_vmap_area_list, - size, align, vstart, vend); + addr =3D __alloc_vmap_area(size, align, vstart, vend); spin_unlock(&free_vmap_area_lock); =20 /* @@ -2589,7 +2584,6 @@ static void decay_va_pool_node(struct vmap_node *vn, bool full_decay) { LIST_HEAD(decay_list); - struct rb_root decay_root =3D RB_ROOT; struct vmap_area *va, *nva; unsigned long n_decay, pool_len; int i; @@ -2618,7 +2612,7 @@ decay_va_pool_node(struct vmap_node *vn, bool full_de= cay) break; =20 list_del_init(&va->list); - merge_or_add_vmap_area(va, &decay_root, &decay_list); + merge_or_add_vmap_area(va, &decay_list); } =20 /* @@ -5456,8 +5450,7 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned l= ong *offsets, /* It is a BUG(), but trigger recovery instead. */ goto recovery; =20 - ret =3D va_clip(&free_vmap_area_root, - &free_vmap_area_list, va, start, size); + ret =3D va_clip(va, start, size); if (WARN_ON_ONCE(unlikely(ret))) /* It is a BUG(), but trigger recovery instead. 
*/ goto recovery; --=20 2.34.1 From nobody Sat Jun 13 23:22:38 2026 Received: from mx0a-0031df01.pphosted.com (mx0a-0031df01.pphosted.com [205.220.168.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0BA3C31F98D for ; Sat, 13 Jun 2026 17:20:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.168.131 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781371253; cv=none; b=Yz7VYmcOwSKs4emV3UmDmF7hMnXnKjnKf6ZCDNBHigw46K+W4VRnrX17opETZCkCyg5iEZPwIazEDqYhf1TkScrhU5jgUv7t3hCyfC/mpP+wz3IQnOszwdk5Z5Du1b+iV2c60B5QtA+1pbBnFyb/zjGGvNd6hdlb8PoBh+MgzX4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781371253; c=relaxed/simple; bh=d7Rh9moU4PlywobCCQlby0ESibig6WBPrZyoRaflKyI=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=Kh5QNq21iq0NI2hZeCk0rvjWSbt00IRCvF/ndxQQiTgCdIBUMeyz4UCCaofV+WAuADvCDa8M1HvWmn4LTZtCFkIaZYLi22tRxB9iEFpufqBLg9mUr64gG7B+3yjvHsH90L+kTHsi4Pmu6MT6z2CZMLEQV2syHhSvgpPzCXwOOwk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oss.qualcomm.com; spf=pass smtp.mailfrom=oss.qualcomm.com; dkim=pass (2048-bit key) header.d=qualcomm.com header.i=@qualcomm.com header.b=NuwBzWrd; dkim=pass (2048-bit key) header.d=oss.qualcomm.com header.i=@oss.qualcomm.com header.b=Z0fqC8qZ; arc=none smtp.client-ip=205.220.168.131 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oss.qualcomm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oss.qualcomm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=qualcomm.com header.i=@qualcomm.com header.b="NuwBzWrd"; dkim=pass (2048-bit key) header.d=oss.qualcomm.com header.i=@oss.qualcomm.com header.b="Z0fqC8qZ" Received: from 
pps.filterd (m0279865.ppops.net [127.0.0.1]) by mx0a-0031df01.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 65DFJlWE1257480 for ; Sat, 13 Jun 2026 17:20:50 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=qualcomm.com; h= cc:content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=qcppdkim1; bh= uCURNgSWRUFOFyKhAq/7V/tz5RhnVVGkvQX7hSUp9RM=; b=NuwBzWrduzG3GLBq c8tAxU2Tazzqv1lB3qZ0HY3ukFxGsv/eZzNdJ18GIUa3G+vt07gawMkvTjrvhRqQ 81HsEUN7s/czX+/NmBrXJA1qEmOFpZek4W+JEXbnJa/2pPaxm3AM0Wi8zD2Bl8bl CmOAiW2vKS4On9F8Gz8TrR7M0If9GJNpl9X4FuW0tNQoHDc6MBcdosyKh6YIXP4l QuDUSwsk6oI8um57xKx1uOnkkB5o4mz82/NlhR99bqPhrm8N/3DlA2P/DRDQqws/ AvsXoC/t62X6vROQe32UD0MM8irOYiSdX2IZTLtF+KwRl84Wal9wr+sarGMZgtCk Vee3Jw== Received: from mail-pf1-f197.google.com (mail-pf1-f197.google.com [209.85.210.197]) by mx0a-0031df01.pphosted.com (PPS) with ESMTPS id 4eryffhksm-1 (version=TLSv1.3 cipher=TLS_AES_128_GCM_SHA256 bits=128 verify=NOT) for ; Sat, 13 Jun 2026 17:20:50 +0000 (GMT) Received: by mail-pf1-f197.google.com with SMTP id d2e1a72fcca58-8423f3e4728so1361952b3a.2 for ; Sat, 13 Jun 2026 10:20:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oss.qualcomm.com; s=google; t=1781371250; x=1781976050; darn=vger.kernel.org; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=uCURNgSWRUFOFyKhAq/7V/tz5RhnVVGkvQX7hSUp9RM=; b=Z0fqC8qZPw7LzK121KVJtagfXBQdiVR5zATqT4LDcboM0jCcFUrokPbD4+AsUtcozB ZgKZTvFkHs06RkbNiMT/fsqhU3PKwnL7njr8MmMA0tZTvTMlXmRgMJ++CsUvm9WUMU2A tNJ+JbS1/z+1AU3SsvQ9KnCYN2zt8kQq1JTCVJ5JIrfz8vIR6FpbgY2cTAusCdLIwGm6 3N6fV5AyDVPIDZsoYq8weIAvS6S5oruazuG1hGzGslNaCFOeSlcVOICc1rKAViQP8ng/ oyLs5tR3kCkTbFNLAC0+WYdh4m9ESiXh63N8eV8uQrHik/ncCeYsc2zk4kiL4LADwEWU 3bWA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1781371250; x=1781976050; 
h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-gg:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=uCURNgSWRUFOFyKhAq/7V/tz5RhnVVGkvQX7hSUp9RM=; b=a/nM6kivQz3Nk9iL9pXcd0ErXrHPFHppJhaEBj9XlGblMuC79Zv4J/IYK99avMrRTC wGH6LFSxr9601rXam+ubLFnhXKW7HHCGs5r073io57czm90FOkCvFM7VhpCge0YG0pQE M31xNvYmLvAgQ1BVgTi3v8NwxjzFqlbEIOqxSUUsrlT1GpkY56A7evM2x/XfJV0LYTG9 11cDEXWqF1caSYYzmBzP0PqGXnK24rTCSilmEYpytdQHgxl3KEwVlMOTN+wc1/DgPFIm v8IgDHKQujIM15Bv++8KYUllYdxwtXcE10T6fGTMk0LAzu/i3TfSEOplcKtiDgoxb0td JLmA== X-Forwarded-Encrypted: i=1; AFNElJ/0Dyt5B16vzYQ+tyUlmxgCuiY0U438Cy7kAhiqlVKIBq71fKzuotE3rQX1zxjOOZJPS6N5cp4bFxanO3Q=@vger.kernel.org X-Gm-Message-State: AOJu0Yyla0/jOJ1CSacmXybu1oLDd+0BbX1HN3KoI97M5O4qptmtji6r ePnV5Gr/4CJy6FpZNaw92mqjVK2Nc7uMu3u/69R8cfl7A3FUYDnCVWmV0NRMQ1qJGokZoHx5Kyt I11Nozbg8vpxDcs8/K+XU3bwCPrgjvfvgYLm3AdAR7sJSglYSQ2JOjFW3Ko3lBLgSsMw= X-Gm-Gg: Acq92OEchzVqEYzmFP6fp2zA3Z5nxN+WBUZXkESnkCYE1jJgnfS7K0JkFpqbXLV8Jqx +y99cxCatliKGCYvouAIi0c7BirKy5YHxZcEOf/1jjK2ZRf/cdr+0/fR5lUkJVfPweGLYtu0AAs YWckzHEKzWnY2CnL4dXWFpdAhK+VmQYkMVDdxmOdi+kBwCdDZWr3hpOC83EIpyye1fyIg08wsKi 7zx5zdb+FL0VO9vIr214s61caBYFFitdIMDeH1aL2auGAN9fiWxc1IjW4aK+z55jIUqrZdcMXOF ABPGD4oaUE/Y+2dzPE3F84SW6SR9oM5FEQaQ1uIgybpi6KcK7DkYtUwAID3CUbK6c4Xg/9VRtWI vVgujqqseiTZrk+DOFNC0E3oEWGrB8mDUZXHyb6qm2lteLKFZ3/FJtQ== X-Received: by 2002:a05:6a00:ad04:b0:842:6594:de with SMTP id d2e1a72fcca58-8434cd4558cmr8150274b3a.15.1781371249355; Sat, 13 Jun 2026 10:20:49 -0700 (PDT) X-Received: by 2002:a05:6a00:ad04:b0:842:6594:de with SMTP id d2e1a72fcca58-8434cd4558cmr8150235b3a.15.1781371248892; Sat, 13 Jun 2026 10:20:48 -0700 (PDT) Received: from hu-pranarya-hyd.qualcomm.com ([202.46.22.19]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-8434accbec5sm5390913b3a.16.2026.06.13.10.20.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 13 Jun 2026 10:20:48 -0700 (PDT) From: Pranjal Arya Date: Sat, 13 Jun 2026 22:49:45 +0530 
Subject: [PATCH RFC 03/12] mm/vmalloc: convert free, purge, and pcpu paths to maple_tree Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260613-vmalloc_maple-v1-3-0aa740bb944b@oss.qualcomm.com> References: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com> In-Reply-To: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com> To: Andrew Morton , Uladzislau Rezki , "Liam R. Howlett" , Alice Ryhl , Andrew Ballance Cc: linux-arm-msm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, maple-tree@lists.infradead.org, Lorenzo Stoakes , Pranjal Shrivastava , Will Deacon , Suzuki K Poulose , Neil Armstrong , Mostafa Saleh , Balbir Singh , Suren Baghdasaryan , Marco Elver , Dmitry Vyukov , Alexander Potapenko , Shuah Khan , Dev Jain , Brendan Jackman , Puranjay Mohan , Santosh Shukla , Wyes Karny , Pranjal Arya , Sudeep Holla X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1781371215; l=8254; i=pranjal.arya@oss.qualcomm.com; s=20260516; h=from:subject:message-id; bh=d7Rh9moU4PlywobCCQlby0ESibig6WBPrZyoRaflKyI=; b=i4ZVjEUEpATUFw1AesbsDlvUbQAz4biJWSgx2WlatBpp99/Fh4zL92ovokQ6YvX3lYIEirA4W qeSCLC8FcQJCFfFoCZXnrUVBMNf37s3X1mpNxtFxc/JVLzamPLd3Rr/ X-Developer-Key: i=pranjal.arya@oss.qualcomm.com; a=ed25519; pk=ymtcTlccEIDsi3ErhpjIoZZHKdPBYWGWW0Lchs5MsbE= X-Authority-Analysis: v=2.4 cv=HuxG3UTS c=1 sm=1 tr=0 ts=6a2d9172 cx=c_pps a=rEQLjTOiSrHUhVqRoksmgQ==:117 a=fChuTYTh2wq5r3m49p7fHw==:17 a=IkcTkHD0fZMA:10 a=FelO9ux0wxsA:10 a=s4-Qcg_JpJYA:10 a=VkNPw1HP01LnGYTKEx00:22 a=u7WPNUs3qKkmUXheDGA7:22 a=Um2Pa8k9VHT-vaBCBUpS:22 a=EUspDBNiAAAA:8 a=r6JOZ6DJY9FYdPoncJEA:9 a=QEXdDO2ut3YA:10 a=2VI0MkxyNR6bbpdq8BZq:22 X-Proofpoint-ORIG-GUID: 5AkQwnS9eocJ9pxRO1CiPOqwGZliZS5t X-Proofpoint-Spam-Info: AW1haW4tMjYwNjEzMDE4MCBTYWx0ZWRfXyEdku/RbeDmy 
M0tdZ+GlAfcGOIVtNHvFw2zNCZb/9ZozWfvoXcaYUR6fLVNIWVHpMaCPfPkU0vEvfcs2r1Ry2ip nIREO+gnHWrQLTO92LsFayFimXDoIro= X-Proofpoint-GUID: 5AkQwnS9eocJ9pxRO1CiPOqwGZliZS5t X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNjEzMDE4MCBTYWx0ZWRfX8ZcK8Hu4lX6p OXwJ0wHVXQXW9GvOwD9lTOKzaCFDkKMy8YzJBYr9CeIAOzIBQjEgdS6COT8WYJe5f3rVw9qrWpE EYWDU0krICF3rlK3VJAXu8iW/RFJCd/WllMu+sdCvcVec91gpnihoeHC/RxBX2WfEOjG8JQZPqv pCkl+gB6qbgrx6hXkX1Hil+6E7m0EvIy4YZEEN+E8KHZpiOprjDyl04neDCn5iXTqJoTseIOyVw KWswFl7bYzXA6WECHooBGSkZDnltCmqpce20Y7eanOUTGOvp8yK5QexLkm2DODwjMlzA00Bh9yR GA4MfUPut8mhkM65RI65BqF2dS7Z8MNOEcUM6ifegBL5ZgVov3icBTvJitkOOnyrPXvqTnku1Bg WEAcLHCO1A/BN6Ak/ThN6qsbv4kx0DZRt8t+ysZtA+vIAL3Ew3+n9ORrZPVNLWa9ZIvCKbV9+Rl /0TmQt1tqXN4rzw3BUw== X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.125,FMLib:17.12.100.49 definitions=2026-06-13_03,2026-06-12_03,2025-10-01_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 lowpriorityscore=0 bulkscore=0 adultscore=0 impostorscore=0 priorityscore=1501 phishscore=0 clxscore=1011 spamscore=0 suspectscore=0 malwarescore=0 classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0 reason=mlx scancount=1 engine=8.22.0-2606040000 definitions=main-2606130180 Move free_vmap_area_noflush, the lazy-purge processing (__purge_vmap_area_lazy, purge_vmap_node, decay_va_pool_node, kasan_release_vmalloc_node), and pcpu_get_vm_areas onto the maple_tree helpers. Per-node busy lookup (find_vmap_area, find_unlink_vmap_area, find_vmap_area_exceed_addr_lock) and the vmap_block free path likewise shift to vn->busy.mt. pcpu_get_vm_areas keeps its top-down free-area walk; the new free_vmap_area_prev() helper returns the previous free-area neighbour via the address-sorted list, while enclose-address queries (pvm_find_va_enclose_addr) move onto the maple_tree where supported. After this patch, the augmented rb_tree is no longer consulted on the allocation or free path. 
The address-sorted free_vmap_area_list is still walked on the alloc path's list-based next-fit scan and on neighbour traversal in pcpu_get_vm_areas. Signed-off-by: Pranjal Arya --- mm/vmalloc.c | 78 +++++++++++++++++++++++++++++---------------------------= ---- 1 file changed, 38 insertions(+), 40 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index c5f509f033e6..f2117eafa9cf 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -2240,14 +2240,14 @@ static void free_vmap_area(struct vmap_area *va) * Remove from the busy tree/list. */ spin_lock(&vn->busy.lock); - unlink_va(va, &vn->busy.root); + unlink_vmap_area_busy_locked(va, vn); spin_unlock(&vn->busy.lock); =20 /* * Insert/Merge it back to the free tree/list. */ spin_lock(&free_vmap_area_lock); - merge_or_add_vmap_area_augment(va, &free_vmap_area_root, &free_vmap_area_= list); + merge_or_add_vmap_area_free_locked(va); spin_unlock(&free_vmap_area_lock); } =20 @@ -2431,12 +2431,13 @@ static struct vmap_area *alloc_vmap_area(unsigned l= ong size, * Only scan the relevant parts containing pointers to other objects * to avoid false negatives. 
*/ - kmemleak_scan_area(&va->rb_node, SIZE_MAX, gfp_mask); + kmemleak_scan_area(&va->vm, SIZE_MAX, gfp_mask); } =20 retry: if (IS_ERR_VALUE(addr)) { preload_this_cpu_lock(&free_vmap_area_lock, gfp_mask, node); + try_init_free_mt_locked(); addr =3D __alloc_vmap_area(size, align, vstart, vend); spin_unlock(&free_vmap_area_lock); =20 @@ -2479,7 +2480,7 @@ static struct vmap_area *alloc_vmap_area(unsigned lon= g size, vn =3D addr_to_node(va->va_start); =20 spin_lock(&vn->busy.lock); - insert_vmap_area(va, &vn->busy.root, &vn->busy.head); + insert_vmap_area_busy_locked(va, vn); spin_unlock(&vn->busy.lock); =20 BUG_ON(!IS_ALIGNED(va->va_start, align)); @@ -2575,8 +2576,7 @@ reclaim_list_global(struct list_head *head) =20 spin_lock(&free_vmap_area_lock); list_for_each_entry_safe(va, n, head, list) - merge_or_add_vmap_area_augment(va, - &free_vmap_area_root, &free_vmap_area_list); + merge_or_add_vmap_area_free_locked(va); spin_unlock(&free_vmap_area_lock); } =20 @@ -2719,12 +2719,13 @@ static bool __purge_vmap_area_lazy(unsigned long st= art, unsigned long end, vn->skip_populate =3D full_pool_decay; decay_va_pool_node(vn, full_pool_decay); =20 - if (RB_EMPTY_ROOT(&vn->lazy.root)) + spin_lock(&vn->lazy.lock); + if (lazy_vmap_areas_empty_locked(vn)) { + spin_unlock(&vn->lazy.lock); continue; + } =20 - spin_lock(&vn->lazy.lock); - WRITE_ONCE(vn->lazy.root.rb_node, NULL); - list_replace_init(&vn->lazy.head, &vn->purge_list); + move_lazy_vmap_areas_to_purge_locked(vn); spin_unlock(&vn->lazy.lock); =20 start =3D min(start, list_first_entry(&vn->purge_list, @@ -2823,7 +2824,7 @@ static void free_vmap_area_noflush(struct vmap_area *= va) id_to_node(vn_id):addr_to_node(va->va_start); =20 spin_lock(&vn->lazy.lock); - insert_vmap_area(va, &vn->lazy.root, &vn->lazy.head); + insert_vmap_area_lazy_locked(va, vn); spin_unlock(&vn->lazy.lock); =20 trace_free_vmap_area_noflush(va_start, nr_lazy, nr_lazy_max); @@ -2873,7 +2874,7 @@ struct vmap_area *find_vmap_area(unsigned long addr) vn =3D 
&vmap_nodes[i]; =20 spin_lock(&vn->busy.lock); - va =3D __find_vmap_area(addr, &vn->busy.root); + va =3D find_vmap_area_busy_locked(addr, vn); spin_unlock(&vn->busy.lock); =20 if (va) @@ -2897,9 +2898,9 @@ static struct vmap_area *find_unlink_vmap_area(unsign= ed long addr) vn =3D &vmap_nodes[i]; =20 spin_lock(&vn->busy.lock); - va =3D __find_vmap_area(addr, &vn->busy.root); + va =3D find_vmap_area_busy_locked(addr, vn); if (va) - unlink_va(va, &vn->busy.root); + unlink_vmap_area_busy_locked(va, vn); spin_unlock(&vn->busy.lock); =20 if (va) @@ -2955,8 +2956,8 @@ struct vmap_block_queue { =20 /* * An xarray requires an extra memory dynamically to - * be allocated. If it is an issue, we can use rb-tree - * instead. + * be allocated. If it is an issue, switch to another + * indexing data structure. */ struct xarray vmap_blocks; }; @@ -3133,7 +3134,7 @@ static void free_vmap_block(struct vmap_block *vb) =20 vn =3D addr_to_node(vb->va->va_start); spin_lock(&vn->busy.lock); - unlink_va(vb->va, &vn->busy.root); + unlink_vmap_area_busy_locked(vb->va, vn); spin_unlock(&vn->busy.lock); =20 free_vmap_area_noflush(vb->va); @@ -5238,9 +5239,15 @@ void free_vm_area(struct vm_struct *area) EXPORT_SYMBOL_GPL(free_vm_area); =20 #ifdef CONFIG_SMP -static struct vmap_area *node_to_va(struct rb_node *n) +static __always_inline struct vmap_area * +free_vmap_area_prev(struct vmap_area *va) { - return rb_entry_safe(n, struct vmap_area, rb_node); + lockdep_assert_held(&free_vmap_area_lock); + + if (va->list.prev =3D=3D &free_vmap_area_list) + return NULL; + + return list_entry(va->list.prev, struct vmap_area, list); } =20 /** @@ -5255,26 +5262,19 @@ static struct vmap_area *node_to_va(struct rb_node = *n) static struct vmap_area * pvm_find_va_enclose_addr(unsigned long addr) { - struct vmap_area *va, *tmp; - struct rb_node *n; + struct vmap_area *va; =20 - n =3D free_vmap_area_root.rb_node; - va =3D NULL; + lockdep_assert_held(&free_vmap_area_lock); =20 - while (n) { - tmp =3D rb_entry(n, 
struct vmap_area, rb_node); - if (tmp->va_start <=3D addr) { - va =3D tmp; - if (tmp->va_end >=3D addr) - break; + if (free_mt_supported()) + return __find_vmap_area_enclose_addr_mt(addr, &free_vmap_area_mt); =20 - n =3D n->rb_right; - } else { - n =3D n->rb_left; - } + list_for_each_entry_reverse(va, &free_vmap_area_list, list) { + if (va->va_start <=3D addr) + return va; } =20 - return va; + return NULL; } =20 /** @@ -5419,7 +5419,7 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned l= ong *offsets, * If this VA does not fit, move base downwards and recheck. */ if (base + start < va->va_start) { - va =3D node_to_va(rb_prev(&va->rb_node)); + va =3D free_vmap_area_prev(va); base =3D pvm_determine_end_from_reverse(&va, align) - end; term_area =3D area; continue; @@ -5474,7 +5474,7 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned l= ong *offsets, struct vmap_node *vn =3D addr_to_node(vas[area]->va_start); =20 spin_lock(&vn->busy.lock); - insert_vmap_area(vas[area], &vn->busy.root, &vn->busy.head); + insert_vmap_area_busy_locked(vas[area], vn); setup_vmalloc_vm(vms[area], vas[area], VM_ALLOC, pcpu_get_vm_areas); spin_unlock(&vn->busy.lock); @@ -5501,8 +5501,7 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned l= ong *offsets, while (area--) { orig_start =3D vas[area]->va_start; orig_end =3D vas[area]->va_end; - va =3D merge_or_add_vmap_area_augment(vas[area], &free_vmap_area_root, - &free_vmap_area_list); + va =3D merge_or_add_vmap_area_free_locked(vas[area]); if (va) kasan_release_vmalloc(orig_start, orig_end, va->va_start, va->va_end, @@ -5552,8 +5551,7 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned l= ong *offsets, for (area =3D 0; area < nr_vms; area++) { orig_start =3D vas[area]->va_start; orig_end =3D vas[area]->va_end; - va =3D merge_or_add_vmap_area_augment(vas[area], &free_vmap_area_root, - &free_vmap_area_list); + va =3D merge_or_add_vmap_area_free_locked(vas[area]); if (va) kasan_release_vmalloc(orig_start, orig_end, va->va_start, 
va->va_end, --=20 2.34.1 From nobody Sat Jun 13 23:22:38 2026 Received: from mx0a-0031df01.pphosted.com (mx0a-0031df01.pphosted.com [205.220.168.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6A1503FDBF7 for ; Sat, 13 Jun 2026 17:21:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.168.131 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781371264; cv=none; b=ZwU3pCh7LmACxv8NDeyMKdOjr7tn7R4kH+/G5YWLx2Htev5qH5y/g2SULxuTCsJLunlex2Z04cv2obKvcp0ZDSjJgRUPMf3omdrtwhCDL37gq/I8SRwnHYfvURFcBCEY3UoXvAzFefBHZOxvCHBX8t3L7PeADmI2D3jK4STa1Wc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781371264; c=relaxed/simple; bh=fO7p35yLhLCL7LmgBhMmEl7jA4eYMFRf0y923znTj5k=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=CBn8odCTW1XCJcVc9EFHbdAnWndNe4+asgMZCySR9z80TGJsBjPpniDLJkALZ0xDLG6cb/z2YHoXPQ8WGgRsFX14bX/SNgu2vQv7k9t1RaB8z4sHoVB91xf0wWIOBgRz8Q22u5obvAi7Kaivkr7ffWykgUVbwCdUN/a5PqUZ/2M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oss.qualcomm.com; spf=pass smtp.mailfrom=oss.qualcomm.com; dkim=pass (2048-bit key) header.d=qualcomm.com header.i=@qualcomm.com header.b=Vtje02JL; dkim=pass (2048-bit key) header.d=oss.qualcomm.com header.i=@oss.qualcomm.com header.b=Q7SYsvxu; arc=none smtp.client-ip=205.220.168.131 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oss.qualcomm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oss.qualcomm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=qualcomm.com header.i=@qualcomm.com header.b="Vtje02JL"; dkim=pass (2048-bit key) header.d=oss.qualcomm.com header.i=@oss.qualcomm.com header.b="Q7SYsvxu" Received: from pps.filterd 
(m0279864.ppops.net [127.0.0.1]) by mx0a-0031df01.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 65DF8p262783835 for ; Sat, 13 Jun 2026 17:20:59 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=qualcomm.com; h= cc:content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=qcppdkim1; bh= fl9EcxsXrxxpFdLKsvvtWPk1tb6yf1Ba7ILpGjWgZTM=; b=Vtje02JLOCGs8ZDa 9Fp8D7O3MY4B1uv9kSYyndpXE0ZtVzkiWnz3K5/Gd4Uibh99WoiAlTkZOQe+YfGt 5aldTkaZTLOwFRo+lFJUa04S+Idivll1wMD0qXK2RfWAKv8hpV4gLjFxVzUBe8ml WpSWNLBybnJg8Sh6DyL2fuKZlwflS7o5JXQBmMY1P7ePjKlSHmwmdU6djVVETr88 V+CTtHgG2aOQj0m/RSH9Dy9Yw60usBES99DHkwS4bYO7dpkDXaVh5kVwxCE8NnW3 FsY0pFMI0CLY9GHk5eAhMJ1zQvG5gu29KOq00K5LRFH5GFrjUGhM9oEVDez8TJSA Gm4hig== Received: from mail-pf1-f200.google.com (mail-pf1-f200.google.com [209.85.210.200]) by mx0a-0031df01.pphosted.com (PPS) with ESMTPS id 4es0cghh03-1 (version=TLSv1.3 cipher=TLS_AES_128_GCM_SHA256 bits=128 verify=NOT) for ; Sat, 13 Jun 2026 17:20:59 +0000 (GMT) Received: by mail-pf1-f200.google.com with SMTP id d2e1a72fcca58-8423efbfb61so1397088b3a.0 for ; Sat, 13 Jun 2026 10:20:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oss.qualcomm.com; s=google; t=1781371259; x=1781976059; darn=vger.kernel.org; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=fl9EcxsXrxxpFdLKsvvtWPk1tb6yf1Ba7ILpGjWgZTM=; b=Q7SYsvxubhbkM3/1yr4vgsV68ZlyZfFbG4nl4FeV9WGZGR1rUAATdVVU4qOl0vk9/y UtE9Qr4XxjBs33b1cUjzFyrC9J+UrhxC1jB2iX38hg/MPtIa1cCB++2vm0CzCjcLv0lN PRG2wLj+mbCPdXdn1kL9CSuYapMXK6B50JRQ5N6q/RE6ZOU+fra8FVh/Io8D2uewD1pe jxSELYjcUfneK94n9SRot4jOkx8QHkT2WAhszkOGs6+//HcuHKbnygiBc3qmiSDRvZy/ m4kA92n+PlVA07kf+wCjlfyWmeAGxIUgkII99MjgMVQmPnHEpAYeiASkz9tRlmpChgh9 xykw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1781371259; x=1781976059; 
h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-gg:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=fl9EcxsXrxxpFdLKsvvtWPk1tb6yf1Ba7ILpGjWgZTM=; b=JAi6CpmGB9APtSMaEvYG3oVZtVfPMibHXnRyCE8AFNEyRKS0q1Ey9sDopGPLiT16cS /1mHsrUC2CSWGFW0fadEqyKvIIPuwp+o339ryJSdVVyutnVk8v8EnOt+4AcPv7b6HBH5 /G43l1u0leaQR+YogF7rmgftZgftZ6emwSFYSukiqS8P1/SEasbWK4bxCoA1fM3D442D /zuh/hXiavFnDkiTtJl53g3pHwwzHrZDSz3IiBQXtpPEXKKzl1ayIAgAsbHLCP3BnLcU xzSOWGaOcCrxWbwBwxx9eiO9myjXCaFMZteCP8LAm1yEl5HzyMujH6SxoNVBZz2+tIrT vmqg== X-Forwarded-Encrypted: i=1; AFNElJ9VAKkA1NLWp2IT0BVTpDHisc38VUnv+WLgGSnpdj0qmwFc+pPvW+8MTPoRr/rAzbbxLZx8UnvwKjKsEmA=@vger.kernel.org X-Gm-Message-State: AOJu0YxcfaUQdxp6mZ0xnEQCZYi4zcJ5p231GrSfuPez/5ygfrpiJkev u+72+0vr7UGClb40pb1nb1QgnPxeiUn1i5Vi0vrpI4gga0IArDshlowQcIE24PWe4B0Trgag8VA A0k4Ye2dFrCCccY+BrEGEfi3Cw7UmOYa6ZJ4gZlb0fqOYw0w2iBbUF2+FTZ2NY1cDdjI= X-Gm-Gg: Acq92OGQjyP6GNc5fOHKrV/vgIeakHKdCTUJl5KKqU17/70u97kAhAPvvtQmYzcISaT uXXjZwC1HQ0W+zB079EuFb8AODBBsxnrrWdMV8Pu+I3N0N8O129rCu3BlV3p7xCjDGOnuipnayf Zwz7BkpKbmxz3FUv/Q1Y6gkpqCdBlbOCKyZoyEG6McuvYyIFzJq4wnm2ZVEvs0Y1T12zUnL1oHx LHtwmKIprftqDwhaBm54UYSOYIWgm6baoXc54/sSTVHKRdj3v9NLuKCz+utumKCSxzcP0cNI6Md XFa7cfYS9XZJSuuaVI81TdQqSxqB7/CHUURiRUnKYRG1t5LTadN1ZLieTRJcnqLmIUJb9n1dIOr U3NOHjr387C4SNj+P0bCf5xaUR2bMPRbWtCOa4UAwo7LU3qFt6DTxhw== X-Received: by 2002:a05:6a00:2d29:b0:842:6097:467d with SMTP id d2e1a72fcca58-844e196bc9amr4347651b3a.15.1781371258108; Sat, 13 Jun 2026 10:20:58 -0700 (PDT) X-Received: by 2002:a05:6a00:2d29:b0:842:6097:467d with SMTP id d2e1a72fcca58-844e196bc9amr4347599b3a.15.1781371257350; Sat, 13 Jun 2026 10:20:57 -0700 (PDT) Received: from hu-pranarya-hyd.qualcomm.com ([202.46.22.19]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-8434accbec5sm5390913b3a.16.2026.06.13.10.20.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 13 Jun 2026 10:20:57 -0700 (PDT) From: Pranjal Arya Date: Sat, 13 Jun 2026 22:49:46 +0530 
Subject: [PATCH RFC 04/12] mm/vmalloc: finalize maple-only indexing and shrink struct vmap_area Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260613-vmalloc_maple-v1-4-0aa740bb944b@oss.qualcomm.com> References: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com> In-Reply-To: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com> To: Andrew Morton , Uladzislau Rezki , "Liam R. Howlett" , Alice Ryhl , Andrew Ballance Cc: linux-arm-msm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, maple-tree@lists.infradead.org, Lorenzo Stoakes , Pranjal Shrivastava , Will Deacon , Suzuki K Poulose , Neil Armstrong , Mostafa Saleh , Balbir Singh , Suren Baghdasaryan , Marco Elver , Dmitry Vyukov , Alexander Potapenko , Shuah Khan , Dev Jain , Brendan Jackman , Puranjay Mohan , Santosh Shukla , Wyes Karny , Pranjal Arya , Sudeep Holla X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1781371215; l=51713; i=pranjal.arya@oss.qualcomm.com; s=20260516; h=from:subject:message-id; bh=fO7p35yLhLCL7LmgBhMmEl7jA4eYMFRf0y923znTj5k=; b=H1CSHTXVzF1kuQm4TSu0l+okUiGeApvOIdH5wsh24YMSJ1FhzSDwjnXTYulZAR34r+3w4sSqk Al/LcKiAGAaAFev0P77H95e/dVdIhZPhsVCdbsQxGCmGHhW3MEa1Lkp X-Developer-Key: i=pranjal.arya@oss.qualcomm.com; a=ed25519; pk=ymtcTlccEIDsi3ErhpjIoZZHKdPBYWGWW0Lchs5MsbE= X-Proofpoint-ORIG-GUID: jck6rcMTICPnhIm24JrT4tt-7tN3OHNu X-Authority-Analysis: v=2.4 cv=NPLlPU6g c=1 sm=1 tr=0 ts=6a2d917b cx=c_pps a=mDZGXZTwRPZaeRUbqKGCBw==:117 a=fChuTYTh2wq5r3m49p7fHw==:17 a=IkcTkHD0fZMA:10 a=FelO9ux0wxsA:10 a=s4-Qcg_JpJYA:10 a=VkNPw1HP01LnGYTKEx00:22 a=u7WPNUs3qKkmUXheDGA7:22 a=DJpcGTmdVt4CTyJn9g5Z:22 a=EUspDBNiAAAA:8 a=FKSjEIMCKi8Q58X1TJIA:9 a=QEXdDO2ut3YA:10 a=zc0IvFSfCIW2DFIPzwfm:22 X-Proofpoint-GUID: jck6rcMTICPnhIm24JrT4tt-7tN3OHNu X-Proofpoint-Spam-Info: 
AW1haW4tMjYwNjEzMDE4MCBTYWx0ZWRfXxwKW+pt2BlvE tNE41+TzTWZfln8aqZ7SQqYVJm4XqYPaE4xfOdnsH5lS2OsMmR0+WRstSAaAYXSNAbgGLwrIHE9 hfAxlPDbGcMxDSqMX++wYpyETgCnVoI= X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNjEzMDE4MCBTYWx0ZWRfXyEyiDTI4u4dK ymsmuCv3eOirLy8/dQ+9KDZxXwJMj4Rnb1bmLieUZ0CSK1zNdgg36KgryakHdVC/81HRIWYHGgM 7LSQfvHiQP3DAzVcy9ioqElfUpA0Myru4p4D5gmUGTQnitBL9tNVaURJ1HF5P0E71UYjPlyWNZ3 BfpTo0YoYOcQiG61AqLJnMExrcDSYFfKQeWACHSepQXD3QTmb8OKx6ktlttalxQ0k+UBb6G4gIj oWdH17f8CPJsvmEf5D5QvDz1H8UgDsArOYIHWOReDKDSsEX6xQvZcX6RhiAcmAE+tVw71nJPW7n +UA2YHisV5AJC841ygkyKAdKzZqZHbHKFI4ldz/7JdrfyB0G6HOWAIYImK44oOqTv56/GLFs3Tx nK089WmgQfKa7FHrntUnBfewAZVgB+BlkiP+gDXES88X1+O9PtiLuY8TCIzA+EQMvxmGDXkr6uO hFn7nx2IIYGf5y4oXCg== X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.125,FMLib:17.12.100.49 definitions=2026-06-13_03,2026-06-12_03,2025-10-01_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 lowpriorityscore=0 suspectscore=0 malwarescore=0 bulkscore=0 adultscore=0 impostorscore=0 phishscore=0 clxscore=1015 priorityscore=1501 classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0 reason=mlx scancount=1 engine=8.22.0-2606040000 definitions=main-2606130180 With every alloc/free/pcpu path now driven by maple_tree, retire the augmented rb_tree from struct vmap_area and from include/linux/vmalloc.h. The struct shrinks by 24 bytes on both x86_64 and arm64 (72 -> 48): - struct rb_node rb_node; - struct list_head list; - union { - unsigned long subtree_max_size; - struct vm_struct *vm; - }; + struct list_head list; + struct vm_struct *vm; Also allow maple_tree_init() to be called twice: vmalloc_init() runs before start_kernel() reaches its own maple_tree_init() callsite, and the maple_tree machinery needs to be live for the early seeding done in vmap_init_free_space(). The second call becomes a no-op once maple_node_cache is set. 
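As a rough userspace illustration of the two points above (not part of the patch): the sketch below mirrors the before/after field layout with stand-in types to check that the saving equals exactly one rb_node (24 bytes on LP64, matching 72 -> 48), and shows the early-return guard described for maple_tree_init(). All *_sketch names, the old/new struct names, and the int flag standing in for maple_node_cache are assumptions made for the example, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for kernel types; every *_sketch name is hypothetical. */
struct rb_node_sketch {
	unsigned long parent_color;
	struct rb_node_sketch *right;
	struct rb_node_sketch *left;
};

struct list_head_sketch {
	struct list_head_sketch *next, *prev;
};

struct vm_struct_sketch;	/* opaque, only used through a pointer */

/* Layout before the patch: rb_node plus a union shared by free/busy use. */
struct vmap_area_old {
	unsigned long va_start;
	unsigned long va_end;
	struct rb_node_sketch rb_node;
	struct list_head_sketch list;
	union {
		unsigned long subtree_max_size;
		struct vm_struct_sketch *vm;
	};
	unsigned long flags;
};

/* Layout after the patch: rb_node and the union are gone. */
struct vmap_area_new {
	unsigned long va_start;
	unsigned long va_end;
	struct list_head_sketch list;
	struct vm_struct_sketch *vm;
	unsigned long flags;
};

/* The union collapses to a same-sized pointer, so only rb_node is saved. */
static_assert(sizeof(struct vmap_area_old) - sizeof(struct vmap_area_new) ==
	      sizeof(struct rb_node_sketch),
	      "saving should equal one rb_node");

size_t vmap_area_old_size(void) { return sizeof(struct vmap_area_old); }
size_t vmap_area_new_size(void) { return sizeof(struct vmap_area_new); }

/* Idempotent-init guard, modelled with an int instead of a kmem_cache. */
static int node_cache_sketch;	/* stands in for maple_node_cache */
static int init_runs_sketch;	/* counts how often real setup work ran */

void maple_tree_init_sketch(void)
{
	if (node_cache_sketch)
		return;		/* second and later calls are no-ops */

	node_cache_sketch = 1;
	init_runs_sketch++;
}

int init_runs(void) { return init_runs_sketch; }
```

Calling maple_tree_init_sketch() from both an early caller and the regular init path performs the setup once. In the kernel the real check would normally be done with pahole on the built vmlinux rather than a mock-up like this.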
Signed-off-by: Pranjal Arya --- include/linux/vmalloc.h | 16 +- lib/maple_tree.c | 7 + mm/vmalloc.c | 1123 ++++++++++++++++++-------------------------= ---- 3 files changed, 436 insertions(+), 710 deletions(-) diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index d87dc7f77f4e..642bca92b804 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -9,7 +9,6 @@ #include #include #include /* pgprot_t */ -#include #include =20 #include @@ -72,19 +71,8 @@ struct vmap_area { unsigned long va_start; unsigned long va_end; =20 - struct rb_node rb_node; /* address sorted rbtree */ - struct list_head list; /* address sorted list */ - - /* - * The following two variables can be packed, because - * a vmap_area object can be either: - * 1) in "free" tree (root is free_vmap_area_root) - * 2) or "busy" tree (root is vmap_area_root) - */ - union { - unsigned long subtree_max_size; /* in "free" tree */ - struct vm_struct *vm; /* in "busy" tree */ - }; + struct list_head list; /* auxiliary linkage (pool/purge/lazy) */ + struct vm_struct *vm; unsigned long flags; /* mark type of vm_map_ram area */ }; =20 diff --git a/lib/maple_tree.c b/lib/maple_tree.c index e52876435b77..f3474a107372 100644 --- a/lib/maple_tree.c +++ b/lib/maple_tree.c @@ -5634,6 +5634,13 @@ void __init maple_tree_init(void) .sheaf_capacity =3D 32, }; =20 + /* + * vmalloc_init() may need Maple allocations before start_kernel() reaches + * its own maple_tree_init() callsite. Keep initialization idempotent. 
+ */ + if (maple_node_cache) + return; + maple_node_cache =3D kmem_cache_create("maple_node", sizeof(struct maple_node), &args, SLAB_PANIC); diff --git a/mm/vmalloc.c b/mm/vmalloc.c index f2117eafa9cf..c908c1a0fcd4 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -23,7 +23,6 @@ #include #include #include -#include #include #include #include @@ -36,7 +35,6 @@ #include #include #include -#include #include #include #include @@ -881,22 +879,20 @@ static bool vmap_initialized __read_mostly; static struct kmem_cache *vmap_area_cachep; =20 /* - * This linked list stores free areas sorted by start address. - * It gives O(1) access to neighbors for fast coalescing. + * Maple tree of free ranges. */ -static LIST_HEAD(free_vmap_area_list); -/* Next-fit hint to avoid scanning from list head on each allocation. */ -static unsigned long free_vmap_alloc_hint __maybe_unused =3D 1; - -/* - * Maple tree shadow of free_vmap_area_list. It is used for - * address lookups and first-fit scans. - */ -static struct rb_root free_vmap_area_root =3D RB_ROOT; -static struct maple_tree free_vmap_area_mt __maybe_unused =3D - MTREE_INIT_EXT(free_vmap_area_mt, MT_FLAGS_LOCK_EXTERN, free_vmap_area_lo= ck); -static bool free_vmap_area_mt_enabled __maybe_unused; -static bool free_vmap_area_mt_init_tried __maybe_unused; +static struct maple_tree free_vmap_area_mt =3D + MTREE_INIT_EXT(free_vmap_area_mt, + MT_FLAGS_LOCK_EXTERN | MT_FLAGS_ALLOC_RANGE, + free_vmap_area_lock); +static bool free_vmap_area_mt_enabled; +static bool free_vmap_area_mt_init_tried; +static struct maple_tree occupied_vmap_area_mt =3D + MTREE_INIT_EXT(occupied_vmap_area_mt, + MT_FLAGS_LOCK_EXTERN | MT_FLAGS_ALLOC_RANGE, + free_vmap_area_lock); +static bool occupied_vmap_area_mt_enabled; +static bool occupied_vmap_area_mt_init_tried; =20 /* * Preload a CPU with one object for "no edge" split case. 
The @@ -905,19 +901,11 @@ static bool free_vmap_area_mt_init_tried __maybe_unus= ed; */ static DEFINE_PER_CPU(struct vmap_area *, ne_fit_preload_node); =20 -/* - * This structure defines a single, solid model where a list and - * maple tree are part of one entity protected by the lock. Nodes are - * sorted in ascending order, thus for O(1) access to left/right - * neighbors a list is used as well as for sequential traversal. - */ +/* Per-node ordered range index backed by Maple Tree. */ struct mt_list { - struct rb_root root; struct maple_tree mt; - struct list_head head; spinlock_t lock; bool mt_enabled; - bool mt_init_tried; }; =20 /* @@ -1055,22 +1043,6 @@ va_size(struct vmap_area *va) return (va->va_end - va->va_start); } =20 -/* - * Transitional rb compatibility retained until all rb-only users are move= d. - * Follow-up patches in this RFC series remove these helpers. - */ -static __always_inline unsigned long -get_subtree_max_size(struct rb_node *node) -{ - struct vmap_area *va; - - va =3D rb_entry_safe(node, struct vmap_area, rb_node); - return va ? va->subtree_max_size : 0; -} - -RB_DECLARE_CALLBACKS_MAX(static, free_vmap_area_rb_augment_cb, - struct vmap_area, rb_node, unsigned long, subtree_max_size, va_size) - static void reclaim_and_purge_vmap_areas(void); static BLOCKING_NOTIFIER_HEAD(vmap_notify_list); static void drain_vmap_area_work(struct work_struct *work); @@ -1078,31 +1050,12 @@ static DECLARE_WORK(drain_vmap_work, drain_vmap_are= a_work); =20 static __cacheline_aligned_in_smp atomic_long_t vmap_lazy_nr; =20 -/* - * maple nodes are allocated from slab; defer tree population until - * slab allocator is up to avoid early-boot failures. 
- */ -static __always_inline bool vmap_mt_runtime_ready(void) -{ - return READ_ONCE(vmap_initialized) && slab_is_available(); -} - static __always_inline bool free_mt_supported(void) { return free_vmap_area_mt_enabled; } =20 -static __always_inline void disable_free_mt_locked(void) -{ - lockdep_assert_held(&free_vmap_area_lock); - - if (free_vmap_area_mt_enabled) { - __mt_destroy(&free_vmap_area_mt); - free_vmap_area_mt_enabled =3D false; - } -} - -static __always_inline void free_mt_store_va_locked(struct vmap_area *va) +static __always_inline bool free_mt_store_va_locked(struct vmap_area *va) { int err; =20 @@ -1110,12 +1063,20 @@ static __always_inline void free_mt_store_va_locked= (struct vmap_area *va) =20 MA_STATE(mas, &free_vmap_area_mt, va->va_start, va->va_end - 1); =20 - err =3D mas_store_gfp(&mas, va, GFP_ATOMIC | __GFP_NOWARN); - if (WARN_ON_ONCE(err)) - disable_free_mt_locked(); + err =3D mas_preallocate(&mas, va, GFP_NOWAIT | __GFP_NOWARN); + if (!err) { + mas_store_prealloc(&mas, va); + mas_destroy(&mas); + } else { + err =3D mas_store_gfp(&mas, va, GFP_ATOMIC | __GFP_NOWARN); + if (WARN_ON_ONCE(err)) + return false; + } + + return true; } =20 -static __always_inline void free_mt_erase_va_locked(struct vmap_area *va) +static __always_inline bool free_mt_erase_va_locked(struct vmap_area *va) { int err; =20 @@ -1125,10 +1086,12 @@ static __always_inline void free_mt_erase_va_locked= (struct vmap_area *va) =20 err =3D mas_store_gfp(&mas, NULL, GFP_ATOMIC | __GFP_NOWARN); if (WARN_ON_ONCE(err)) - disable_free_mt_locked(); + return false; + + return true; } =20 -static __always_inline void +static __always_inline bool free_mt_update_va_locked(struct vmap_area *va, unsigned long old_start, unsigned long old_end) { @@ -1140,35 +1103,14 @@ free_mt_update_va_locked(struct vmap_area *va, unsi= gned long old_start, MA_STATE(mas_store, &free_vmap_area_mt, va->va_start, va->va_end - 1); =20 err =3D mas_store_gfp(&mas_erase, NULL, GFP_ATOMIC | __GFP_NOWARN); - if 
(WARN_ON_ONCE(err)) { - disable_free_mt_locked(); - return; - } + if (WARN_ON_ONCE(err)) + return false; =20 err =3D mas_store_gfp(&mas_store, va, GFP_ATOMIC | __GFP_NOWARN); if (WARN_ON_ONCE(err)) - disable_free_mt_locked(); -} - -static void free_mt_rebuild_locked(void) -{ - struct vmap_area *va; - int err; - - lockdep_assert_held(&free_vmap_area_lock); - - __mt_destroy(&free_vmap_area_mt); - free_vmap_area_mt_enabled =3D true; - - list_for_each_entry(va, &free_vmap_area_list, list) { - MA_STATE(mas, &free_vmap_area_mt, va->va_start, va->va_end - 1); + return false; =20 - err =3D mas_store_gfp(&mas, va, GFP_ATOMIC | __GFP_NOWARN); - if (WARN_ON_ONCE(err)) { - disable_free_mt_locked(); - return; - } - } + return true; } =20 static __always_inline void try_init_free_mt_locked(void) @@ -1178,257 +1120,101 @@ static __always_inline void try_init_free_mt_lock= ed(void) if (free_vmap_area_mt_init_tried) return; =20 - if (!vmap_mt_runtime_ready()) + if (!slab_is_available()) return; =20 free_vmap_area_mt_init_tried =3D true; - free_mt_rebuild_locked(); -} - -static __always_inline struct vmap_area * -__find_vmap_area_list(unsigned long addr, struct list_head *head) -{ - struct vmap_area *va; - - addr =3D (unsigned long)kasan_reset_tag((void *)addr); - - list_for_each_entry(va, head, list) { - if (addr < va->va_start) - break; - if (addr < va->va_end) - return va; - } - - return NULL; -} - -static struct vmap_area *__find_vmap_area(unsigned long addr, struct rb_ro= ot *root) -{ - struct rb_node *n =3D root->rb_node; - - addr =3D (unsigned long)kasan_reset_tag((void *)addr); - - while (n) { - struct vmap_area *va; - - va =3D rb_entry(n, struct vmap_area, rb_node); - if (addr < va->va_start) - n =3D n->rb_left; - else if (addr >=3D va->va_end) - n =3D n->rb_right; - else - return va; - } - - return NULL; + free_vmap_area_mt_enabled =3D true; } =20 -/* Look up the first VA which satisfies addr < va_end, NULL if none. 
*/ -static __always_inline struct vmap_area * -__find_vmap_area_exceed_addr_list(unsigned long addr, struct list_head *he= ad) +static __always_inline bool occupied_mt_supported(void) { - struct vmap_area *va; - - addr =3D (unsigned long)kasan_reset_tag((void *)addr); - - list_for_each_entry(va, head, list) { - if (va->va_end > addr) - return va; - } - - return NULL; + return occupied_vmap_area_mt_enabled; } =20 -static __always_inline struct list_head * -find_vmap_area_insert_point_list(struct vmap_area *va, struct list_head *h= ead) +static __always_inline void try_init_occupied_mt_locked(void) { - struct vmap_area *tmp; - struct list_head *next =3D head; - - list_for_each_entry(tmp, head, list) { - if (tmp->va_start > va->va_start) { - next =3D &tmp->list; - break; - } - } + lockdep_assert_held(&free_vmap_area_lock); =20 - if (next !=3D head) { - tmp =3D list_entry(next, struct vmap_area, list); - if (WARN_ON_ONCE(va->va_end > tmp->va_start)) - return NULL; - } + if (occupied_vmap_area_mt_init_tried) + return; =20 - if (next->prev !=3D head) { - tmp =3D list_entry(next->prev, struct vmap_area, list); - if (WARN_ON_ONCE(va->va_start < tmp->va_end)) - return NULL; - } + if (!slab_is_available()) + return; =20 - return next; + occupied_vmap_area_mt_init_tried =3D true; + occupied_vmap_area_mt_enabled =3D true; } =20 -/* - * Use maple-tree neighbour lookup to locate insertion point in O(log n), - * while preserving sorted-list neighbour traversal. 
- */ -static __always_inline struct list_head * -find_vmap_area_insert_point_mt(struct vmap_area *va, struct maple_tree *tr= ee, - struct list_head *head) +static __always_inline bool +occupied_mt_store_range_locked(unsigned long start, unsigned long end) { - struct vmap_area *prev, *next; - struct list_head *next_link; + int err; =20 - MA_STATE(mas, tree, va->va_start, va->va_start); + lockdep_assert_held(&free_vmap_area_lock); =20 - mas_set(&mas, va->va_start); - next =3D mas_find(&mas, ULONG_MAX); + if (WARN_ON_ONCE(!occupied_mt_supported())) + return false; =20 - if (next) { - if (WARN_ON_ONCE(next->va_start <=3D va->va_start)) - return NULL; - if (WARN_ON_ONCE(va->va_end > next->va_start)) - return NULL; - next_link =3D &next->list; - } else { - next_link =3D head; - } + MA_STATE(mas, &occupied_vmap_area_mt, start, end - 1); =20 - if (next_link->prev !=3D head) { - prev =3D list_entry(next_link->prev, struct vmap_area, list); - if (WARN_ON_ONCE(va->va_start < prev->va_end)) - return NULL; + err =3D mas_preallocate(&mas, XA_ZERO_ENTRY, GFP_NOWAIT | __GFP_NOWARN); + if (!err) { + mas_store_prealloc(&mas, XA_ZERO_ENTRY); + mas_destroy(&mas); + return true; } =20 - return next_link; + err =3D mas_store_gfp(&mas, XA_ZERO_ENTRY, GFP_ATOMIC | __GFP_NOWARN); + return !WARN_ON_ONCE(err); } =20 static __always_inline bool -insert_vmap_area_list_sorted(struct vmap_area *va, struct list_head *head) +occupied_mt_erase_range_locked(unsigned long start, unsigned long end) { - struct list_head *next; - - next =3D find_vmap_area_insert_point_list(va, head); - if (!next) - return false; - - list_add_tail(&va->list, next); - return true; -} + int err; =20 -static __always_inline bool -insert_vmap_area_list_sorted_mt(struct vmap_area *va, struct maple_tree *t= ree, - struct list_head *head) -{ - struct list_head *next; + lockdep_assert_held(&free_vmap_area_lock); =20 - next =3D find_vmap_area_insert_point_mt(va, tree, head); - if (!next) + if 
(WARN_ON_ONCE(!occupied_mt_supported())) return false; =20 - list_add_tail(&va->list, next); - return true; -} + MA_STATE(mas, &occupied_vmap_area_mt, start, end - 1); =20 -static __always_inline void -disable_busy_mt_locked(struct vmap_node *vn) -{ - lockdep_assert_held(&vn->busy.lock); - - if (vn->busy.mt_enabled) { - __mt_destroy(&vn->busy.mt); - vn->busy.mt_enabled =3D false; - } - - vn->busy.mt_init_tried =3D true; -} - -static __always_inline void -disable_lazy_mt_locked(struct vmap_node *vn) -{ - lockdep_assert_held(&vn->lazy.lock); - - if (vn->lazy.mt_enabled) { - __mt_destroy(&vn->lazy.mt); - vn->lazy.mt_enabled =3D false; - } - - vn->lazy.mt_init_tried =3D true; + err =3D mas_store_gfp(&mas, NULL, GFP_ATOMIC | __GFP_NOWARN); + return !WARN_ON_ONCE(err); } =20 -static void -busy_mt_rebuild_locked(struct vmap_node *vn) +static __always_inline bool +occupied_mt_erase_va_locked(struct vmap_area *va) { - struct vmap_area *va; - int err; - - lockdep_assert_held(&vn->busy.lock); - - __mt_destroy(&vn->busy.mt); - vn->busy.mt_enabled =3D true; + lockdep_assert_held(&free_vmap_area_lock); =20 - list_for_each_entry(va, &vn->busy.head, list) { - MA_STATE(mas, &vn->busy.mt, va->va_start, va->va_end - 1); + if (!occupied_mt_supported()) + return true; =20 - err =3D mas_store_gfp(&mas, va, GFP_ATOMIC | __GFP_NOWARN); - if (WARN_ON_ONCE(err)) { - disable_busy_mt_locked(vn); - return; - } - } + return occupied_mt_erase_range_locked(va->va_start, va->va_end); } =20 static __always_inline void try_init_busy_mt_locked(struct vmap_node *vn) { lockdep_assert_held(&vn->busy.lock); - - if (vn->busy.mt_init_tried) - return; - - if (!vmap_mt_runtime_ready()) - return; - - vn->busy.mt_init_tried =3D true; - busy_mt_rebuild_locked(vn); -} - -static void -lazy_mt_rebuild_locked(struct vmap_node *vn) -{ - struct vmap_area *va; - int err; - - lockdep_assert_held(&vn->lazy.lock); - - __mt_destroy(&vn->lazy.mt); - vn->lazy.mt_enabled =3D true; - - list_for_each_entry(va, &vn->lazy.head, 
list) { - MA_STATE(mas, &vn->lazy.mt, va->va_start, va->va_end - 1); - - err =3D mas_store_gfp(&mas, va, GFP_ATOMIC | __GFP_NOWARN); - if (WARN_ON_ONCE(err)) { - disable_lazy_mt_locked(vn); - return; - } - } + WARN_ON_ONCE(!vn->busy.mt_enabled); } =20 static __always_inline void try_init_lazy_mt_locked(struct vmap_node *vn) { lockdep_assert_held(&vn->lazy.lock); - - if (vn->lazy.mt_init_tried) - return; - - if (!vmap_mt_runtime_ready()) - return; - - vn->lazy.mt_init_tried =3D true; - lazy_mt_rebuild_locked(vn); + WARN_ON_ONCE(!vn->lazy.mt_enabled); } =20 +/* + * Busy/lazy lookup paths remain lock-based to preserve pointer lifetime + * semantics. + */ + static __always_inline struct vmap_area * __find_vmap_area_mt(unsigned long addr, struct maple_tree *tree) { @@ -1462,6 +1248,24 @@ __find_vmap_area_enclose_addr_mt(unsigned long addr,= struct maple_tree *tree) return mas_find_rev(&mas, 0); } =20 +static __always_inline bool +validate_vmap_area_range_insert_mt_locked(struct maple_tree *tree, + unsigned long start, + unsigned long end) +{ + struct vmap_area *left, *right; + + left =3D __find_vmap_area_enclose_addr_mt(start, tree); + if (left && WARN_ON_ONCE(left->va_end > start)) + return false; + + right =3D __find_vmap_area_exceed_addr_mt(start, tree); + if (right && WARN_ON_ONCE(right->va_start < end)) + return false; + + return true; +} + static __always_inline struct vmap_area * find_vmap_area_busy_locked(unsigned long addr, struct vmap_node *vn) { @@ -1469,10 +1273,10 @@ find_vmap_area_busy_locked(unsigned long addr, stru= ct vmap_node *vn) =20 try_init_busy_mt_locked(vn); =20 - if (likely(vn->busy.mt_enabled)) - return __find_vmap_area_mt(addr, &vn->busy.mt); + if (WARN_ON_ONCE(!vn->busy.mt_enabled)) + return NULL; =20 - return __find_vmap_area_list(addr, &vn->busy.head); + return __find_vmap_area_mt(addr, &vn->busy.mt); } =20 static __always_inline struct vmap_area * @@ -1482,10 +1286,10 @@ find_vmap_area_exceed_addr_busy_locked(unsigned lon= g addr, struct 
vmap_node *vn) =20 try_init_busy_mt_locked(vn); =20 - if (likely(vn->busy.mt_enabled)) - return __find_vmap_area_exceed_addr_mt(addr, &vn->busy.mt); + if (WARN_ON_ONCE(!vn->busy.mt_enabled)) + return NULL; =20 - return __find_vmap_area_exceed_addr_list(addr, &vn->busy.head); + return __find_vmap_area_exceed_addr_mt(addr, &vn->busy.mt); } =20 /* @@ -1544,22 +1348,27 @@ insert_vmap_area_busy_locked(struct vmap_area *va, = struct vmap_node *vn) =20 try_init_busy_mt_locked(vn); =20 - if (likely(vn->busy.mt_enabled)) { - MA_STATE(mas, &vn->busy.mt, va->va_start, va->va_end - 1); + if (WARN_ON_ONCE(!vn->busy.mt_enabled)) + return; =20 - if (!insert_vmap_area_list_sorted_mt(va, &vn->busy.mt, - &vn->busy.head)) - return; + if (!validate_vmap_area_range_insert_mt_locked(&vn->busy.mt, + va->va_start, + va->va_end)) + return; =20 - err =3D mas_store_gfp(&mas, va, GFP_ATOMIC | __GFP_NOWARN); - if (WARN_ON_ONCE(err)) - disable_busy_mt_locked(vn); + INIT_LIST_HEAD(&va->list); + + MA_STATE(mas, &vn->busy.mt, va->va_start, va->va_end - 1); =20 + err =3D mas_preallocate(&mas, va, GFP_NOWAIT | __GFP_NOWARN); + if (!err) { + mas_store_prealloc(&mas, va); + mas_destroy(&mas); return; } =20 - if (!insert_vmap_area_list_sorted(va, &vn->busy.head)) - return; + err =3D mas_store_gfp(&mas, va, GFP_ATOMIC | __GFP_NOWARN); + WARN_ON_ONCE(err); } =20 static __always_inline void @@ -1569,18 +1378,17 @@ unlink_vmap_area_busy_locked(struct vmap_area *va, = struct vmap_node *vn) =20 lockdep_assert_held(&vn->busy.lock); =20 - MA_STATE(mas, &vn->busy.mt, va->va_start, va->va_end - 1); - - list_del_init(&va->list); - try_init_busy_mt_locked(vn); =20 - if (unlikely(!vn->busy.mt_enabled)) + if (WARN_ON_ONCE(!vn->busy.mt_enabled)) return; =20 + MA_STATE(mas, &vn->busy.mt, va->va_start, va->va_end - 1); + err =3D mas_store_gfp(&mas, NULL, GFP_ATOMIC | __GFP_NOWARN); - if (WARN_ON_ONCE(err)) - disable_busy_mt_locked(vn); + WARN_ON_ONCE(err); + + INIT_LIST_HEAD(&va->list); } =20 static __always_inline 
void @@ -1591,23 +1399,27 @@ insert_vmap_area_lazy_locked(struct vmap_area *va, = struct vmap_node *vn) lockdep_assert_held(&vn->lazy.lock); =20 try_init_lazy_mt_locked(vn); + if (WARN_ON_ONCE(!vn->lazy.mt_enabled)) + return; =20 - if (likely(vn->lazy.mt_enabled)) { - MA_STATE(mas, &vn->lazy.mt, va->va_start, va->va_end - 1); + if (!validate_vmap_area_range_insert_mt_locked(&vn->lazy.mt, + va->va_start, + va->va_end)) + return; =20 - if (!insert_vmap_area_list_sorted_mt(va, &vn->lazy.mt, - &vn->lazy.head)) - return; + INIT_LIST_HEAD(&va->list); =20 - err =3D mas_store_gfp(&mas, va, GFP_ATOMIC | __GFP_NOWARN); - if (WARN_ON_ONCE(err)) - disable_lazy_mt_locked(vn); + MA_STATE(mas, &vn->lazy.mt, va->va_start, va->va_end - 1); =20 + err =3D mas_preallocate(&mas, va, GFP_NOWAIT | __GFP_NOWARN); + if (!err) { + mas_store_prealloc(&mas, va); + mas_destroy(&mas); return; } =20 - if (!insert_vmap_area_list_sorted(va, &vn->lazy.head)) - return; + err =3D mas_store_gfp(&mas, va, GFP_ATOMIC | __GFP_NOWARN); + WARN_ON_ONCE(err); } =20 static __always_inline bool @@ -1616,60 +1428,56 @@ lazy_vmap_areas_empty_locked(struct vmap_node *vn) lockdep_assert_held(&vn->lazy.lock); =20 try_init_lazy_mt_locked(vn); + if (WARN_ON_ONCE(!vn->lazy.mt_enabled)) + return true; =20 - if (likely(vn->lazy.mt_enabled)) - return mtree_empty(&vn->lazy.mt); - - return list_empty(&vn->lazy.head); + return mtree_empty(&vn->lazy.mt); } =20 static __always_inline void move_lazy_vmap_areas_to_purge_locked(struct vmap_node *vn) { struct vmap_area *va; - int err; =20 lockdep_assert_held(&vn->lazy.lock); =20 try_init_lazy_mt_locked(vn); + if (WARN_ON_ONCE(!vn->lazy.mt_enabled)) + return; =20 - if (likely(vn->lazy.mt_enabled)) { - list_for_each_entry(va, &vn->lazy.head, list) { - MA_STATE(mas, &vn->lazy.mt, va->va_start, va->va_end - 1); - - err =3D mas_store_gfp(&mas, NULL, GFP_ATOMIC | __GFP_NOWARN); - if (WARN_ON_ONCE(err)) { - disable_lazy_mt_locked(vn); - break; - } - } + MA_STATE(mas, &vn->lazy.mt, 0, 
0); =20 - if (vn->lazy.mt_enabled && WARN_ON_ONCE(!mtree_empty(&vn->lazy.mt))) - disable_lazy_mt_locked(vn); - } + mas_for_each(&mas, va, ULONG_MAX) + list_add_tail(&va->list, &vn->purge_list); =20 - list_replace_init(&vn->lazy.head, &vn->purge_list); + __mt_destroy(&vn->lazy.mt); + mt_init_flags(&vn->lazy.mt, MT_FLAGS_LOCK_EXTERN); + mt_set_external_lock(&vn->lazy.mt, &vn->lazy.lock); + vn->lazy.mt_enabled =3D true; } =20 static __always_inline bool insert_vmap_area_free_locked(struct vmap_area *va) { + struct vmap_area *prev, *next; + lockdep_assert_held(&free_vmap_area_lock); =20 try_init_free_mt_locked(); =20 - if (likely(free_mt_supported())) { - if (!insert_vmap_area_list_sorted_mt(va, &free_vmap_area_mt, - &free_vmap_area_list)) - return false; + if (unlikely(!free_mt_supported())) + return false; =20 - free_mt_store_va_locked(va); - } else { - if (!insert_vmap_area_list_sorted(va, &free_vmap_area_list)) - return false; - } + prev =3D __find_vmap_area_enclose_addr_mt(va->va_start, &free_vmap_area_m= t); + if (prev && WARN_ON_ONCE(prev->va_end > va->va_start)) + return false; =20 - return true; + next =3D __find_vmap_area_exceed_addr_mt(va->va_start, &free_vmap_area_mt= ); + if (next && WARN_ON_ONCE(next->va_start < va->va_end)) + return false; + + INIT_LIST_HEAD(&va->list); + return free_mt_store_va_locked(va); } =20 static __always_inline void @@ -1677,193 +1485,56 @@ unlink_vmap_area_free_locked(struct vmap_area *va) { lockdep_assert_held(&free_vmap_area_lock); =20 - if (WARN_ON_ONCE(list_empty(&va->list))) + if (unlikely(!free_mt_supported())) return; =20 - if (likely(free_mt_supported())) - free_mt_erase_va_locked(va); - - list_del_init(&va->list); -} - -/* - * Merge de-allocated chunk of VA memory with previous - * and next free blocks. If coalesce is not done a new - * free area is inserted. If VA has been merged, it is - * freed. - * - * Please note, it can return NULL in case of overlap - * ranges, followed by WARN() report. 
Despite it is a - * buggy behaviour, a system can be alive and keep - * ongoing. - */ -static __always_inline struct vmap_area * -__merge_or_add_vmap_area(struct vmap_area *va, struct list_head *head, boo= l update_mt) -{ - struct vmap_area *sibling; - struct list_head *next; - unsigned long old_start, old_end; - bool merged =3D false; - - if (update_mt && free_mt_supported()) - next =3D find_vmap_area_insert_point_mt(va, &free_vmap_area_mt, head); - else - next =3D find_vmap_area_insert_point_list(va, head); - - if (!next) - return NULL; - - /* - * start end - * | | - * |<------VA------>|<-----Next----->| - * | | - * start end - */ - if (next !=3D head) { - sibling =3D list_entry(next, struct vmap_area, list); - if (sibling->va_start =3D=3D va->va_end) { - old_start =3D sibling->va_start; - old_end =3D sibling->va_end; - sibling->va_start =3D va->va_start; - if (update_mt && free_mt_supported()) - free_mt_update_va_locked(sibling, old_start, old_end); - - /* Free vmap_area object. */ - kmem_cache_free(vmap_area_cachep, va); - - /* Point to the new merged area. */ - va =3D sibling; - merged =3D true; - } - } - - /* - * start end - * | | - * |<-----Prev----->|<------VA------>| - * | | - * start end - */ - if (next->prev !=3D head) { - sibling =3D list_entry(next->prev, struct vmap_area, list); - if (sibling->va_end =3D=3D va->va_start) { - /* - * If both neighbors are coalesced, it is important - * to unlink the "next" node first, followed by merging - * with "previous" one. - */ - if (merged) { - if (update_mt) - unlink_vmap_area_free_locked(va); - else - list_del_init(&va->list); - } - - old_start =3D sibling->va_start; - old_end =3D sibling->va_end; - sibling->va_end =3D va->va_end; - if (update_mt && free_mt_supported()) - free_mt_update_va_locked(sibling, old_start, old_end); - - /* Free vmap_area object. */ - kmem_cache_free(vmap_area_cachep, va); - - /* Point to the new merged area. 
*/ - va =3D sibling; - merged =3D true; - } - } - - if (!merged) { - if (update_mt) - insert_vmap_area_free_locked(va); - else - list_add_tail(&va->list, next); - } + if (!free_mt_erase_va_locked(va)) + return; =20 - return va; -} - -static __always_inline struct vmap_area * -merge_or_add_vmap_area(struct vmap_area *va, - struct list_head *head) -{ - return __merge_or_add_vmap_area(va, head, false); + INIT_LIST_HEAD(&va->list); } =20 static __always_inline struct vmap_area * merge_or_add_vmap_area_free_locked(struct vmap_area *va) { + struct vmap_area *left, *right; + unsigned long new_start, new_end; + lockdep_assert_held(&free_vmap_area_lock); =20 - va =3D __merge_or_add_vmap_area(va, &free_vmap_area_list, true); - if (va && va->va_start < free_vmap_alloc_hint) - free_vmap_alloc_hint =3D va->va_start; + if (unlikely(!free_mt_supported())) + return NULL; =20 - return va; -} + new_start =3D va->va_start; + new_end =3D va->va_end; =20 -/* - * Transitional wrappers retained until all legacy rb call sites are switc= hed. - * Follow-up patches in this series remove these wrappers. 
- */ -static __always_inline void -insert_vmap_area(struct vmap_area *va, struct rb_root *root, - struct list_head *head) -{ - struct vmap_node *vn =3D addr_to_node(va->va_start); + left =3D __find_vmap_area_enclose_addr_mt(new_start, &free_vmap_area_mt); + if (left && WARN_ON_ONCE(left->va_end > new_start)) + return NULL; =20 - if (head =3D=3D &free_vmap_area_list) { - insert_vmap_area_free_locked(va); - return; + if (left && left->va_end =3D=3D new_start) { + new_start =3D left->va_start; + unlink_vmap_area_free_locked(left); + kmem_cache_free(vmap_area_cachep, left); } =20 - if (head =3D=3D &vn->lazy.head) { - insert_vmap_area_lazy_locked(va, vn); - return; - } - - insert_vmap_area_busy_locked(va, vn); -} - -static __always_inline void -insert_vmap_area_augment(struct vmap_area *va, struct rb_node *from, - struct rb_root *root, struct list_head *head) -{ - insert_vmap_area(va, root, head); -} - -static __always_inline void unlink_va(struct vmap_area *va, struct rb_root= *root) -{ - struct vmap_node *vn =3D addr_to_node(va->va_start); + right =3D __find_vmap_area_exceed_addr_mt(new_start, &free_vmap_area_mt); + if (right && WARN_ON_ONCE(right->va_start < new_end)) + return NULL; =20 - if (root =3D=3D &free_vmap_area_root) { - unlink_vmap_area_free_locked(va); - return; + if (right && right->va_start =3D=3D new_end) { + new_end =3D right->va_end; + unlink_vmap_area_free_locked(right); + kmem_cache_free(vmap_area_cachep, right); } =20 - unlink_vmap_area_busy_locked(va, vn); -} + va->va_start =3D new_start; + va->va_end =3D new_end; =20 -static __always_inline void -unlink_va_augment(struct vmap_area *va, struct rb_root *root) -{ - unlink_va(va, root); -} - -static __always_inline void augment_tree_propagate_from(struct vmap_area *= va) -{ -} - -static __always_inline struct vmap_area * -merge_or_add_vmap_area_augment(struct vmap_area *va, struct rb_root *root, - struct list_head *head) -{ - if (head =3D=3D &free_vmap_area_list) - return 
merge_or_add_vmap_area_free_locked(va); + if (!insert_vmap_area_free_locked(va)) + return NULL; =20 - return merge_or_add_vmap_area(va, head); + return va; } =20 static __always_inline bool @@ -1885,86 +1556,57 @@ is_within_this_va(struct vmap_area *va, unsigned lo= ng size, return (nva_start_addr + size <=3D va->va_end); } =20 -static __always_inline struct vmap_area * -find_vmap_lowest_match_list(struct list_head *head, unsigned long size, - unsigned long align, unsigned long vstart) +static __always_inline bool +occupied_mt_find_hole_window_locked(unsigned long min, unsigned long max, + unsigned long size, unsigned long align, + unsigned long *addr) { - struct vmap_area *va; + MA_STATE(mas, &occupied_vmap_area_mt, 0, 0); + unsigned long search =3D min; + unsigned long hole_end; =20 - list_for_each_entry(va, head, list) { - if (!is_within_this_va(va, size, align, vstart)) - continue; + while (search <=3D max) { + unsigned long candidate, candidate_end; =20 - return va; - } + mas_set(&mas, search); + if (mas_empty_area(&mas, search, max, size)) + return false; =20 - return NULL; -} + hole_end =3D min(mas.last, max); + candidate =3D ALIGN(mas.index, align); + if (candidate < mas.index) + return false; =20 -static __always_inline unsigned long -clamp_vmap_alloc_hint(unsigned long hint, unsigned long vstart, - unsigned long vend) -{ - if (hint < vstart || hint >=3D vend) - return vstart; + if (check_add_overflow(candidate, size - 1, &candidate_end)) + return false; =20 - return hint; -} + if (candidate >=3D search && candidate_end <=3D hole_end) { + *addr =3D candidate; + return true; + } =20 -/* - * Next-fit scan with wrap-around. Use maple to jump to the first candidate - * around the hint in O(log n), then continue on the ordered list for cheap - * neighbour traversal and deterministic coalescing behaviour. 
- */ -static __always_inline struct vmap_area * -find_vmap_match_list_next_fit(struct list_head *head, unsigned long size, - unsigned long align, unsigned long vstart, - unsigned long vend) -{ - struct vmap_area *va, *start =3D NULL; - unsigned long hint; - bool wrapped; - - hint =3D clamp_vmap_alloc_hint(free_vmap_alloc_hint, vstart, vend); - - if (hint !=3D vstart) { - if (free_mt_supported()) - start =3D __find_vmap_area_exceed_addr_mt(hint, - &free_vmap_area_mt); - - if (start) { - va =3D start; - list_for_each_entry_from(va, head, list) { - if (is_within_this_va(va, size, align, hint)) - return va; - } - } else { - list_for_each_entry(va, head, list) { - if (va->va_end <=3D hint) - continue; + if (hole_end =3D=3D ULONG_MAX) + return false; =20 - if (is_within_this_va(va, size, align, hint)) - return va; - } - } + search =3D hole_end + 1; } =20 - wrapped =3D (hint !=3D vstart); - list_for_each_entry(va, head, list) { - if (wrapped) { - if (start && va =3D=3D start) - break; - if (!start && va->va_start >=3D hint) - break; - } + return false; +} =20 - if (is_within_this_va(va, size, align, vstart)) - return va; - } +static __always_inline unsigned long +occupied_mt_find_hole_lowest_locked(unsigned long size, unsigned long alig= n, + unsigned long vstart, unsigned long vend) +{ + unsigned long addr; =20 - return NULL; + if (occupied_mt_find_hole_window_locked(vstart, vend - 1, size, align, &a= ddr)) + return addr; + + return -ENOENT; } =20 +/* Lowest-match scan directly on maple ordered traversal. 
*/ static __always_inline struct vmap_area * find_vmap_lowest_match_mt(struct maple_tree *tree, unsigned long size, unsigned long align, unsigned long vstart) @@ -1989,24 +1631,26 @@ find_vmap_lowest_match_mt(struct maple_tree *tree, = unsigned long size, #include =20 static struct vmap_area * -find_vmap_lowest_linear_match(struct list_head *head, unsigned long size, - unsigned long align, unsigned long vstart) +find_vmap_lowest_linear_match(struct maple_tree *tree, unsigned long size, + unsigned long align, unsigned long vstart) { + MA_STATE(mas, tree, vstart, vstart); struct vmap_area *va; =20 - list_for_each_entry(va, head, list) { + mas_set(&mas, vstart); + va =3D mas_find(&mas, ULONG_MAX); + while (va) { if (!is_within_this_va(va, size, align, vstart)) - continue; - - return va; + va =3D mas_next(&mas, ULONG_MAX); + else + return va; } =20 - return NULL; + return va; } =20 static void -find_vmap_lowest_match_check(struct list_head *head, unsigned long size, - unsigned long align) +find_vmap_lowest_match_check(unsigned long size, unsigned long align) { struct vmap_area *va_1, *va_2; unsigned long vstart; @@ -2015,11 +1659,8 @@ find_vmap_lowest_match_check(struct list_head *head,= unsigned long size, get_random_bytes(&rnd, sizeof(rnd)); vstart =3D VMALLOC_START + rnd; =20 - if (free_mt_supported()) - va_1 =3D find_vmap_lowest_match_mt(&free_vmap_area_mt, size, align, vsta= rt); - else - va_1 =3D find_vmap_lowest_linear_match(head, size, align, vstart); - va_2 =3D find_vmap_lowest_linear_match(head, size, align, vstart); + va_1 =3D find_vmap_lowest_match_mt(&free_vmap_area_mt, size, align, vstar= t); + va_2 =3D find_vmap_lowest_linear_match(&free_vmap_area_mt, size, align, v= start); =20 if (va_1 !=3D va_2) pr_emerg("not lowest: t: 0x%p, l: 0x%p, v: 0x%lx\n", @@ -2153,39 +1794,38 @@ va_clip(struct vmap_area *va, unsigned long nva_sta= rt_addr, } =20 if (type !=3D FL_FIT_TYPE) { - if (free_mt_supported()) - free_mt_update_va_locked(va, old_start, old_end); + if 
(free_mt_supported() && + !free_mt_update_va_locked(va, old_start, old_end)) + return -ENOMEM; =20 - if (lva) /* type =3D=3D NE_FIT_TYPE */ - insert_vmap_area_free_locked(lva); + if (lva && !insert_vmap_area_free_locked(lva)) { + kmem_cache_free(vmap_area_cachep, lva); + return -ENOMEM; + } } =20 return 0; } =20 -static unsigned long -va_alloc(struct vmap_area *va, - unsigned long size, unsigned long align, - unsigned long vstart, unsigned long vend) +static __always_inline bool +restore_allocated_vmap_range_free_locked(unsigned long start, unsigned lon= g end) { - unsigned long nva_start_addr; - int ret; + struct vmap_area *va; =20 - if (va->va_start > vstart) - nva_start_addr =3D ALIGN(va->va_start, align); - else - nva_start_addr =3D ALIGN(vstart, align); + lockdep_assert_held(&free_vmap_area_lock); =20 - /* Check the "vend" restriction. */ - if (nva_start_addr + size > vend) - return -ERANGE; + va =3D kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT); + if (!va) + return false; =20 - /* Update the free vmap_area. */ - ret =3D va_clip(va, nva_start_addr, size); - if (WARN_ON_ONCE(ret)) - return ret; + va->va_start =3D start; + va->va_end =3D end; + if (!insert_vmap_area_free_locked(va)) { + kmem_cache_free(vmap_area_cachep, va); + return false; + } =20 - return nva_start_addr; + return true; } =20 /* @@ -2196,34 +1836,42 @@ static __always_inline unsigned long __alloc_vmap_area(unsigned long size, unsigned long align, unsigned long vstart, unsigned long vend) { + int ret; unsigned long nva_start_addr; + unsigned long nva_end_addr; struct vmap_area *va; =20 lockdep_assert_held(&free_vmap_area_lock); =20 - /* - * Next-fit with wrap-around lowers repeated list-head scans in - * high-churn workloads. 
- */ - va =3D find_vmap_match_list_next_fit(&free_vmap_area_list, size, align, - vstart, vend); + try_init_occupied_mt_locked(); =20 - if (unlikely(!va)) + if (WARN_ON_ONCE(!occupied_mt_supported())) return -ENOENT; =20 - nva_start_addr =3D va_alloc(va, size, align, vstart, vend); - if (!IS_ERR_VALUE(nva_start_addr)) { - unsigned long next_hint; + nva_start_addr =3D occupied_mt_find_hole_lowest_locked(size, align, + vstart, vend); + if (IS_ERR_VALUE(nva_start_addr)) + return nva_start_addr; + nva_end_addr =3D nva_start_addr + size; =20 - if (check_add_overflow(nva_start_addr, size, &next_hint)) - free_vmap_alloc_hint =3D vstart; - else - free_vmap_alloc_hint =3D next_hint; + va =3D __find_vmap_area_mt(nva_start_addr, &free_vmap_area_mt); + if (WARN_ON_ONCE(!va)) + return -ENOENT; + + ret =3D va_clip(va, nva_start_addr, size); + if (WARN_ON_ONCE(ret)) + return ret; + + if (!occupied_mt_store_range_locked(nva_start_addr, nva_end_addr)) { + bool restored; + + restored =3D restore_allocated_vmap_range_free_locked(nva_start_addr, nv= a_end_addr); + WARN_ON_ONCE(!restored); + return -ENOMEM; } =20 #if DEBUG_AUGMENT_LOWEST_MATCH_CHECK - if (!IS_ERR_VALUE(nva_start_addr)) - find_vmap_lowest_match_check(&free_vmap_area_list, size, align); + find_vmap_lowest_match_check(size, align); #endif =20 return nva_start_addr; @@ -2247,7 +1895,8 @@ static void free_vmap_area(struct vmap_area *va) * Insert/Merge it back to the free tree/list. 
*/ spin_lock(&free_vmap_area_lock); - merge_or_add_vmap_area_free_locked(va); + WARN_ON_ONCE(!occupied_mt_erase_va_locked(va)); + WARN_ON_ONCE(!merge_or_add_vmap_area_free_locked(va)); spin_unlock(&free_vmap_area_lock); } =20 @@ -2566,24 +2215,36 @@ static DEFINE_MUTEX(vmap_purge_lock); /* for per-CPU blocks */ static void purge_fragmented_blocks_allcpus(void); =20 -static void -reclaim_list_global(struct list_head *head) +static bool +reclaim_list_global(struct list_head *head, bool erase_occupied, + struct list_head *failed) { struct vmap_area *va, *n; + bool ok =3D true; =20 if (list_empty(head)) - return; + return true; =20 spin_lock(&free_vmap_area_lock); - list_for_each_entry_safe(va, n, head, list) - merge_or_add_vmap_area_free_locked(va); + list_for_each_entry_safe(va, n, head, list) { + list_del_init(&va->list); + if (erase_occupied) + WARN_ON_ONCE(!occupied_mt_erase_va_locked(va)); + if (WARN_ON_ONCE(!merge_or_add_vmap_area_free_locked(va))) { + list_add_tail(&va->list, failed); + ok =3D false; + } + } spin_unlock(&free_vmap_area_lock); + + return ok; } =20 static void decay_va_pool_node(struct vmap_node *vn, bool full_decay) { LIST_HEAD(decay_list); + LIST_HEAD(decay_failed); struct vmap_area *va, *nva; unsigned long n_decay, pool_len; int i; @@ -2612,7 +2273,7 @@ decay_va_pool_node(struct vmap_node *vn, bool full_de= cay) break; =20 list_del_init(&va->list); - merge_or_add_vmap_area(va, &decay_list); + list_add_tail(&va->list, &decay_list); } =20 /* @@ -2629,7 +2290,11 @@ decay_va_pool_node(struct vmap_node *vn, bool full_d= ecay) } } =20 - reclaim_list_global(&decay_list); + WARN_ON_ONCE(!reclaim_list_global(&decay_list, false, &decay_failed)); + list_for_each_entry_safe(va, nva, &decay_failed, list) { + list_del_init(&va->list); + WARN_ON_ONCE(!node_pool_add_va(vn, va)); + } } =20 #define KASAN_RELEASE_BATCH_SIZE 32 @@ -2664,8 +2329,10 @@ static void purge_vmap_node(struct work_struct *work) struct vmap_node *vn =3D container_of(work, struct 
vmap_node, purge_work); unsigned long nr_purged_pages =3D 0; + unsigned long nr_failed_pages =3D 0; struct vmap_area *va, *n_va; LIST_HEAD(local_list); + LIST_HEAD(local_failed); =20 if (IS_ENABLED(CONFIG_KASAN_VMALLOC)) kasan_release_vmalloc_node(vn); @@ -2691,7 +2358,23 @@ static void purge_vmap_node(struct work_struct *work) =20 atomic_long_sub(nr_purged_pages, &vmap_lazy_nr); =20 - reclaim_list_global(&local_list); + WARN_ON_ONCE(!reclaim_list_global(&local_list, false, &local_failed)); + list_for_each_entry_safe(va, n_va, &local_failed, list) { + unsigned int vn_id =3D decode_vn_id(va->flags); + struct vmap_node *dst; + + list_del_init(&va->list); + dst =3D is_vn_id_valid(vn_id) ? + id_to_node(vn_id) : addr_to_node(va->va_start); + + spin_lock(&dst->lazy.lock); + insert_vmap_area_lazy_locked(va, dst); + spin_unlock(&dst->lazy.lock); + nr_failed_pages +=3D va_size(va) >> PAGE_SHIFT; + } + + if (nr_failed_pages) + atomic_long_add(nr_failed_pages, &vmap_lazy_nr); } =20 /* @@ -2823,6 +2506,15 @@ static void free_vmap_area_noflush(struct vmap_area = *va) vn =3D is_vn_id_valid(vn_id) ? id_to_node(vn_id):addr_to_node(va->va_start); =20 + /* + * Drop occupied-range visibility as soon as the area is freed, even + * though coalescing/reinsertion into the free index remains deferred. 
+ */ + spin_lock(&free_vmap_area_lock); + try_init_occupied_mt_locked(); + WARN_ON_ONCE(!occupied_mt_erase_va_locked(va)); + spin_unlock(&free_vmap_area_lock); + spin_lock(&vn->lazy.lock); insert_vmap_area_lazy_locked(va, vn); spin_unlock(&vn->lazy.lock); @@ -5240,14 +4932,11 @@ EXPORT_SYMBOL_GPL(free_vm_area); =20 #ifdef CONFIG_SMP static __always_inline struct vmap_area * -free_vmap_area_prev(struct vmap_area *va) +free_vmap_area_prev_by_addr(unsigned long addr) { lockdep_assert_held(&free_vmap_area_lock); =20 - if (va->list.prev =3D=3D &free_vmap_area_list) - return NULL; - - return list_entry(va->list.prev, struct vmap_area, list); + return __find_vmap_area_enclose_addr_mt(addr, &free_vmap_area_mt); } =20 /** @@ -5262,19 +4951,9 @@ free_vmap_area_prev(struct vmap_area *va) static struct vmap_area * pvm_find_va_enclose_addr(unsigned long addr) { - struct vmap_area *va; - lockdep_assert_held(&free_vmap_area_lock); =20 - if (free_mt_supported()) - return __find_vmap_area_enclose_addr_mt(addr, &free_vmap_area_mt); - - list_for_each_entry_reverse(va, &free_vmap_area_list, list) { - if (va->va_start <=3D addr) - return va; - } - - return NULL; + return __find_vmap_area_enclose_addr_mt(addr, &free_vmap_area_mt); } =20 /** @@ -5293,13 +4972,19 @@ pvm_determine_end_from_reverse(struct vmap_area **v= a, unsigned long align) unsigned long vmalloc_end =3D VMALLOC_END & ~(align - 1); unsigned long addr; =20 + lockdep_assert_held(&free_vmap_area_lock); + if (likely(*va)) { - list_for_each_entry_from_reverse((*va), - &free_vmap_area_list, list) { + do { addr =3D min((*va)->va_end & ~(align - 1), vmalloc_end); if ((*va)->va_start < addr) return addr; - } + + if ((*va)->va_start =3D=3D 0) + break; + + *va =3D free_vmap_area_prev_by_addr((*va)->va_start - 1); + } while (*va); } =20 return 0; @@ -5382,6 +5067,8 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned l= ong *offsets, } retry: spin_lock(&free_vmap_area_lock); + try_init_free_mt_locked(); + 
try_init_occupied_mt_locked(); =20 /* start scanning - we scan from the top, begin with the last area */ area =3D term_area =3D last_area; @@ -5419,7 +5106,10 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned = long *offsets, * If this VA does not fit, move base downwards and recheck. */ if (base + start < va->va_start) { - va =3D free_vmap_area_prev(va); + if (va->va_start =3D=3D 0) + va =3D NULL; + else + va =3D free_vmap_area_prev_by_addr(va->va_start - 1); base =3D pvm_determine_end_from_reverse(&va, align) - end; term_area =3D area; continue; @@ -5459,6 +5149,12 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned = long *offsets, va =3D vas[area]; va->va_start =3D start; va->va_end =3D start + size; + + if (occupied_mt_supported() && + !occupied_mt_store_range_locked(va->va_start, va->va_end)) { + area++; + goto recovery; + } } =20 spin_unlock(&free_vmap_area_lock); @@ -5501,11 +5197,14 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned= long *offsets, while (area--) { orig_start =3D vas[area]->va_start; orig_end =3D vas[area]->va_end; + WARN_ON_ONCE(!occupied_mt_erase_va_locked(vas[area])); va =3D merge_or_add_vmap_area_free_locked(vas[area]); + WARN_ON_ONCE(!va); if (va) kasan_release_vmalloc(orig_start, orig_end, - va->va_start, va->va_end, - KASAN_VMALLOC_PAGE_RANGE | KASAN_VMALLOC_TLB_FLUSH); + va->va_start, va->va_end, + KASAN_VMALLOC_PAGE_RANGE | + KASAN_VMALLOC_TLB_FLUSH); vas[area] =3D NULL; } =20 @@ -5551,7 +5250,9 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned l= ong *offsets, for (area =3D 0; area < nr_vms; area++) { orig_start =3D vas[area]->va_start; orig_end =3D vas[area]->va_end; + WARN_ON_ONCE(!occupied_mt_erase_va_locked(vas[area])); va =3D merge_or_add_vmap_area_free_locked(vas[area]); + WARN_ON_ONCE(!va); if (va) kasan_release_vmalloc(orig_start, orig_end, va->va_start, va->va_end, @@ -5598,7 +5299,7 @@ bool vmalloc_dump_obj(void *object) if (!spin_trylock(&vn->busy.lock)) return false; =20 - va =3D 
__find_vmap_area(addr, &vn->busy.root); + va =3D find_vmap_area_busy_locked(addr, vn); if (!va || !va->vm) { spin_unlock(&vn->busy.lock); return false; @@ -5650,11 +5351,17 @@ static void show_purge_info(struct seq_file *m) =20 for_each_vmap_node(vn) { spin_lock(&vn->lazy.lock); - list_for_each_entry(va, &vn->lazy.head, list) { - seq_printf(m, "0x%pK-0x%pK %7ld unpurged vm_area\n", - (void *)va->va_start, (void *)va->va_end, - va_size(va)); + try_init_lazy_mt_locked(vn); + if (WARN_ON_ONCE(!vn->lazy.mt_enabled)) { + spin_unlock(&vn->lazy.lock); + continue; } + MA_STATE(mas, &vn->lazy.mt, 0, 0); + + mas_for_each(&mas, va, ULONG_MAX) + seq_printf(m, "0x%pK-0x%pK %7ld unpurged vm_area\n", + (void *)va->va_start, (void *)va->va_end, + va_size(va)); spin_unlock(&vn->lazy.lock); } } @@ -5671,12 +5378,19 @@ static int vmalloc_info_show(struct seq_file *m, vo= id *p) =20 for_each_vmap_node(vn) { spin_lock(&vn->busy.lock); - list_for_each_entry(va, &vn->busy.head, list) { + try_init_busy_mt_locked(vn); + if (WARN_ON_ONCE(!vn->busy.mt_enabled)) { + spin_unlock(&vn->busy.lock); + continue; + } + MA_STATE(mas, &vn->busy.mt, 0, 0); + + mas_for_each(&mas, va, ULONG_MAX) { if (!va->vm) { if (va->flags & VMAP_RAM) seq_printf(m, "0x%pK-0x%pK %7ld vm_map_ram\n", - (void *)va->va_start, (void *)va->va_end, - va_size(va)); + (void *)va->va_start, (void *)va->va_end, + va_size(va)); =20 continue; } @@ -5689,7 +5403,7 @@ static int vmalloc_info_show(struct seq_file *m, void= *p) smp_rmb(); =20 seq_printf(m, "0x%pK-0x%pK %7ld", - v->addr, v->addr + v->size, v->size); + v->addr, v->addr + v->size, v->size); =20 if (v->caller) seq_printf(m, " %pS", v->caller); @@ -5754,6 +5468,8 @@ static void __init vmap_init_free_space(void) struct vmap_area *free; struct vm_struct *busy; =20 + spin_lock(&free_vmap_area_lock); + /* * B F B B B F * -|-----|.....|-----|-----|-----|.....|- @@ -5761,19 +5477,18 @@ static void __init vmap_init_free_space(void) * |<--------------------------------->| */ for 
(busy =3D vmlist; busy; busy =3D busy->next) { - if ((unsigned long) busy->addr - vmap_start > 0) { + if ((unsigned long)busy->addr - vmap_start > 0) { free =3D kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT); if (!WARN_ON_ONCE(!free)) { free->va_start =3D vmap_start; - free->va_end =3D (unsigned long) busy->addr; + free->va_end =3D (unsigned long)busy->addr; =20 - insert_vmap_area_augment(free, NULL, - &free_vmap_area_root, - &free_vmap_area_list); + if (WARN_ON_ONCE(!insert_vmap_area_free_locked(free))) + kmem_cache_free(vmap_area_cachep, free); } } =20 - vmap_start =3D (unsigned long) busy->addr + busy->size; + vmap_start =3D (unsigned long)busy->addr + busy->size; } =20 if (vmap_end - vmap_start > 0) { @@ -5782,11 +5497,12 @@ static void __init vmap_init_free_space(void) free->va_start =3D vmap_start; free->va_end =3D vmap_end; =20 - insert_vmap_area_augment(free, NULL, - &free_vmap_area_root, - &free_vmap_area_list); + if (WARN_ON_ONCE(!insert_vmap_area_free_locked(free))) + kmem_cache_free(vmap_area_cachep, free); } } + + spin_unlock(&free_vmap_area_lock); } =20 static void vmap_init_nodes(void) @@ -5825,13 +5541,15 @@ static void vmap_init_nodes(void) #endif =20 for_each_vmap_node(vn) { - vn->busy.root =3D RB_ROOT; - INIT_LIST_HEAD(&vn->busy.head); spin_lock_init(&vn->busy.lock); + mt_init_flags(&vn->busy.mt, MT_FLAGS_LOCK_EXTERN); + mt_set_external_lock(&vn->busy.mt, &vn->busy.lock); + vn->busy.mt_enabled =3D true; =20 - vn->lazy.root =3D RB_ROOT; - INIT_LIST_HEAD(&vn->lazy.head); spin_lock_init(&vn->lazy.lock); + mt_init_flags(&vn->lazy.mt, MT_FLAGS_LOCK_EXTERN); + mt_set_external_lock(&vn->lazy.mt, &vn->lazy.lock); + vn->lazy.mt_enabled =3D true; =20 for (i =3D 0; i < MAX_VA_SIZE_PAGES; i++) { INIT_LIST_HEAD(&vn->pool[i].head); @@ -5881,6 +5599,11 @@ void __init vmalloc_init(void) * Create the cache for vmap_area objects. 
*/ vmap_area_cachep =3D KMEM_CACHE(vmap_area, SLAB_PANIC); + /* + * vmalloc_init() performs Maple stores/preallocation while importing + * early ranges. Ensure Maple node cache is available at this stage. + */ + maple_tree_init(); =20 for_each_possible_cpu(i) { struct vmap_block_queue *vbq; @@ -5911,7 +5634,15 @@ void __init vmalloc_init(void) va->vm =3D tmp; =20 vn =3D addr_to_node(va->va_start); - insert_vmap_area(va, &vn->busy.root, &vn->busy.head); + spin_lock(&vn->busy.lock); + insert_vmap_area_busy_locked(va, vn); + spin_unlock(&vn->busy.lock); + + spin_lock(&free_vmap_area_lock); + try_init_occupied_mt_locked(); + WARN_ON_ONCE(!occupied_mt_store_range_locked(va->va_start, + va->va_end)); + spin_unlock(&free_vmap_area_lock); } =20 /* --=20 2.34.1 From nobody Sat Jun 13 23:22:38 2026 Received: from mx0b-0031df01.pphosted.com (mx0b-0031df01.pphosted.com [205.220.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EE32D34AB01 for ; Sat, 13 Jun 2026 17:21:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.180.131 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781371271; cv=none; b=k4x3C1U4Nsq8ObOtlC3md7xIebrg21bq5m2SucsbGRygKJ7/2C5h+E6hWQEnCsmldQ7rlgI9chsmu+gdKsTiWfcqW+jZW/56fAELVtETcB7z/H6yZvd99soe1pECfJKJbHWpxQYvx66ZWM7Vk2A0D+PltT1PC//XUO5fd5ZLhrs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781371271; c=relaxed/simple; bh=93nd/fp0WT0lOBo8bTdnf+a1qmT1fxj8wsunRs1e6EU=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=tkaI1cDu4qzAd/CEe9GePeL8qQNcgCvV6X4lqJls4GdCTzGNjNM5hNJvxPvyK+h4Xo01FBag6+7U3Wy1DCuPfHiZ3+bO7jCDrSho1UGgo5jXkyobxC0lSmi/5RAdfIJQdUVwFCmJ0NKZJBDxOTDJ2SFvJ+3cY8EYJBD232DdzAw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) 
header.from=oss.qualcomm.com; spf=pass smtp.mailfrom=oss.qualcomm.com; dkim=pass (2048-bit key) header.d=qualcomm.com header.i=@qualcomm.com header.b=MZlaAldJ; dkim=pass (2048-bit key) header.d=oss.qualcomm.com header.i=@oss.qualcomm.com header.b=L/OTmlLL; arc=none smtp.client-ip=205.220.180.131
From: Pranjal Arya
Date: Sat, 13 Jun 2026 22:49:47 +0530
Subject: [PATCH RFC 05/12] mm/vmalloc: tighten failure handling under memory pressure
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-Id: <20260613-vmalloc_maple-v1-5-0aa740bb944b@oss.qualcomm.com>
References: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com>
In-Reply-To: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com>
To: Andrew Morton, Uladzislau Rezki, "Liam R. Howlett", Alice Ryhl, Andrew Ballance
Cc: linux-arm-msm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, maple-tree@lists.infradead.org, Lorenzo Stoakes, Pranjal Shrivastava, Will Deacon, Suzuki K Poulose, Neil Armstrong, Mostafa Saleh, Balbir Singh, Suren Baghdasaryan, Marco Elver, Dmitry Vyukov, Alexander Potapenko, Shuah Khan, Dev Jain, Brendan Jackman, Puranjay Mohan, Santosh Shukla, Wyes Karny, Pranjal Arya, Sudeep Holla
X-Mailer: b4 0.15.2
X-Authority-Analysis: v=2.4 cv=NrThtcdJ c=1 sm=1 tr=0 ts=6a2d9183 cx=c_pps a=Oh5Dbbf/trHjhBongsHeRQ==:117 a=fChuTYTh2wq5r3m49p7fHw==:17 a=IkcTkHD0fZMA:10 a=FelO9ux0wxsA:10 a=s4-Qcg_JpJYA:10 a=VkNPw1HP01LnGYTKEx00:22 a=u7WPNUs3qKkmUXheDGA7:22 a=yx91gb_oNiZeI1HMLzn7:22 a=EUspDBNiAAAA:8 a=o4zA9CgbANw9Vykgm7UA:9 a=QEXdDO2ut3YA:10
a=_Vgx9l1VpLgwpw_dHYaR:22

Tighten failure handling on the two paths that publish into the
maple_tree under a spinlock and have no caller-friendly way to return
-ENOMEM:

- free_vmap_area_noflush() falls back to vmap_retry_list when
  publish_vmap_area_lazy() cannot allocate maple slabs under GFP_NOWAIT,
  and schedules drain_vmap_work to retry.
- the alloc path rolls a failed insert_vmap_area_busy_locked() back by
  returning the range to the free tree, or by parking it on the retry
  queue when that is not possible, rather than leaking the vmap_area or
  panicking.

Add vmap_retry_list as a non-indexed retry queue scanned by the
allocator as an exclusion set and drained from the purge worker, and
wire the two publish-failure paths above through it.

Signed-off-by: Pranjal Arya
--- mm/vmalloc.c | 566 +++++++++++++++++++++++++++++++++++++++++++++++++------= ---- 1 file changed, 474 insertions(+), 92 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index c908c1a0fcd4..7feb1b182cfa 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -869,6 +869,12 @@ EXPORT_SYMBOL(vmalloc_to_pfn); =20 static DEFINE_SPINLOCK(free_vmap_area_lock); static bool vmap_initialized __read_mostly; +/* + * Non-index retry queue for ranges that could not be transitioned to their + * target maple index state in constrained paths. This queue is scanned by= the + * allocator as an exclusion set and drained by purge workers.
+ */ +static LIST_HEAD(vmap_retry_list); =20 /* * This kmem_cache is used for vmap_area objects. Instead of @@ -1113,6 +1119,47 @@ free_mt_update_va_locked(struct vmap_area *va, unsig= ned long old_start, return true; } =20 +static __always_inline void +retry_queue_add_va_locked(struct vmap_area *va) +{ + lockdep_assert_held(&free_vmap_area_lock); + + /* + * Keep a VA on one list at a time. Retry entries are detached from + * all indexed containers before they are queued here. + */ + if (unlikely(!READ_ONCE(va->list.next) && !READ_ONCE(va->list.prev))) + INIT_LIST_HEAD(&va->list); + if (WARN_ON_ONCE(!list_empty(&va->list))) + return; + list_add_tail(&va->list, &vmap_retry_list); +} + +static __always_inline bool +retry_queue_overlap_locked(unsigned long start, unsigned long end, + unsigned long *blocked_end) +{ + struct vmap_area *va; + bool overlap =3D false; + + lockdep_assert_held(&free_vmap_area_lock); + + if (list_empty(&vmap_retry_list)) + return false; + + list_for_each_entry(va, &vmap_retry_list, list) { + unsigned long va_end =3D va->va_end - 1; + + if (va->va_start > end || va_end < start) + continue; + + overlap =3D true; + *blocked_end =3D max(*blocked_end, va_end); + } + + return overlap; +} + static __always_inline void try_init_free_mt_locked(void) { lockdep_assert_held(&free_vmap_area_lock); @@ -1169,6 +1216,14 @@ occupied_mt_store_range_locked(unsigned long start, = unsigned long end) return !WARN_ON_ONCE(err); } =20 +static __always_inline bool +occupied_mt_store_va_locked(struct vmap_area *va) +{ + lockdep_assert_held(&free_vmap_area_lock); + + return occupied_mt_store_range_locked(va->va_start, va->va_end); +} + static __always_inline bool occupied_mt_erase_range_locked(unsigned long start, unsigned long end) { @@ -1339,7 +1394,7 @@ find_vmap_area_exceed_addr_lock(unsigned long addr, s= truct vmap_area **va) return NULL; } =20 -static __always_inline void +static __always_inline bool insert_vmap_area_busy_locked(struct vmap_area *va, struct 
vmap_node *vn) { int err; @@ -1349,12 +1404,12 @@ insert_vmap_area_busy_locked(struct vmap_area *va, = struct vmap_node *vn) try_init_busy_mt_locked(vn); =20 if (WARN_ON_ONCE(!vn->busy.mt_enabled)) - return; + return false; =20 if (!validate_vmap_area_range_insert_mt_locked(&vn->busy.mt, va->va_start, va->va_end)) - return; + return false; =20 INIT_LIST_HEAD(&va->list); =20 @@ -1364,11 +1419,11 @@ insert_vmap_area_busy_locked(struct vmap_area *va, = struct vmap_node *vn) if (!err) { mas_store_prealloc(&mas, va); mas_destroy(&mas); - return; + return true; } =20 err =3D mas_store_gfp(&mas, va, GFP_ATOMIC | __GFP_NOWARN); - WARN_ON_ONCE(err); + return !WARN_ON_ONCE(err); } =20 static __always_inline void @@ -1391,7 +1446,7 @@ unlink_vmap_area_busy_locked(struct vmap_area *va, st= ruct vmap_node *vn) INIT_LIST_HEAD(&va->list); } =20 -static __always_inline void +static __always_inline bool insert_vmap_area_lazy_locked(struct vmap_area *va, struct vmap_node *vn) { int err; @@ -1400,12 +1455,12 @@ insert_vmap_area_lazy_locked(struct vmap_area *va, = struct vmap_node *vn) =20 try_init_lazy_mt_locked(vn); if (WARN_ON_ONCE(!vn->lazy.mt_enabled)) - return; + return false; =20 if (!validate_vmap_area_range_insert_mt_locked(&vn->lazy.mt, va->va_start, va->va_end)) - return; + return false; =20 INIT_LIST_HEAD(&va->list); =20 @@ -1415,11 +1470,72 @@ insert_vmap_area_lazy_locked(struct vmap_area *va, = struct vmap_node *vn) if (!err) { mas_store_prealloc(&mas, va); mas_destroy(&mas); - return; + return true; } =20 err =3D mas_store_gfp(&mas, va, GFP_ATOMIC | __GFP_NOWARN); - WARN_ON_ONCE(err); + return !WARN_ON_ONCE(err); +} + +static __always_inline bool +unlink_vmap_area_lazy_locked(struct vmap_area *va, struct vmap_node *vn) +{ + int err; + + lockdep_assert_held(&vn->lazy.lock); + + try_init_lazy_mt_locked(vn); + if (WARN_ON_ONCE(!vn->lazy.mt_enabled)) + return false; + + MA_STATE(mas, &vn->lazy.mt, va->va_start, va->va_end - 1); + + err =3D mas_store_gfp(&mas, NULL, 
GFP_ATOMIC | __GFP_NOWARN); + if (WARN_ON_ONCE(err)) + return false; + + INIT_LIST_HEAD(&va->list); + return true; +} + +/* + * Transition a VA into the lazy index and drop occupied tracking. On occu= pied + * erase failure, attempt to roll back the lazy insertion; if rollback fai= ls we + * keep the lazy entry and let purge-side erase_occupied handling repair s= tale + * occupied state. + * + * Returns true when the VA remains lazy-indexed; false when it should be + * retried via non-index queue. + */ +static __always_inline bool +publish_vmap_area_lazy(struct vmap_area *va, struct vmap_node *vn) +{ + bool lazy_kept =3D false; + + spin_lock(&vn->lazy.lock); + if (unlikely(!insert_vmap_area_lazy_locked(va, vn))) { + spin_unlock(&vn->lazy.lock); + return false; + } + + /* + * Keep lazy.lock held while dropping occupied tracking so purge-side + * lazy extraction cannot move @va to purge_list during rollback. + */ + spin_lock(&free_vmap_area_lock); + try_init_occupied_mt_locked(); + if (likely(occupied_mt_erase_va_locked(va))) { + spin_unlock(&free_vmap_area_lock); + spin_unlock(&vn->lazy.lock); + return true; + } + spin_unlock(&free_vmap_area_lock); + + if (unlikely(!unlink_vmap_area_lazy_locked(va, vn))) + lazy_kept =3D true; + spin_unlock(&vn->lazy.lock); + + return lazy_kept; } =20 static __always_inline bool @@ -1437,7 +1553,9 @@ lazy_vmap_areas_empty_locked(struct vmap_node *vn) static __always_inline void move_lazy_vmap_areas_to_purge_locked(struct vmap_node *vn) { - struct vmap_area *va; + LIST_HEAD(move_list); + struct vmap_area *va, *n_va; + int err; =20 lockdep_assert_held(&vn->lazy.lock); =20 @@ -1448,12 +1566,25 @@ move_lazy_vmap_areas_to_purge_locked(struct vmap_no= de *vn) MA_STATE(mas, &vn->lazy.mt, 0, 0); =20 mas_for_each(&mas, va, ULONG_MAX) - list_add_tail(&va->list, &vn->purge_list); + list_add_tail(&va->list, &move_list); + + /* + * Erase ranges one-by-one and move only successfully erased entries to + * purge_list. 
This avoids destroy/reinit churn and keeps lazy index + * coherence if an erase operation fails under pressure. + */ + list_for_each_entry_safe(va, n_va, &move_list, list) { + MA_STATE(mas_erase, &vn->lazy.mt, va->va_start, va->va_end - 1); + + err =3D mas_store_gfp(&mas_erase, NULL, GFP_ATOMIC | __GFP_NOWARN); + if (unlikely(err)) { + WARN_ON_ONCE(err); + list_del_init(&va->list); + continue; + } =20 - __mt_destroy(&vn->lazy.mt); - mt_init_flags(&vn->lazy.mt, MT_FLAGS_LOCK_EXTERN); - mt_set_external_lock(&vn->lazy.mt, &vn->lazy.lock); - vn->lazy.mt_enabled =3D true; + list_move_tail(&va->list, &vn->purge_list); + } } =20 static __always_inline bool @@ -1463,11 +1594,6 @@ insert_vmap_area_free_locked(struct vmap_area *va) =20 lockdep_assert_held(&free_vmap_area_lock); =20 - try_init_free_mt_locked(); - - if (unlikely(!free_mt_supported())) - return false; - prev =3D __find_vmap_area_enclose_addr_mt(va->va_start, &free_vmap_area_m= t); if (prev && WARN_ON_ONCE(prev->va_end > va->va_start)) return false; @@ -1512,16 +1638,16 @@ merge_or_add_vmap_area_free_locked(struct vmap_area= *va) if (left && WARN_ON_ONCE(left->va_end > new_start)) return NULL; =20 + right =3D __find_vmap_area_exceed_addr_mt(new_start, &free_vmap_area_mt); + if (right && WARN_ON_ONCE(right->va_start < new_end)) + return NULL; + if (left && left->va_end =3D=3D new_start) { new_start =3D left->va_start; unlink_vmap_area_free_locked(left); kmem_cache_free(vmap_area_cachep, left); } =20 - right =3D __find_vmap_area_exceed_addr_mt(new_start, &free_vmap_area_mt); - if (right && WARN_ON_ONCE(right->va_start < new_end)) - return NULL; - if (right && right->va_start =3D=3D new_end) { new_end =3D right->va_end; unlink_vmap_area_free_locked(right); @@ -1580,9 +1706,28 @@ occupied_mt_find_hole_window_locked(unsigned long mi= n, unsigned long max, if (check_add_overflow(candidate, size - 1, &candidate_end)) return false; =20 - if (candidate >=3D search && candidate_end <=3D hole_end) { - *addr =3D candidate; 
- return true; + while (candidate >=3D search && candidate_end <=3D hole_end) { + unsigned long blocked_end =3D 0; + + if (!retry_queue_overlap_locked(candidate, candidate_end, + &blocked_end)) { + *addr =3D candidate; + return true; + } + + if (blocked_end >=3D hole_end) + break; + + blocked_end++; + if (!blocked_end) + return false; + + candidate =3D ALIGN(blocked_end, align); + if (candidate < blocked_end) + return false; + + if (check_add_overflow(candidate, size - 1, &candidate_end)) + return false; } =20 if (hole_end =3D=3D ULONG_MAX) @@ -1828,6 +1973,70 @@ restore_allocated_vmap_range_free_locked(unsigned lo= ng start, unsigned long end) return true; } =20 +/* + * Roll back an allocated range when busy insertion fails. Prefer returning + * it to the free tree; if that is not possible, keep occupied tracking so + * the range stays reserved and allocator state remains coherent. + * + * Returns true when @va remains referenced by the free tree and must not = be + * freed by the caller. Returns false when the caller owns @va. + */ +static __always_inline bool +rollback_busy_insert_failed_alloc_locked(struct vmap_area *va) +{ + lockdep_assert_held(&free_vmap_area_lock); + + if (!insert_vmap_area_free_locked(va)) { + retry_queue_add_va_locked(va); + return true; + } + + if (occupied_mt_erase_va_locked(va)) + return true; + + if (free_mt_erase_va_locked(va)) { + retry_queue_add_va_locked(va); + return true; + } + + /* + * Occupied erase failed and we could not remove the temporary free + * insertion. Keep @va alive: both trees still reference this range. + */ + return true; +} + +/* + * Reinsert @va into the free index after occupied erase. On failure, plac= e the + * range on the non-index retry queue and best-effort restore occupied tra= cking. + * + * Return: free-tracked @va on success, NULL when queued for retry. 
+ */ +static __always_inline struct vmap_area * +reinsert_or_queue_vmap_area_locked(struct vmap_area *va) +{ + struct vmap_area *tracked; + + lockdep_assert_held(&free_vmap_area_lock); + + tracked =3D merge_or_add_vmap_area_free_locked(va); + if (tracked) + return tracked; + + if (insert_vmap_area_free_locked(va)) + return va; + + /* + * Retry queue acts as allocation exclusion even if occupied restore + * fails under pressure. + */ + if (WARN_ON_ONCE(!occupied_mt_store_va_locked(va))) + INIT_LIST_HEAD(&va->list); + + retry_queue_add_va_locked(va); + return NULL; +} + /* * Returns a start address of the newly allocated area, if success. * Otherwise an error value is returned that indicates failure. @@ -1840,22 +2049,42 @@ __alloc_vmap_area(unsigned long size, unsigned long= align, unsigned long nva_start_addr; unsigned long nva_end_addr; struct vmap_area *va; + MA_STATE(mas, &free_vmap_area_mt, 0, 0); =20 lockdep_assert_held(&free_vmap_area_lock); =20 try_init_occupied_mt_locked(); =20 - if (WARN_ON_ONCE(!occupied_mt_supported())) + if (WARN_ON_ONCE(!size || !align || vstart >=3D vend)) + return -EINVAL; + if (size > vend - vstart) return -ENOENT; =20 - nva_start_addr =3D occupied_mt_find_hole_lowest_locked(size, align, - vstart, vend); - if (IS_ERR_VALUE(nva_start_addr)) - return nva_start_addr; - nva_end_addr =3D nva_start_addr + size; + /* + * Free maple index is authoritative for allocatable ranges; lazy and + * retry entries are intentionally excluded from it. 
+ */ + mas_set(&mas, vstart); + va =3D mas_find(&mas, vend - 1); + while (va) { + unsigned long search_start =3D max(va->va_start, vstart); + unsigned long candidate_end; + + nva_start_addr =3D ALIGN(search_start, align); + if (nva_start_addr < search_start) + return -ERANGE; =20 - va =3D __find_vmap_area_mt(nva_start_addr, &free_vmap_area_mt); - if (WARN_ON_ONCE(!va)) + if (check_add_overflow(nva_start_addr, size - 1, &candidate_end)) + return -ERANGE; + + if (candidate_end < vend && candidate_end < va->va_end) { + nva_end_addr =3D candidate_end + 1; + break; + } + + va =3D mas_next(&mas, vend - 1); + } + if (!va) return -ENOENT; =20 ret =3D va_clip(va, nva_start_addr, size); @@ -1883,6 +2112,7 @@ __alloc_vmap_area(unsigned long size, unsigned long a= lign, static void free_vmap_area(struct vmap_area *va) { struct vmap_node *vn =3D addr_to_node(va->va_start); + bool queued_retry =3D false; =20 /* * Remove from the busy tree/list. @@ -1895,9 +2125,19 @@ static void free_vmap_area(struct vmap_area *va) * Insert/Merge it back to the free tree/list. 
*/ spin_lock(&free_vmap_area_lock); - WARN_ON_ONCE(!occupied_mt_erase_va_locked(va)); - WARN_ON_ONCE(!merge_or_add_vmap_area_free_locked(va)); + if (unlikely(!occupied_mt_erase_va_locked(va))) { + retry_queue_add_va_locked(va); + queued_retry =3D true; + spin_unlock(&free_vmap_area_lock); + goto out_schedule_retry; + } + if (!reinsert_or_queue_vmap_area_locked(va)) + queued_retry =3D true; spin_unlock(&free_vmap_area_lock); + +out_schedule_retry: + if (queued_retry) + schedule_work(&drain_vmap_work); } =20 static inline void @@ -2119,6 +2359,7 @@ static struct vmap_area *alloc_vmap_area(unsigned lon= g size, va->va_end =3D addr + size; va->vm =3D NULL; va->flags =3D (va_flags | vn_id); + INIT_LIST_HEAD(&va->list); =20 if (vm) { vm->addr =3D (void *)va->va_start; @@ -2129,8 +2370,29 @@ static struct vmap_area *alloc_vmap_area(unsigned lo= ng size, vn =3D addr_to_node(va->va_start); =20 spin_lock(&vn->busy.lock); - insert_vmap_area_busy_locked(va, vn); + ret =3D insert_vmap_area_busy_locked(va, vn) ? 
0 : -ENOMEM; spin_unlock(&vn->busy.lock); + if (ret) { + bool keep_va =3D false; + + va->vm =3D NULL; + spin_lock(&free_vmap_area_lock); + keep_va =3D rollback_busy_insert_failed_alloc_locked(va); + spin_unlock(&free_vmap_area_lock); + + if (!keep_va) + kmem_cache_free(vmap_area_cachep, va); + else + schedule_work(&drain_vmap_work); + + if (vm) { + vm->addr =3D NULL; + vm->size =3D 0; + vm->requested_size =3D 0; + } + + return ERR_PTR(ret); + } =20 BUG_ON(!IS_ALIGNED(va->va_start, align)); BUG_ON(va->va_start < vstart); @@ -2221,21 +2483,40 @@ reclaim_list_global(struct list_head *head, bool er= ase_occupied, { struct vmap_area *va, *n; bool ok =3D true; + bool queue_retry_work =3D false; =20 if (list_empty(head)) return true; =20 spin_lock(&free_vmap_area_lock); list_for_each_entry_safe(va, n, head, list) { + bool occupied_erased =3D false; + list_del_init(&va->list); - if (erase_occupied) - WARN_ON_ONCE(!occupied_mt_erase_va_locked(va)); - if (WARN_ON_ONCE(!merge_or_add_vmap_area_free_locked(va))) { - list_add_tail(&va->list, failed); - ok =3D false; + if (erase_occupied) { + if (WARN_ON_ONCE(!occupied_mt_erase_va_locked(va))) { + list_add_tail(&va->list, failed); + ok =3D false; + continue; + } + + occupied_erased =3D true; + } + if (WARN_ON_ONCE(!merge_or_add_vmap_area_free_locked(va))) { + if (occupied_erased && + WARN_ON_ONCE(!occupied_mt_store_va_locked(va))) { + retry_queue_add_va_locked(va); + queue_retry_work =3D true; + ok =3D false; + continue; + } + list_add_tail(&va->list, failed); + ok =3D false; } } spin_unlock(&free_vmap_area_lock); + if (queue_retry_work) + schedule_work(&drain_vmap_work); =20 return ok; } @@ -2330,6 +2611,7 @@ static void purge_vmap_node(struct work_struct *work) struct vmap_node, purge_work); unsigned long nr_purged_pages =3D 0; unsigned long nr_failed_pages =3D 0; + bool queued_retry =3D false; struct vmap_area *va, *n_va; LIST_HEAD(local_list); LIST_HEAD(local_failed); @@ -2358,7 +2640,7 @@ static void purge_vmap_node(struct 
work_struct *work) =20 atomic_long_sub(nr_purged_pages, &vmap_lazy_nr); =20 - WARN_ON_ONCE(!reclaim_list_global(&local_list, false, &local_failed)); + WARN_ON_ONCE(!reclaim_list_global(&local_list, true, &local_failed)); list_for_each_entry_safe(va, n_va, &local_failed, list) { unsigned int vn_id =3D decode_vn_id(va->flags); struct vmap_node *dst; @@ -2367,14 +2649,60 @@ static void purge_vmap_node(struct work_struct *wor= k) dst =3D is_vn_id_valid(vn_id) ? id_to_node(vn_id) : addr_to_node(va->va_start); =20 - spin_lock(&dst->lazy.lock); - insert_vmap_area_lazy_locked(va, dst); - spin_unlock(&dst->lazy.lock); - nr_failed_pages +=3D va_size(va) >> PAGE_SHIFT; + if (publish_vmap_area_lazy(va, dst)) { + nr_failed_pages +=3D va_size(va) >> PAGE_SHIFT; + continue; + } + + spin_lock(&free_vmap_area_lock); + retry_queue_add_va_locked(va); + spin_unlock(&free_vmap_area_lock); + queued_retry =3D true; } =20 if (nr_failed_pages) atomic_long_add(nr_failed_pages, &vmap_lazy_nr); + + if (queued_retry) + schedule_work(&drain_vmap_work); +} + +static void drain_vmap_retry_queue(void) +{ + struct vmap_area *va, *n_va; + bool queued_retry =3D false; + LIST_HEAD(local_retry); + + spin_lock(&free_vmap_area_lock); + if (list_empty(&vmap_retry_list)) { + spin_unlock(&free_vmap_area_lock); + return; + } + + list_splice_init(&vmap_retry_list, &local_retry); + spin_unlock(&free_vmap_area_lock); + + list_for_each_entry_safe(va, n_va, &local_retry, list) { + struct vmap_node *vn =3D addr_to_node(va->va_start); + + list_del_init(&va->list); + if (publish_vmap_area_lazy(va, vn)) { + atomic_long_add(va_size(va) >> PAGE_SHIFT, &vmap_lazy_nr); + continue; + } + + spin_lock(&free_vmap_area_lock); + retry_queue_add_va_locked(va); + spin_unlock(&free_vmap_area_lock); + queued_retry =3D true; + } + + /* + * Ensure retry-only backlog keeps making progress even if no new free + * events arrive to trigger another purge pass. 
+ */ + if (queued_retry) + schedule_work(&drain_vmap_work); } =20 /* @@ -2392,6 +2720,9 @@ static bool __purge_vmap_area_lazy(unsigned long star= t, unsigned long end, =20 lockdep_assert_held(&vmap_purge_lock); =20 + /* Retry queued transitions first, so they can join this purge cycle. */ + drain_vmap_retry_queue(); + /* * Use cpumask to mark which node has to be processed. */ @@ -2489,6 +2820,7 @@ static void free_vmap_area_noflush(struct vmap_area *= va) { unsigned long nr_lazy_max =3D lazy_max_pages(); unsigned long va_start =3D va->va_start; + unsigned long nr_pages =3D va_size(va) >> PAGE_SHIFT; unsigned int vn_id =3D decode_vn_id(va->flags); struct vmap_node *vn; unsigned long nr_lazy; @@ -2496,9 +2828,6 @@ static void free_vmap_area_noflush(struct vmap_area *= va) if (WARN_ON_ONCE(!list_empty(&va->list))) return; =20 - nr_lazy =3D atomic_long_add_return_relaxed(va_size(va) >> PAGE_SHIFT, - &vmap_lazy_nr); - /* * If it was request by a certain node we would like to * return it to that node, i.e. its pool for later reuse. @@ -2506,18 +2835,20 @@ static void free_vmap_area_noflush(struct vmap_area= *va) vn =3D is_vn_id_valid(vn_id) ? id_to_node(vn_id):addr_to_node(va->va_start); =20 - /* - * Drop occupied-range visibility as soon as the area is freed, even - * though coalescing/reinsertion into the free index remains deferred. 
- */ - spin_lock(&free_vmap_area_lock); - try_init_occupied_mt_locked(); - WARN_ON_ONCE(!occupied_mt_erase_va_locked(va)); - spin_unlock(&free_vmap_area_lock); + if (publish_vmap_area_lazy(va, vn)) { + nr_lazy =3D atomic_long_add_return_relaxed(nr_pages, &vmap_lazy_nr); + } else { + spin_lock(&free_vmap_area_lock); + retry_queue_add_va_locked(va); + nr_lazy =3D atomic_long_read(&vmap_lazy_nr); + spin_unlock(&free_vmap_area_lock); =20 - spin_lock(&vn->lazy.lock); - insert_vmap_area_lazy_locked(va, vn); - spin_unlock(&vn->lazy.lock); + /* + * Retry transitions are drained from purge context; poke it + * immediately so transient pressure does not prolong retention. + */ + schedule_work(&drain_vmap_work); + } =20 trace_free_vmap_area_noflush(va_start, nr_lazy, nr_lazy_max); =20 @@ -5023,6 +5354,8 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned l= ong *offsets, struct vmap_area **vas, *va; struct vm_struct **vms; int area, area2, last_area, term_area; + int inserted_busy =3D 0; + bool queued_retry =3D false; unsigned long base, start, size, end, last_end, orig_start, orig_end; bool purged =3D false; =20 @@ -5061,6 +5394,8 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned l= ong *offsets, =20 for (area =3D 0; area < nr_vms; area++) { vas[area] =3D kmem_cache_zalloc(vmap_area_cachep, GFP_KERNEL); + if (vas[area]) + INIT_LIST_HEAD(&vas[area]->list); vms[area] =3D kzalloc_obj(struct vm_struct); if (!vas[area] || !vms[area]) goto err_free; @@ -5170,10 +5505,14 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned= long *offsets, struct vmap_node *vn =3D addr_to_node(vas[area]->va_start); =20 spin_lock(&vn->busy.lock); - insert_vmap_area_busy_locked(vas[area], vn); + if (unlikely(!insert_vmap_area_busy_locked(vas[area], vn))) { + spin_unlock(&vn->busy.lock); + goto err_unwind_busy; + } setup_vmalloc_vm(vms[area], vas[area], VM_ALLOC, pcpu_get_vm_areas); spin_unlock(&vn->busy.lock); + inserted_busy++; } =20 /* @@ -5197,33 +5536,43 @@ struct vm_struct 
**pcpu_get_vm_areas(const unsigned= long *offsets, while (area--) { orig_start =3D vas[area]->va_start; orig_end =3D vas[area]->va_end; - WARN_ON_ONCE(!occupied_mt_erase_va_locked(vas[area])); - va =3D merge_or_add_vmap_area_free_locked(vas[area]); - WARN_ON_ONCE(!va); - if (va) - kasan_release_vmalloc(orig_start, orig_end, - va->va_start, va->va_end, - KASAN_VMALLOC_PAGE_RANGE | - KASAN_VMALLOC_TLB_FLUSH); + if (occupied_mt_erase_va_locked(vas[area])) { + va =3D reinsert_or_queue_vmap_area_locked(vas[area]); + if (va) + kasan_release_vmalloc(orig_start, orig_end, + va->va_start, va->va_end, + KASAN_VMALLOC_PAGE_RANGE | + KASAN_VMALLOC_TLB_FLUSH); + else + queued_retry =3D true; + } else { + retry_queue_add_va_locked(vas[area]); + queued_retry =3D true; + } vas[area] =3D NULL; } =20 overflow: spin_unlock(&free_vmap_area_lock); + if (queued_retry) + schedule_work(&drain_vmap_work); + if (!purged) { reclaim_and_purge_vmap_areas(); purged =3D true; =20 - /* Before "retry", check if we recover. */ - for (area =3D 0; area < nr_vms; area++) { - if (vas[area]) - continue; - - vas[area] =3D kmem_cache_zalloc( - vmap_area_cachep, GFP_KERNEL); - if (!vas[area]) - goto err_free; - } + /* Before "retry", check if we recover. 
*/ + for (area =3D 0; area < nr_vms; area++) { + if (vas[area]) + continue; + + vas[area] =3D kmem_cache_zalloc(vmap_area_cachep, + GFP_KERNEL); + if (vas[area]) + INIT_LIST_HEAD(&vas[area]->list); + if (!vas[area]) + goto err_free; + } =20 goto retry; } @@ -5240,6 +5589,16 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned = long *offsets, kfree(vms); return NULL; =20 +err_unwind_busy: + while (inserted_busy--) { + struct vmap_node *vn =3D addr_to_node(vas[inserted_busy]->va_start); + + spin_lock(&vn->busy.lock); + unlink_vmap_area_busy_locked(vas[inserted_busy], vn); + spin_unlock(&vn->busy.lock); + } + goto err_free_shadow; + err_free_shadow: spin_lock(&free_vmap_area_lock); /* @@ -5250,17 +5609,25 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned= long *offsets, for (area =3D 0; area < nr_vms; area++) { orig_start =3D vas[area]->va_start; orig_end =3D vas[area]->va_end; - WARN_ON_ONCE(!occupied_mt_erase_va_locked(vas[area])); - va =3D merge_or_add_vmap_area_free_locked(vas[area]); - WARN_ON_ONCE(!va); - if (va) - kasan_release_vmalloc(orig_start, orig_end, - va->va_start, va->va_end, - KASAN_VMALLOC_PAGE_RANGE | KASAN_VMALLOC_TLB_FLUSH); + if (occupied_mt_erase_va_locked(vas[area])) { + va =3D reinsert_or_queue_vmap_area_locked(vas[area]); + if (va) + kasan_release_vmalloc(orig_start, orig_end, + va->va_start, va->va_end, + KASAN_VMALLOC_PAGE_RANGE | + KASAN_VMALLOC_TLB_FLUSH); + else + queued_retry =3D true; + } else { + retry_queue_add_va_locked(vas[area]); + queued_retry =3D true; + } vas[area] =3D NULL; kfree(vms[area]); } spin_unlock(&free_vmap_area_lock); + if (queued_retry) + schedule_work(&drain_vmap_work); kfree(vas); kfree(vms); return NULL; @@ -5364,6 +5731,13 @@ static void show_purge_info(struct seq_file *m) va_size(va)); spin_unlock(&vn->lazy.lock); } + + spin_lock(&free_vmap_area_lock); + list_for_each_entry(va, &vmap_retry_list, list) + seq_printf(m, "0x%pK-0x%pK %7ld retry vm_area\n", + (void *)va->va_start, (void *)va->va_end, + 
va_size(va)); + spin_unlock(&free_vmap_area_lock); } =20 static int vmalloc_info_show(struct seq_file *m, void *p) @@ -5635,13 +6009,21 @@ void __init vmalloc_init(void) =20 vn =3D addr_to_node(va->va_start); spin_lock(&vn->busy.lock); - insert_vmap_area_busy_locked(va, vn); + if (unlikely(!insert_vmap_area_busy_locked(va, vn))) { + spin_unlock(&vn->busy.lock); + panic("%s: failed to import busy range %#lx-%#lx\n", + __func__, va->va_start, va->va_end); + } spin_unlock(&vn->busy.lock); =20 spin_lock(&free_vmap_area_lock); try_init_occupied_mt_locked(); - WARN_ON_ONCE(!occupied_mt_store_range_locked(va->va_start, - va->va_end)); + if (WARN_ON_ONCE(!occupied_mt_store_range_locked(va->va_start, + va->va_end))) { + spin_unlock(&free_vmap_area_lock); + panic("%s: failed to import occupied range %#lx-%#lx\n", + __func__, va->va_start, va->va_end); + } spin_unlock(&free_vmap_area_lock); } =20
--=20
2.34.1
From nobody Sat Jun 13 23:22:38 2026
From: Pranjal Arya
Date: Sat, 13 Jun 2026 22:49:48 +0530
Subject: [PATCH RFC 06/12] mm/vmalloc: tighten alloc/free hot paths
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-Id: <20260613-vmalloc_maple-v1-6-0aa740bb944b@oss.qualcomm.com>
References: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com>
In-Reply-To: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com>
To: Andrew Morton, Uladzislau Rezki, "Liam R. Howlett", Alice Ryhl,
 Andrew Ballance
Cc: linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, maple-tree@lists.infradead.org,
 Lorenzo Stoakes, Pranjal Shrivastava, Will Deacon, Suzuki K Poulose,
 Neil Armstrong, Mostafa Saleh, Balbir Singh, Suren Baghdasaryan,
 Marco Elver, Dmitry Vyukov, Alexander Potapenko, Shuah Khan, Dev Jain,
 Brendan Jackman, Puranjay Mohan, Santosh Shukla, Wyes Karny,
 Pranjal Arya, Sudeep Holla

Four small refinements that follow from using maple_tree more
idiomatically on the alloc and free walkers:

- Carry the MA_STATE through the alloc walker instead of dropping and
  re-positioning it between the gap query, the candidate classify step,
  and the post-clip publish. The walker is already pointing at the
  chosen range; reuse it.

- When the gap query is satisfied by an entry whose start is not yet
  aligned, ask for a gap of (size + align - 1) so the first match is
  guaranteed alignable. This matches the effective behaviour of the
  augmented rb_tree's gap traversal.

- When va_clip narrows an existing free entry, store NULL on just the
  consumed sub-range instead of erasing the whole entry and re-storing
  the surviving prefix/suffix. mas_store(NULL, [start, end]) leaves the
  un-trimmed sub-range of the original entry intact, so the re-store is
  unnecessary.

- Walk the address-keyed occupied tree with mas_find on the rare decay
  path so the per-node free-area scan can prune ranges that are already
  aligned out.

No semantic change to the allocator policy, the free-area shape, or the
addresses returned.
Signed-off-by: Pranjal Arya --- mm/vmalloc.c | 338 +++++++++++++++++++++++++++++++++++++++++++++++++------= ---- 1 file changed, 283 insertions(+), 55 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 7feb1b182cfa..5bc1e47c456a 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -899,6 +899,7 @@ static struct maple_tree occupied_vmap_area_mt =3D free_vmap_area_lock); static bool occupied_vmap_area_mt_enabled; static bool occupied_vmap_area_mt_init_tried; +static bool occupied_vmap_area_perf_mode; =20 /* * Preload a CPU with one object for "no edge" split case. The @@ -1073,12 +1074,13 @@ static __always_inline bool free_mt_store_va_locked= (struct vmap_area *va) if (!err) { mas_store_prealloc(&mas, va); mas_destroy(&mas); - } else { - err =3D mas_store_gfp(&mas, va, GFP_ATOMIC | __GFP_NOWARN); - if (WARN_ON_ONCE(err)) - return false; + return true; } =20 + err =3D mas_store_gfp(&mas, va, GFP_ATOMIC | __GFP_NOWARN); + if (WARN_ON_ONCE(err)) + return false; + return true; } =20 @@ -1119,6 +1121,31 @@ free_mt_update_va_locked(struct vmap_area *va, unsig= ned long old_start, return true; } =20 +/* + * Trim a stored range entry by clearing a sub-range from one end. + * Used by LE_FIT and RE_FIT in va_clip(): the original [old_start, + * old_end-1]->@va entry survives intact at the un-trimmed sub-range, + * so a single mas_store NULL replaces the explicit erase + restore-at- + * shrunk-range pair, halving maple-tree work for edge clips. NE_FIT + * uses the same primitive after first inserting @lva, which trades 3 + * stores (erase + store + lva) for 2 (lva + middle trim). 
+ */ +static __always_inline bool +free_mt_trim_range_locked(unsigned long trim_start, unsigned long trim_end) +{ + int err; + + lockdep_assert_held(&free_vmap_area_lock); + + if (trim_start >=3D trim_end) + return true; + + MA_STATE(mas, &free_vmap_area_mt, trim_start, trim_end - 1); + + err =3D mas_store_gfp(&mas, NULL, GFP_ATOMIC | __GFP_NOWARN); + return !WARN_ON_ONCE(err); +} + static __always_inline void retry_queue_add_va_locked(struct vmap_area *va) { @@ -1175,6 +1202,11 @@ static __always_inline void try_init_free_mt_locked(= void) } =20 static __always_inline bool occupied_mt_supported(void) +{ + return occupied_vmap_area_perf_mode && occupied_vmap_area_mt_enabled; +} + +static __always_inline bool occupied_mt_enabled(void) { return occupied_vmap_area_mt_enabled; } @@ -1194,28 +1226,48 @@ static __always_inline void try_init_occupied_mt_lo= cked(void) } =20 static __always_inline bool -occupied_mt_store_range_locked(unsigned long start, unsigned long end) +occupied_mt_store_range_raw_locked(unsigned long start, unsigned long end) { int err; =20 lockdep_assert_held(&free_vmap_area_lock); =20 - if (WARN_ON_ONCE(!occupied_mt_supported())) - return false; + if (!occupied_mt_enabled()) + return true; =20 MA_STATE(mas, &occupied_vmap_area_mt, start, end - 1); =20 - err =3D mas_preallocate(&mas, XA_ZERO_ENTRY, GFP_NOWAIT | __GFP_NOWARN); - if (!err) { - mas_store_prealloc(&mas, XA_ZERO_ENTRY); - mas_destroy(&mas); + err =3D mas_store_gfp(&mas, XA_ZERO_ENTRY, GFP_ATOMIC | __GFP_NOWARN); + return !WARN_ON_ONCE(err); +} + +static __always_inline bool +occupied_mt_erase_range_raw_locked(unsigned long start, unsigned long end) +{ + int err; + + lockdep_assert_held(&free_vmap_area_lock); + + if (!occupied_mt_enabled()) return true; - } =20 - err =3D mas_store_gfp(&mas, XA_ZERO_ENTRY, GFP_ATOMIC | __GFP_NOWARN); + MA_STATE(mas, &occupied_vmap_area_mt, start, end - 1); + + err =3D mas_store_gfp(&mas, NULL, GFP_ATOMIC | __GFP_NOWARN); return !WARN_ON_ONCE(err); } =20 
+static __always_inline bool +occupied_mt_store_range_locked(unsigned long start, unsigned long end) +{ + lockdep_assert_held(&free_vmap_area_lock); + + if (!occupied_mt_supported()) + return true; + + return occupied_mt_store_range_raw_locked(start, end); +} + static __always_inline bool occupied_mt_store_va_locked(struct vmap_area *va) { @@ -1227,17 +1279,12 @@ occupied_mt_store_va_locked(struct vmap_area *va) static __always_inline bool occupied_mt_erase_range_locked(unsigned long start, unsigned long end) { - int err; - lockdep_assert_held(&free_vmap_area_lock); =20 if (WARN_ON_ONCE(!occupied_mt_supported())) return false; =20 - MA_STATE(mas, &occupied_vmap_area_mt, start, end - 1); - - err =3D mas_store_gfp(&mas, NULL, GFP_ATOMIC | __GFP_NOWARN); - return !WARN_ON_ONCE(err); + return occupied_mt_erase_range_raw_locked(start, end); } =20 static __always_inline bool @@ -1303,6 +1350,24 @@ __find_vmap_area_enclose_addr_mt(unsigned long addr,= struct maple_tree *tree) return mas_find_rev(&mas, 0); } =20 +static __always_inline bool +find_vmap_area_insert_neighbors_mt_locked(struct maple_tree *tree, + unsigned long start, + unsigned long end, + struct vmap_area **left, + struct vmap_area **right) +{ + *left =3D __find_vmap_area_enclose_addr_mt(start, tree); + if (*left && WARN_ON_ONCE((*left)->va_end > start)) + return false; + + *right =3D __find_vmap_area_exceed_addr_mt(start, tree); + if (*right && WARN_ON_ONCE((*right)->va_start < end)) + return false; + + return true; +} + static __always_inline bool validate_vmap_area_range_insert_mt_locked(struct maple_tree *tree, unsigned long start, @@ -1310,12 +1375,8 @@ validate_vmap_area_range_insert_mt_locked(struct map= le_tree *tree, { struct vmap_area *left, *right; =20 - left =3D __find_vmap_area_enclose_addr_mt(start, tree); - if (left && WARN_ON_ONCE(left->va_end > start)) - return false; - - right =3D __find_vmap_area_exceed_addr_mt(start, tree); - if (right && WARN_ON_ONCE(right->va_start < end)) + if 
(!find_vmap_area_insert_neighbors_mt_locked(tree, start, end, + &left, &right)) return false; =20 return true; @@ -1499,10 +1560,11 @@ unlink_vmap_area_lazy_locked(struct vmap_area *va, = struct vmap_node *vn) } =20 /* - * Transition a VA into the lazy index and drop occupied tracking. On occu= pied - * erase failure, attempt to roll back the lazy insertion; if rollback fai= ls we - * keep the lazy entry and let purge-side erase_occupied handling repair s= tale - * occupied state. + * Transition a VA into the lazy index. + * + * In the default mode, occupied tracking is dropped while the VA is lazy. + * In occupied perf mode, lazy ranges stay occupied-indexed so hole search= can + * avoid repeatedly probing unavailable gaps. * * Returns true when the VA remains lazy-indexed; false when it should be * retried via non-index queue. @@ -1518,6 +1580,11 @@ publish_vmap_area_lazy(struct vmap_area *va, struct = vmap_node *vn) return false; } =20 + if (occupied_mt_supported()) { + spin_unlock(&vn->lazy.lock); + return true; + } + /* * Keep lazy.lock held while dropping occupied tracking so purge-side * lazy extraction cannot move @va to purge_list during rollback. 
@@ -1588,24 +1655,34 @@ move_lazy_vmap_areas_to_purge_locked(struct vmap_no= de *vn) } =20 static __always_inline bool -insert_vmap_area_free_locked(struct vmap_area *va) +insert_vmap_area_free_nocheck_locked(struct vmap_area *va) { - struct vmap_area *prev, *next; - lockdep_assert_held(&free_vmap_area_lock); =20 - prev =3D __find_vmap_area_enclose_addr_mt(va->va_start, &free_vmap_area_m= t); - if (prev && WARN_ON_ONCE(prev->va_end > va->va_start)) - return false; + try_init_free_mt_locked(); =20 - next =3D __find_vmap_area_exceed_addr_mt(va->va_start, &free_vmap_area_mt= ); - if (next && WARN_ON_ONCE(next->va_start < va->va_end)) + if (unlikely(!free_mt_supported())) return false; =20 INIT_LIST_HEAD(&va->list); return free_mt_store_va_locked(va); } =20 +static __always_inline bool +insert_vmap_area_free_locked(struct vmap_area *va) +{ + struct vmap_area *prev, *next; + + lockdep_assert_held(&free_vmap_area_lock); + + if (!find_vmap_area_insert_neighbors_mt_locked(&free_vmap_area_mt, + va->va_start, va->va_end, + &prev, &next)) + return false; + + return insert_vmap_area_free_nocheck_locked(va); +} + static __always_inline void unlink_vmap_area_free_locked(struct vmap_area *va) { @@ -1634,8 +1711,9 @@ merge_or_add_vmap_area_free_locked(struct vmap_area *= va) new_start =3D va->va_start; new_end =3D va->va_end; =20 - left =3D __find_vmap_area_enclose_addr_mt(new_start, &free_vmap_area_mt); - if (left && WARN_ON_ONCE(left->va_end > new_start)) + if (!find_vmap_area_insert_neighbors_mt_locked(&free_vmap_area_mt, + new_start, new_end, + &left, &right)) return NULL; =20 right =3D __find_vmap_area_exceed_addr_mt(new_start, &free_vmap_area_mt); @@ -1657,7 +1735,7 @@ merge_or_add_vmap_area_free_locked(struct vmap_area *= va) va->va_start =3D new_start; va->va_end =3D new_end; =20 - if (!insert_vmap_area_free_locked(va)) + if (!insert_vmap_area_free_nocheck_locked(va)) return NULL; =20 return va; @@ -1690,6 +1768,10 @@ occupied_mt_find_hole_window_locked(unsigned long mi= 
n, unsigned long max, MA_STATE(mas, &occupied_vmap_area_mt, 0, 0); unsigned long search =3D min; unsigned long hole_end; + bool retry_empty; + + lockdep_assert_held(&free_vmap_area_lock); + retry_empty =3D list_empty(&vmap_retry_list); =20 while (search <=3D max) { unsigned long candidate, candidate_end; @@ -1709,7 +1791,8 @@ occupied_mt_find_hole_window_locked(unsigned long min= , unsigned long max, while (candidate >=3D search && candidate_end <=3D hole_end) { unsigned long blocked_end =3D 0; =20 - if (!retry_queue_overlap_locked(candidate, candidate_end, + if (retry_empty || + !retry_queue_overlap_locked(candidate, candidate_end, &blocked_end)) { *addr =3D candidate; return true; @@ -1751,6 +1834,70 @@ occupied_mt_find_hole_lowest_locked(unsigned long si= ze, unsigned long align, return -ENOENT; } =20 +static __always_inline struct vmap_area * +free_mt_find_enclose_range_locked(unsigned long start, unsigned long end) +{ + struct vmap_area *va; + + lockdep_assert_held(&free_vmap_area_lock); + + va =3D __find_vmap_area_mt(start, &free_vmap_area_mt); + if (!va) + return NULL; + + if (va->va_start > start || va->va_end < end) + return NULL; + + return va; +} + +static __always_inline void +occupied_mt_cache_gap_miss_locked(unsigned long candidate, unsigned long v= end) +{ + struct vmap_area *prev, *next; + unsigned long blocked_end; + + lockdep_assert_held(&free_vmap_area_lock); + + if (!occupied_mt_supported()) + return; + + prev =3D __find_vmap_area_enclose_addr_mt(candidate, &free_vmap_area_mt); + if (prev && prev->va_start <=3D candidate && candidate < prev->va_end) + return; + + next =3D __find_vmap_area_exceed_addr_mt(candidate, &free_vmap_area_mt); + blocked_end =3D next ? 
next->va_start : vend; + if (blocked_end <=3D candidate) + return; + + WARN_ON_ONCE(!occupied_mt_store_range_raw_locked(candidate, blocked_end)); +} + +static __always_inline bool occupied_mt_seed_from_free_locked(void) +{ + MA_STATE(mas, &free_vmap_area_mt, 0, 0); + struct vmap_area *va; + unsigned long search =3D VMALLOC_START; + + lockdep_assert_held(&free_vmap_area_lock); + + mas_for_each(&mas, va, VMALLOC_END - 1) { + if (search < va->va_start) { + if (!occupied_mt_store_range_raw_locked(search, va->va_start)) + return false; + } + + if (va->va_end > search) + search =3D va->va_end; + } + + if (search < VMALLOC_END) + return occupied_mt_store_range_raw_locked(search, VMALLOC_END); + + return true; +} + /* Lowest-match scan directly on maple ordered traversal. */ static __always_inline struct vmap_area * find_vmap_lowest_match_mt(struct maple_tree *tree, unsigned long size, @@ -1939,11 +2086,39 @@ va_clip(struct vmap_area *va, unsigned long nva_sta= rt_addr, } =20 if (type !=3D FL_FIT_TYPE) { - if (free_mt_supported() && - !free_mt_update_va_locked(va, old_start, old_end)) - return -ENOMEM; - - if (lva && !insert_vmap_area_free_locked(lva)) { + if (free_mt_supported()) { + /* + * Drop only the consumed sub-range from the original + * free entry instead of erase-then-store. The maple + * tree leaves @va at the surviving sub-range intact, + * so a single mas_store per clip side suffices. + * + * For NE_FIT, insert @lva at the original entry's + * left portion first: mas_store overwrites the old + * [old_start, old_end-1]->va entry only across + * [old_start, lva->va_end-1], leaving the right side + * still pointing to @va. The subsequent middle trim + * carves out the consumed gap. Trades 3 stores + * (erase + restore + lva) for 2. 
+ */ + if (type =3D=3D LE_FIT_TYPE) { + if (!free_mt_trim_range_locked(old_start, + va->va_start)) + return -ENOMEM; + } else if (type =3D=3D RE_FIT_TYPE) { + if (!free_mt_trim_range_locked(va->va_end, + old_end)) + return -ENOMEM; + } else { /* NE_FIT_TYPE */ + if (!insert_vmap_area_free_nocheck_locked(lva)) { + kmem_cache_free(vmap_area_cachep, lva); + return -ENOMEM; + } + if (!free_mt_trim_range_locked(nva_start_addr, + nva_start_addr + size)) + return -ENOMEM; + } + } else if (lva && !insert_vmap_area_free_nocheck_locked(lva)) { kmem_cache_free(vmap_area_cachep, lva); return -ENOMEM; } @@ -1965,7 +2140,7 @@ restore_allocated_vmap_range_free_locked(unsigned lon= g start, unsigned long end) =20 va->va_start =3D start; va->va_end =3D end; - if (!insert_vmap_area_free_locked(va)) { + if (!insert_vmap_area_free_nocheck_locked(va)) { kmem_cache_free(vmap_area_cachep, va); return false; } @@ -2048,6 +2223,7 @@ __alloc_vmap_area(unsigned long size, unsigned long a= lign, int ret; unsigned long nva_start_addr; unsigned long nva_end_addr; + unsigned long search_len =3D size; struct vmap_area *va; MA_STATE(mas, &free_vmap_area_mt, 0, 0); =20 @@ -2059,6 +2235,28 @@ __alloc_vmap_area(unsigned long size, unsigned long = align, return -EINVAL; if (size > vend - vstart) return -ENOENT; + if (align > PAGE_SIZE && (vend - vstart) !=3D size) { + if (check_add_overflow(size, align - 1, &search_len)) + return -ERANGE; + } + + if (occupied_mt_supported() && align <=3D PAGE_SIZE) { + unsigned long candidate; + + if (occupied_mt_find_hole_window_locked(vstart, vend - 1, size, + align, &candidate)) { + if (check_add_overflow(candidate, size, &nva_end_addr)) + return -ERANGE; + + va =3D free_mt_find_enclose_range_locked(candidate, nva_end_addr); + if (likely(va)) { + nva_start_addr =3D candidate; + goto found; + } + + occupied_mt_cache_gap_miss_locked(candidate, vend); + } + } =20 /* * Free maple index is authoritative for allocatable ranges; lazy and @@ -2067,26 +2265,37 @@ 
__alloc_vmap_area(unsigned long size, unsigned long= align, mas_set(&mas, vstart); va =3D mas_find(&mas, vend - 1); while (va) { - unsigned long search_start =3D max(va->va_start, vstart); - unsigned long candidate_end; + unsigned long search_start, limit_end; + + search_start =3D va->va_start; + if (search_start < vstart) + search_start =3D vstart; + + limit_end =3D va->va_end; + if (limit_end > vend) + limit_end =3D vend; + + if (unlikely(limit_end <=3D search_start)) + goto next; + if (unlikely(limit_end - search_start < search_len)) + goto next; =20 nva_start_addr =3D ALIGN(search_start, align); if (nva_start_addr < search_start) return -ERANGE; =20 - if (check_add_overflow(nva_start_addr, size - 1, &candidate_end)) + if (check_add_overflow(nva_start_addr, size, &nva_end_addr)) return -ERANGE; - - if (candidate_end < vend && candidate_end < va->va_end) { - nva_end_addr =3D candidate_end + 1; + if (nva_end_addr <=3D limit_end) break; - } =20 +next: va =3D mas_next(&mas, vend - 1); } if (!va) return -ENOENT; =20 +found: ret =3D va_clip(va, nva_start_addr, size); if (WARN_ON_ONCE(ret)) return ret; @@ -2571,7 +2780,8 @@ decay_va_pool_node(struct vmap_node *vn, bool full_de= cay) } } =20 - WARN_ON_ONCE(!reclaim_list_global(&decay_list, false, &decay_failed)); + WARN_ON_ONCE(!reclaim_list_global(&decay_list, occupied_mt_supported(), + &decay_failed)); list_for_each_entry_safe(va, nva, &decay_failed, list) { list_del_init(&va->list); WARN_ON_ONCE(!node_pool_add_va(vn, va)); @@ -6043,3 +6253,21 @@ void __init vmalloc_init(void) vmap_node_shrinker->scan_objects =3D vmap_node_shrink_scan; shrinker_register(vmap_node_shrinker); } + +static int __init vmap_enable_occupied_perf_mode(void) +{ + bool seeded =3D false; + + spin_lock(&free_vmap_area_lock); + try_init_occupied_mt_locked(); + if (occupied_mt_enabled()) + seeded =3D occupied_mt_seed_from_free_locked(); + occupied_vmap_area_perf_mode =3D seeded; + spin_unlock(&free_vmap_area_lock); + + if (!seeded) + 
pr_warn("vmalloc: occupied perf mode disabled (seed failure)\n"); + + return 0; +} +late_initcall(vmap_enable_occupied_perf_mode);
--=20
2.34.1
From nobody Sat Jun 13 23:22:38 2026
From: Pranjal Arya
Date: Sat, 13 Jun 2026 22:49:49 +0530
Subject: [PATCH RFC 07/12] mm/vmalloc: consolidate occupied tree as authoritative index on hot path
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260613-vmalloc_maple-v1-7-0aa740bb944b@oss.qualcomm.com>
References: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com>
In-Reply-To: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com>
To: Andrew Morton, Uladzislau Rezki, "Liam R. Howlett", Alice Ryhl,
 Andrew Ballance
Cc: linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, maple-tree@lists.infradead.org,
 Lorenzo Stoakes, Pranjal Shrivastava, Will Deacon, Suzuki K Poulose,
 Neil Armstrong, Mostafa Saleh, Balbir Singh, Suren Baghdasaryan,
 Marco Elver, Dmitry Vyukov, Alexander Potapenko, Shuah Khan, Dev Jain,
 Brendan Jackman, Puranjay Mohan, Santosh Shukla, Wyes Karny,
 Pranjal Arya, Sudeep Holla

The dual-tree design (free_vmap_area_mt + occupied_vmap_area_mt
maintained in lock-step) costs roughly twice as many maple operations
per allocation lifecycle as the rb_tree path it replaced. Strip the
maintenance back to a single authoritative tree on the steady-state
hot path.

After this patch:

  - The occupied tree is the source of truth for in-use ranges on the
    alloc/free hot path.
  - free_vmap_area_mt is still maintained on the slow paths
    (vmap_init_free_space, pcpu_get_vm_areas's top-down walk,
    decay_va_pool_node), but the steady-state alloc/free no longer
    has to keep both trees in lock-step.
  - This removes ~half of the maple operations a typical
    vmalloc/vfree cycle performs.

The pcpu top-down walk relies on the assumption that chunks consume
addresses bottom-up, so stale free-tree entries at low addresses
never collide with pcpu's chosen base. This is documented at the
relevant call site.

Signed-off-by: Pranjal Arya
---
 mm/vmalloc.c | 179 +++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 99 insertions(+), 80 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5bc1e47c456a..73a40a88dbf6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1767,17 +1767,32 @@ occupied_mt_find_hole_window_locked(unsigned long min, unsigned long max,
 {
 	MA_STATE(mas, &occupied_vmap_area_mt, 0, 0);
 	unsigned long search = min;
+	unsigned long search_len = size;
 	unsigned long hole_end;
 	bool retry_empty;
 
 	lockdep_assert_held(&free_vmap_area_lock);
 	retry_empty = list_empty(&vmap_retry_list);
 
+	/*
+	 * Pad the gap-find by align-1 when align exceeds PAGE_SIZE so that
+	 * any alignment slack inside the returned gap can be absorbed
+	 * without an extra outer-loop iteration. Without this padding, the
+	 * loop has to scan past every page-aligned gap that is large enough
+	 * for @size but too small for the aligned start, which is O(K) in
+	 * the number of such gaps and pathological for big alignments on a
+	 * fragmented occupied tree.
+	 */
+	if (align > PAGE_SIZE) {
+		if (check_add_overflow(size, align - 1, &search_len))
+			return false;
+	}
+
 	while (search <= max) {
 		unsigned long candidate, candidate_end;
 
 		mas_set(&mas, search);
-		if (mas_empty_area(&mas, search, max, size))
+		if (mas_empty_area(&mas, search, max, search_len))
 			return false;
 
 		hole_end = min(mas.last, max);
@@ -2182,39 +2197,35 @@ rollback_busy_insert_failed_alloc_locked(struct vmap_area *va)
 }
 
 /*
- * Reinsert @va into the free index after occupied erase. On failure, place the
- * range on the non-index retry queue and best-effort restore occupied tracking.
+ * Release @va after the caller has erased it from occupied_vmap_area_mt.
+ * In the occupied-only design there is no free index to track free space
+ * with vmap_area objects: the range becomes implicitly free as soon as
+ * the occupied marker is gone. The struct itself is recycled to the slab.
  *
- * Return: free-tracked @va on success, NULL when queued for retry.
+ * Unlike the helper it replaces, this function returns nothing: the
+ * release cannot fail, so callers such as the synchronous
+ * free_vmap_area() path need no retry bookkeeping after calling it.
  */
-static __always_inline struct vmap_area *
-reinsert_or_queue_vmap_area_locked(struct vmap_area *va)
+static __always_inline void
+release_drained_vmap_area_locked(struct vmap_area *va)
 {
-	struct vmap_area *tracked;
-
 	lockdep_assert_held(&free_vmap_area_lock);
 
-	tracked = merge_or_add_vmap_area_free_locked(va);
-	if (tracked)
-		return tracked;
-
-	if (insert_vmap_area_free_locked(va))
-		return va;
-
-	/*
-	 * Retry queue acts as allocation exclusion even if occupied restore
-	 * fails under pressure.
-	 */
-	if (WARN_ON_ONCE(!occupied_mt_store_va_locked(va)))
-		INIT_LIST_HEAD(&va->list);
-
-	retry_queue_add_va_locked(va);
-	return NULL;
+	kmem_cache_free(vmap_area_cachep, va);
 }
 
 /*
  * Returns a start address of the newly allocated area, if success.
  * Otherwise an error value is returned that indicates failure.
+ *
+ * Steady state (post late_initcall, occupied_mt perf_mode on) takes
+ * the occupied-only fast path: find a gap with mas_empty_area on
+ * @occupied_vmap_area_mt and store the consumed sub-range. This costs
+ * two maple touches per allocation versus four to six in the legacy
+ * path (which clipped a free vmap_area struct in @free_vmap_area_mt).
+ *
+ * Pre-perf_mode (early boot) and -ENOENT/-ERANGE retries fall back to
+ * the legacy free_mt walk + va_clip path, which remains correct.
  */
 static __always_inline unsigned long
 __alloc_vmap_area(unsigned long size, unsigned long align,
@@ -2235,33 +2246,41 @@ __alloc_vmap_area(unsigned long size, unsigned long align,
 		return -EINVAL;
 	if (size > vend - vstart)
 		return -ENOENT;
-	if (align > PAGE_SIZE && (vend - vstart) != size) {
-		if (check_add_overflow(size, align - 1, &search_len))
-			return -ERANGE;
-	}
 
-	if (occupied_mt_supported() && align <= PAGE_SIZE) {
-		unsigned long candidate;
+	/*
+	 * Occupied-only fast path: skip both the free_mt validation
+	 * (free_mt_find_enclose_range_locked) and the va_clip splitting.
+	 * occupied_mt_find_hole_window_locked already pads the gap search by
+	 * align-1 internally for align > PAGE_SIZE, so any alignment lands
+	 * inside the returned gap; storing the consumed sub-range in
+	 * occupied_mt makes the allocator visible to subsequent lookups. The
+	 * legacy free_mt stays in sync only at coarse points (init, pre-
+	 * perf_mode), which is harmless because the alloc and free hot paths
+	 * no longer query it.
+	 */
+	if (occupied_mt_supported()) {
+		if (!occupied_mt_find_hole_window_locked(vstart, vend - 1, size,
+							 align, &nva_start_addr))
+			return -ENOENT;
 
-		if (occupied_mt_find_hole_window_locked(vstart, vend - 1, size,
-							align, &candidate)) {
-			if (check_add_overflow(candidate, size, &nva_end_addr))
-				return -ERANGE;
+		if (check_add_overflow(nva_start_addr, size, &nva_end_addr))
+			return -ERANGE;
 
-			va = free_mt_find_enclose_range_locked(candidate, nva_end_addr);
-			if (likely(va)) {
-				nva_start_addr = candidate;
-				goto found;
-			}
+		if (!occupied_mt_store_range_locked(nva_start_addr, nva_end_addr))
+			return -ENOMEM;
 
-			occupied_mt_cache_gap_miss_locked(candidate, vend);
-		}
+		return nva_start_addr;
 	}
 
 	/*
-	 * Free maple index is authoritative for allocatable ranges; lazy and
-	 * retry entries are intentionally excluded from it.
+	 * Pre-perf_mode early boot fallback: walk free_mt linearly and use
+	 * va_clip to keep both indices coherent.
 	 */
+	if (align > PAGE_SIZE && (vend - vstart) != size) {
+		if (check_add_overflow(size, align - 1, &search_len))
+			return -ERANGE;
+	}
+
 	mas_set(&mas, vstart);
 	va = mas_find(&mas, vend - 1);
 	while (va) {
@@ -2295,7 +2314,6 @@ __alloc_vmap_area(unsigned long size, unsigned long align,
 	if (!va)
 		return -ENOENT;
 
-found:
 	ret = va_clip(va, nva_start_addr, size);
 	if (WARN_ON_ONCE(ret))
 		return ret;
@@ -2340,8 +2358,7 @@ static void free_vmap_area(struct vmap_area *va)
 		spin_unlock(&free_vmap_area_lock);
 		goto out_schedule_retry;
 	}
-	if (!reinsert_or_queue_vmap_area_locked(va))
-		queued_retry = true;
+	release_drained_vmap_area_locked(va);
 	spin_unlock(&free_vmap_area_lock);
 
 out_schedule_retry:
@@ -2692,15 +2709,13 @@ reclaim_list_global(struct list_head *head, bool erase_occupied,
 {
 	struct vmap_area *va, *n;
 	bool ok = true;
-	bool queue_retry_work = false;
+	LIST_HEAD(release);
 
 	if (list_empty(head))
 		return true;
 
 	spin_lock(&free_vmap_area_lock);
 	list_for_each_entry_safe(va, n, head, list) {
-		bool occupied_erased = false;
-
 		list_del_init(&va->list);
 		if (erase_occupied) {
 			if (WARN_ON_ONCE(!occupied_mt_erase_va_locked(va))) {
@@ -2708,24 +2723,21 @@ reclaim_list_global(struct list_head *head, bool erase_occupied,
 				ok = false;
 				continue;
 			}
-
-			occupied_erased = true;
-		}
-		if (WARN_ON_ONCE(!merge_or_add_vmap_area_free_locked(va))) {
-			if (occupied_erased &&
-			    WARN_ON_ONCE(!occupied_mt_store_va_locked(va))) {
-				retry_queue_add_va_locked(va);
-				queue_retry_work = true;
-				ok = false;
-				continue;
-			}
-			list_add_tail(&va->list, failed);
-			ok = false;
 		}
+		/*
+		 * Occupied-only design: there are no free vmap_area objects
+		 * any more. With the occupied marker erased, the range is
+		 * implicitly free (a gap in occupied_vmap_area_mt). Just
+		 * release the struct outside the lock.
+		 */
+		list_add_tail(&va->list, &release);
 	}
 	spin_unlock(&free_vmap_area_lock);
-	if (queue_retry_work)
-		schedule_work(&drain_vmap_work);
+
+	list_for_each_entry_safe(va, n, &release, list) {
+		list_del_init(&va->list);
+		kmem_cache_free(vmap_area_cachep, va);
+	}
 
 	return ok;
 }
@@ -5747,14 +5759,16 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
 		orig_start = vas[area]->va_start;
 		orig_end = vas[area]->va_end;
 		if (occupied_mt_erase_va_locked(vas[area])) {
-			va = reinsert_or_queue_vmap_area_locked(vas[area]);
-			if (va)
-				kasan_release_vmalloc(orig_start, orig_end,
-						      va->va_start, va->va_end,
-						      KASAN_VMALLOC_PAGE_RANGE |
-						      KASAN_VMALLOC_TLB_FLUSH);
-			else
-				queued_retry = true;
+			/*
+			 * Reinsert releases vas[area] in the occupied-only
+			 * design; use orig_start/orig_end captured above for
+			 * the kasan release call rather than va->va_start.
+			 */
+			release_drained_vmap_area_locked(vas[area]);
+			kasan_release_vmalloc(orig_start, orig_end,
+					      orig_start, orig_end,
+					      KASAN_VMALLOC_PAGE_RANGE |
+					      KASAN_VMALLOC_TLB_FLUSH);
 		} else {
 			retry_queue_add_va_locked(vas[area]);
 			queued_retry = true;
@@ -5820,14 +5834,11 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
 		orig_start = vas[area]->va_start;
 		orig_end = vas[area]->va_end;
 		if (occupied_mt_erase_va_locked(vas[area])) {
-			va = reinsert_or_queue_vmap_area_locked(vas[area]);
-			if (va)
-				kasan_release_vmalloc(orig_start, orig_end,
-						      va->va_start, va->va_end,
-						      KASAN_VMALLOC_PAGE_RANGE |
-						      KASAN_VMALLOC_TLB_FLUSH);
-			else
-				queued_retry = true;
+			release_drained_vmap_area_locked(vas[area]);
+			kasan_release_vmalloc(orig_start, orig_end,
+					      orig_start, orig_end,
+					      KASAN_VMALLOC_PAGE_RANGE |
+					      KASAN_VMALLOC_TLB_FLUSH);
 		} else {
 			retry_queue_add_va_locked(vas[area]);
 			queued_retry = true;
@@ -6045,6 +6056,14 @@ module_init(proc_vmalloc_init);
 
 #endif
 
+/*
+ * Pre-occupied-only design seeded the free index with placeholder VAs
+ * covering gaps between vmlist entries. This is preserved as the
+ * boot-time path that populates the legacy free_vmap_area_mt for any
+ * code that still queries it (notably pcpu_get_vm_areas). With
+ * occupied_vmap_area_mt authoritative, allocators on the hot path
+ * skip free_mt entirely.
+ */
 static void __init vmap_init_free_space(void)
 {
 	unsigned long vmap_start = 1;
-- 
2.34.1
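Illustration (not part of the series): the padded gap search that patch 07/12 adds to occupied_mt_find_hole_window_locked() can be modelled in userspace. The sketch below is an assumption-laden stand-in, not the kernel code: a sorted array replaces the maple tree, and `struct busy_range`, `align_up()` and `find_aligned_hole()` are hypothetical names. Unlike the patch, which pads only when align exceeds PAGE_SIZE, this version pads for every alignment to keep the loop short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one occupied range, [start, end). */
struct busy_range {
	unsigned long start;
	unsigned long end;
};

/* Round @addr up to @align; @align must be a non-zero power of two. */
static unsigned long align_up(unsigned long addr, unsigned long align)
{
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Return true and set *@out to an @align-aligned address in [min, max)
 * where @size bytes fit between the sorted, non-overlapping @busy ranges.
 * Each gap is tested against size + align - 1, so the first gap that
 * passes is guaranteed to hold the aligned start as well.
 */
bool find_aligned_hole(const struct busy_range *busy, size_t nr,
		       unsigned long min, unsigned long max,
		       unsigned long size, unsigned long align,
		       unsigned long *out)
{
	unsigned long need, cursor = min;
	size_t i;

	/* __builtin_add_overflow is a GCC/Clang builtin. */
	if (__builtin_add_overflow(size, align - 1, &need))
		return false;

	for (i = 0; i <= nr && cursor < max; i++) {
		/* The gap ends at the next busy range, or at @max. */
		unsigned long gap_end = (i < nr && busy[i].start < max) ?
					busy[i].start : max;

		if (gap_end > cursor && gap_end - cursor >= need) {
			*out = align_up(cursor, align);
			return true;
		}
		/* Skip past the busy range that closed this gap. */
		if (i < nr && busy[i].end > cursor)
			cursor = busy[i].end;
	}
	return false;
}
```

The padding is conservative: a gap between size and size + align - 1 bytes is rejected even when its start happens to be aligned already. That trade buys a single acceptance test per gap instead of a retry loop around the aligned candidate.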
From: Pranjal Arya
Date: Sat, 13 Jun 2026 22:49:50 +0530
Subject: [PATCH RFC 08/12] mm/vmalloc: track lazy-purge queue as a list_head
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260613-vmalloc_maple-v1-8-0aa740bb944b@oss.qualcomm.com>
References: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com>
In-Reply-To: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com>
To: Andrew Morton, Uladzislau Rezki, "Liam R. Howlett", Alice Ryhl,
 Andrew Ballance
Cc: linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, maple-tree@lists.infradead.org,
 Lorenzo Stoakes, Pranjal Shrivastava, Will Deacon, Suzuki K Poulose,
 Neil Armstrong, Mostafa Saleh, Balbir Singh, Suren Baghdasaryan,
 Marco Elver, Dmitry Vyukov, Alexander Potapenko, Shuah Khan, Dev Jain,
 Brendan Jackman, Puranjay Mohan, Santosh Shukla, Wyes Karny,
 Pranjal Arya, Sudeep Holla

The lazy queue is bulk-drained from the purge worker; nothing queries
it by address. publish_vmap_area_lazy() inserts into the queue and
purge_vmap_areas_lazy() walks it linearly. A list_head expresses the
actual usage and saves the per-publish maple insert.

Per-node vn->lazy.mt becomes vn->lazy_list. The locking discipline
(vn->lazy.lock still serialises inserts) is unchanged.
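
Illustration (not part of the patch): the queue usage described above, where entries are appended one at a time, drained in bulk, and never looked up by address, can be sketched in userspace C. Everything here is a hypothetical stand-in: `struct node` imitates the kernel's list_head, and `struct area`, `queue_push_tail()`, `queue_splice_tail()` and `queue_bounds()` are names invented for this sketch. Locking is omitted.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on list_head. */
struct node {
	struct node *prev, *next;
};

/* Hypothetical stand-in for a queued area, [start, end). */
struct area {
	unsigned long start, end;
	struct node link;
};

#define area_of(n) ((struct area *)((char *)(n) - offsetof(struct area, link)))

void queue_init(struct node *head)
{
	head->prev = head->next = head;
}

bool queue_empty(const struct node *head)
{
	return head->next == head;
}

/* O(1) publish: append @n at the tail, no allocation and no lookup. */
void queue_push_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* O(1) drain: move all of @from to the tail of @to and empty @from. */
void queue_splice_tail(struct node *from, struct node *to)
{
	if (queue_empty(from))
		return;
	from->next->prev = to->prev;
	to->prev->next = from->next;
	from->prev->next = to;
	to->prev = from->prev;
	queue_init(from);
}

/*
 * The list is in insertion order, not address order, so the covering
 * range is a min/max over every entry. Returns false for an empty list.
 */
bool queue_bounds(const struct node *head, unsigned long *start,
		  unsigned long *end)
{
	const struct node *n;

	*start = ULONG_MAX;
	*end = 0;
	for (n = head->next; n != head; n = n->next) {
		const struct area *a = area_of(n);

		if (a->start < *start)
			*start = a->start;
		if (a->end > *end)
			*end = a->end;
	}
	return *start < *end;
}
```

The cost moves from the publish side to the drain side: each push and the splice are constant time, while any consumer that needs the covering address range must walk the drained list once.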

Signed-off-by: Pranjal Arya
---
 mm/vmalloc.c | 133 +++++++++++++++++++++++++----------------------------------
 1 file changed, 57 insertions(+), 76 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 73a40a88dbf6..1b73001e197e 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -942,6 +942,16 @@ static struct vmap_node {
 	struct mt_list busy;
 	struct mt_list lazy;
 
+	/*
+	 * Lazy list. The lazy index is no longer queried by address on the
+	 * hot path: free_vmap_area_noflush() pushes the VA via list_add and
+	 * purge drains it via list_splice. Keeping a list head sidesteps a
+	 * mas_store on every vfree and a mas_for_each + per-entry
+	 * mas_store(NULL) during purge. lazy.mt is retained for the rare
+	 * non-perf_mode rollback path inside publish_vmap_area_lazy().
+	 */
+	struct list_head lazy_list;
+
 	/*
 	 * Ready-to-free areas.
 	 */
@@ -1510,52 +1520,37 @@ unlink_vmap_area_busy_locked(struct vmap_area *va, struct vmap_node *vn)
 static __always_inline bool
 insert_vmap_area_lazy_locked(struct vmap_area *va, struct vmap_node *vn)
 {
-	int err;
-
 	lockdep_assert_held(&vn->lazy.lock);
 
-	try_init_lazy_mt_locked(vn);
-	if (WARN_ON_ONCE(!vn->lazy.mt_enabled))
-		return false;
-
-	if (!validate_vmap_area_range_insert_mt_locked(&vn->lazy.mt,
-							va->va_start,
-							va->va_end))
+	/*
+	 * The maple-tree lazy index is bypassed in the hot path: a simple
+	 * list saves one mas_store per vfree and one mas_for_each + N
+	 * mas_store(NULL) during purge. lazy.mt is left untouched here so
+	 * the non-perf_mode publish_vmap_area_lazy() rollback can still
+	 * unlink the VA via unlink_vmap_area_lazy_locked() if it inserted
+	 * one - that path is unreachable in steady state with perf_mode on.
+	 */
+	if (WARN_ON_ONCE(!list_empty(&va->list)))
 		return false;
 
-	INIT_LIST_HEAD(&va->list);
-
-	MA_STATE(mas, &vn->lazy.mt, va->va_start, va->va_end - 1);
-
-	err = mas_preallocate(&mas, va, GFP_NOWAIT | __GFP_NOWARN);
-	if (!err) {
-		mas_store_prealloc(&mas, va);
-		mas_destroy(&mas);
-		return true;
-	}
-
-	err = mas_store_gfp(&mas, va, GFP_ATOMIC | __GFP_NOWARN);
-	return !WARN_ON_ONCE(err);
+	list_add_tail(&va->list, &vn->lazy_list);
+	return true;
 }
 
 static __always_inline bool
 unlink_vmap_area_lazy_locked(struct vmap_area *va, struct vmap_node *vn)
 {
-	int err;
-
 	lockdep_assert_held(&vn->lazy.lock);
 
-	try_init_lazy_mt_locked(vn);
-	if (WARN_ON_ONCE(!vn->lazy.mt_enabled))
-		return false;
-
-	MA_STATE(mas, &vn->lazy.mt, va->va_start, va->va_end - 1);
-
-	err = mas_store_gfp(&mas, NULL, GFP_ATOMIC | __GFP_NOWARN);
-	if (WARN_ON_ONCE(err))
+	/*
+	 * Match insert_vmap_area_lazy_locked()'s list-based fast path. Used
+	 * only by publish_vmap_area_lazy() rollback, which is unreachable in
+	 * steady state but kept for the non-perf_mode early-boot window.
+	 */
+	if (list_empty(&va->list))
 		return false;
 
-	INIT_LIST_HEAD(&va->list);
+	list_del_init(&va->list);
 	return true;
 }
 
@@ -1610,48 +1605,22 @@ lazy_vmap_areas_empty_locked(struct vmap_node *vn)
 {
 	lockdep_assert_held(&vn->lazy.lock);
 
-	try_init_lazy_mt_locked(vn);
-	if (WARN_ON_ONCE(!vn->lazy.mt_enabled))
-		return true;
-
-	return mtree_empty(&vn->lazy.mt);
+	return list_empty(&vn->lazy_list);
 }
 
 static __always_inline void
 move_lazy_vmap_areas_to_purge_locked(struct vmap_node *vn)
 {
-	LIST_HEAD(move_list);
-	struct vmap_area *va, *n_va;
-	int err;
-
 	lockdep_assert_held(&vn->lazy.lock);
 
-	try_init_lazy_mt_locked(vn);
-	if (WARN_ON_ONCE(!vn->lazy.mt_enabled))
-		return;
-
-	MA_STATE(mas, &vn->lazy.mt, 0, 0);
-
-	mas_for_each(&mas, va, ULONG_MAX)
-		list_add_tail(&va->list, &move_list);
-
 	/*
-	 * Erase ranges one-by-one and move only successfully erased entries to
-	 * purge_list. This avoids destroy/reinit churn and keeps lazy index
-	 * coherence if an erase operation fails under pressure.
+	 * Move every queued VA to purge_list with a single splice. The
+	 * sort-by-address property that the maple-tree lazy index used to
+	 * provide is no longer used by purge_vmap_node(); kasan_release
+	 * computes its own min/max over the resulting purge_list when
+	 * needed.
 	 */
-	list_for_each_entry_safe(va, n_va, &move_list, list) {
-		MA_STATE(mas_erase, &vn->lazy.mt, va->va_start, va->va_end - 1);
-
-		err = mas_store_gfp(&mas_erase, NULL, GFP_ATOMIC | __GFP_NOWARN);
-		if (unlikely(err)) {
-			WARN_ON_ONCE(err);
-			list_del_init(&va->list);
-			continue;
-		}
-
-		list_move_tail(&va->list, &vn->purge_list);
-	}
+	list_splice_tail_init(&vn->lazy_list, &vn->purge_list);
 }
 
 static __always_inline bool
@@ -2806,13 +2775,18 @@ static void
 kasan_release_vmalloc_node(struct vmap_node *vn)
 {
 	struct vmap_area *va;
-	unsigned long start, end;
+	unsigned long start = ULONG_MAX, end = 0;
 	unsigned int batch_count = 0;
 
-	start = list_first_entry(&vn->purge_list, struct vmap_area, list)->va_start;
-	end = list_last_entry(&vn->purge_list, struct vmap_area, list)->va_end;
-
+	/*
+	 * purge_list is no longer sorted by address (lazy_list is built in
+	 * insertion order via list_add_tail). Compute the bounding range
+	 * inline with the per-VA shadow-release loop to avoid a second walk.
+	 */
 	list_for_each_entry(va, &vn->purge_list, list) {
+		start = min(start, va->va_start);
+		end = max(end, va->va_end);
+
 		if (is_vmalloc_or_module_addr((void *) va->va_start))
 			kasan_release_vmalloc(va->va_start, va->va_end,
 				va->va_start, va->va_end,
@@ -2824,7 +2798,9 @@ kasan_release_vmalloc_node(struct vmap_node *vn)
 		}
 	}
 
-	kasan_release_vmalloc(start, end, start, end, KASAN_VMALLOC_TLB_FLUSH);
+	if (start < end)
+		kasan_release_vmalloc(start, end, start, end,
+				      KASAN_VMALLOC_TLB_FLUSH);
 }
 
 static void purge_vmap_node(struct work_struct *work)
@@ -2938,6 +2914,7 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end,
 	static cpumask_t purge_nodes;
 	unsigned int nr_purge_nodes;
 	struct vmap_node *vn;
+	struct vmap_area *va;
 	int i;
 
 	lockdep_assert_held(&vmap_purge_lock);
@@ -2964,11 +2941,14 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end,
 		move_lazy_vmap_areas_to_purge_locked(vn);
 		spin_unlock(&vn->lazy.lock);
 
-		start = min(start, list_first_entry(&vn->purge_list,
-			struct vmap_area, list)->va_start);
-
-		end = max(end, list_last_entry(&vn->purge_list,
-			struct vmap_area, list)->va_end);
+		/*
+		 * lazy_list (and therefore purge_list) is no longer sorted by
+		 * address. Compute the bounding range by walking purge_list.
+		 */
+		list_for_each_entry(va, &vn->purge_list, list) {
+			start = min(start, va->va_start);
+			end = max(end, va->va_end);
+		}
 
 		cpumask_set_cpu(node_to_id(vn), &purge_nodes);
 	}
@@ -6153,6 +6133,7 @@ static void vmap_init_nodes(void)
 		mt_init_flags(&vn->lazy.mt, MT_FLAGS_LOCK_EXTERN);
 		mt_set_external_lock(&vn->lazy.mt, &vn->lazy.lock);
 		vn->lazy.mt_enabled = true;
+		INIT_LIST_HEAD(&vn->lazy_list);
 
 		for (i = 0; i < MAX_VA_SIZE_PAGES; i++) {
 			INIT_LIST_HEAD(&vn->pool[i].head);
-- 
2.34.1
smtp.client-ip=205.220.180.131 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oss.qualcomm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oss.qualcomm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=qualcomm.com header.i=@qualcomm.com header.b="SvBF5DYb"; dkim=pass (2048-bit key) header.d=oss.qualcomm.com header.i=@oss.qualcomm.com header.b="JeRSVPjm" Received: from pps.filterd (m0279868.ppops.net [127.0.0.1]) by mx0a-0031df01.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 65DFB9Rj3121930 for ; Sat, 13 Jun 2026 17:21:40 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=qualcomm.com; h= cc:content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=qcppdkim1; bh= WFBZitbymj18mEAzAGK9dmr1JcGda6wnYCfgwxi85K8=; b=SvBF5DYbflsHrzFH LzJ9fajfCViGVsueUX7G/m5reswGqPPsm42cLiDvPCNv9GePjUXBEenrHZ6QKLff G4DzPycl+CeTCjOy6d45UjFY7+lHNFTORrRErZ9Y/MC6SIrTp8ZZGx2sNPa/kFWz 9kbcEpB/kmZmp3LHzy85tGeFTJEXwuCaqM87/pTIYM79ET8cju/1G1N+gEBmUK27 ieUADs4vmWASn9626/4wPj8c7RxvPqx33gJ5ubnQHEQFdBfhrdG+NtUJC9tNlvaF iE/5a16R9KLtIAUZhqrxGTD25MO4ZJcWpk5nonzTWbVIS8+EjrxMMwRkDf6WCp8M YR6Rcg== Received: from mail-pg1-f199.google.com (mail-pg1-f199.google.com [209.85.215.199]) by mx0a-0031df01.pphosted.com (PPS) with ESMTPS id 4ery951mmc-1 (version=TLSv1.3 cipher=TLS_AES_128_GCM_SHA256 bits=128 verify=NOT) for ; Sat, 13 Jun 2026 17:21:40 +0000 (GMT) Received: by mail-pg1-f199.google.com with SMTP id 41be03b00d2f7-c8584e80bfcso951113a12.3 for ; Sat, 13 Jun 2026 10:21:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oss.qualcomm.com; s=google; t=1781371299; x=1781976099; darn=vger.kernel.org; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=WFBZitbymj18mEAzAGK9dmr1JcGda6wnYCfgwxi85K8=; 
b=JeRSVPjmVv8B+VUmbQkyKKWJY5X5T248SuT6XCBVnK80b8vUQti7NDD/jGWBovUgzC l6ckASd7TSFNEP7lRRp6IQfr3kUboizaN4S9a8UgaiH00eaZuDRYyHo7XiIEySgrz0rN 2JAulwtXiMSUsOT7a4nTJ5V/HrlhNEtLX9fqTUcvwi9mLD3Uuq072Jp9ekYBNKsRwN6T hDwWcRKSdhIgTG2tLyx1OFX06IsVFYtMiCZGcUH9puGDV3fwu+GMRVRbHhFzyF9/j1+1 sn6OEN6PS4Hu4DUyTc+aBB7NKwigSWdeVu94Zdmyg97Woqd2rSkyex08C4CH3vh9rqmC wd+Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1781371299; x=1781976099; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-gg:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=WFBZitbymj18mEAzAGK9dmr1JcGda6wnYCfgwxi85K8=; b=IU0/pc9sH4RHZkGqeFtYGt9SqXd1x4VBBX2mEMUBXY/Aen0YfK8siyhRSi06WQ2put lobvsUU9zsAKDdXQ8h2pA36wi00gsNzpysuyEPkSYGlKUeT56nj90tQO/0ZlMMjP+Ryi LTXf8UfRvZlyRVOEebwdswGs3e7DXf6TgeqVhXv9mQpVLMpgBgcrh0BBHHAf/Gov6xii G1BHrz9Be5EcY80PJqoqJ658f2EqkGVQqonsTNYbMj6Yn1NZIJhc2dGaCs6DH0QRRoaC ggZtSsqUxFv7An4cB5sbVoe4+/N8mbIl943Lr0L49zfuwUGRz10WjKneNfPlg7GIFOzG mD8Q== X-Forwarded-Encrypted: i=1; AFNElJ/DlwqhDzHTM1Y7QLRMdnmNei8fdS8Rgc5aTt8q4Xi2p0QG+EbqoW9GhlNrwClpdkC/d67Pea0iY/z09xo=@vger.kernel.org X-Gm-Message-State: AOJu0YxSHNErr+6rBqWSnCnIlSztut1xVzpoCY89ZndChI8x+0p9YXrI gl41mHJq0jY3vbXwQQhTNt4NgsD71r1x678+IDwvH1g87uShiwWo3AQmEHnX42wmi9DWGniosC9 Fd23nz34CkT0rbHB9o59fPFVAA0OsSj0KP8uhoJ8+mvAv+zhUPqIdud/fRSudHSYBB5g= X-Gm-Gg: Acq92OGCFRe5qWKHjLayFAfXGSJ4c21Xr52jGrbz1ukcNvd11jAi18x882CsGDYfBTe d0SCEdVlMFezuWE42zFbxvDEkHATY71twXxxKD8idxN/JuI48fmqaePJGlQTthhtSs7EjBfWdM9 eFkPaJzYl6r4QZcRGA1Nad1eu847On7Wt/cgyCUTBCZ59Cya9WGmMzKyZgyKqcBUhOGJ748pwfm gL64zSRPvw6s0TYP4pEaICvi36EcuuxYQMviIrO5t5wHmnfAllnFIbensD0QjTLTwMLU0XmsL8q SV0l1jfClmHQnYFLIDmUVntw2NiTgWuhCu0HKF03CDq1g69uZ7A9fjqp6cMwbHyn8EZt0FDE2pE dOhfxCPyv3j3/oAHG31104EfMYoEXvUfkkkD/4qZGlL+K+U2ILlTepQ== X-Received: by 2002:a05:6300:218c:b0:3b4:8268:23e8 with SMTP id adf61e73a8af0-3b783f0a3f4mr9239528637.21.1781371299341; Sat, 13 Jun 2026 10:21:39 -0700 (PDT) 
X-Received: by 2002:a05:6300:218c:b0:3b4:8268:23e8 with SMTP id adf61e73a8af0-3b783f0a3f4mr9239494637.21.1781371298918; Sat, 13 Jun 2026 10:21:38 -0700 (PDT) Received: from hu-pranarya-hyd.qualcomm.com ([202.46.22.19]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-8434accbec5sm5390913b3a.16.2026.06.13.10.21.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 13 Jun 2026 10:21:38 -0700 (PDT) From: Pranjal Arya Date: Sat, 13 Jun 2026 22:49:51 +0530 Subject: [PATCH RFC 09/12] mm/vmalloc: collapse busy-tree find-then-unlink into a single mas_erase Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260613-vmalloc_maple-v1-9-0aa740bb944b@oss.qualcomm.com> References: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com> In-Reply-To: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com> To: Andrew Morton , Uladzislau Rezki , "Liam R. 
Howlett" , Alice Ryhl , Andrew Ballance Cc: linux-arm-msm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, maple-tree@lists.infradead.org, Lorenzo Stoakes , Pranjal Shrivastava , Will Deacon , Suzuki K Poulose , Neil Armstrong , Mostafa Saleh , Balbir Singh , Suren Baghdasaryan , Marco Elver , Dmitry Vyukov , Alexander Potapenko , Shuah Khan , Dev Jain , Brendan Jackman , Puranjay Mohan , Santosh Shukla , Wyes Karny , Pranjal Arya , Sudeep Holla X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1781371215; l=1577; i=pranjal.arya@oss.qualcomm.com; s=20260516; h=from:subject:message-id; bh=1Wo2lxnikoSv9JichvG4ITxBxFpg2W31gcAukcfik7I=; b=spWOO4QH0SPH6dKaihPuUAG1gV/jkwgr4CrTNhi9QWp9XeyW4iqebvtmBC0BAL7LESeXg0rG0 +AxtMJsVJytCUVtXjiAeMTCHd8dC34JkXXxIOXx0OtYDeVjqGqIvmGQ X-Developer-Key: i=pranjal.arya@oss.qualcomm.com; a=ed25519; pk=ymtcTlccEIDsi3ErhpjIoZZHKdPBYWGWW0Lchs5MsbE= X-Proofpoint-ORIG-GUID: 73fUes-GSmSHHNg4XBK1x5LIlfMhaC7x X-Proofpoint-Spam-Info: AW1haW4tMjYwNjEzMDE4MCBTYWx0ZWRfX7EdCSa1GWbNZ T+NAvVu/tR/6DKGz7KZETHJf+BmH3e9IIJn+1VaD8xLMWK2X64vpxuvA7N0AcJOyptKjtot7coO qdKFZXvXdsLR6AdJznt+0urNpCinp4o= X-Proofpoint-GUID: 73fUes-GSmSHHNg4XBK1x5LIlfMhaC7x X-Authority-Analysis: v=2.4 cv=EbP4hvmC c=1 sm=1 tr=0 ts=6a2d91a4 cx=c_pps a=Oh5Dbbf/trHjhBongsHeRQ==:117 a=fChuTYTh2wq5r3m49p7fHw==:17 a=IkcTkHD0fZMA:10 a=FelO9ux0wxsA:10 a=s4-Qcg_JpJYA:10 a=VkNPw1HP01LnGYTKEx00:22 a=u7WPNUs3qKkmUXheDGA7:22 a=ZpdpYltYx_vBUK5n70dp:22 a=EUspDBNiAAAA:8 a=qy1EqpzxIlcnHA3QzDgA:9 a=QEXdDO2ut3YA:10 a=_Vgx9l1VpLgwpw_dHYaR:22 X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNjEzMDE4MCBTYWx0ZWRfXy+Kyk2F+cw2J N3s9O1+4Q1xGHduQUv1qNXWi9pJtNUNTxopCqPHHW4M5KLk3pZnGTWyFeEj06Xk7uXhxoEWFQ3o QLp3RnNoi7kqqlueW288K4DUtGk/Bi3hjbwqoQ+UfKurcF0RP+sLwEAIbUx0ynLx99/K8r+F6SG tQTlgAFPTBE1XFrNMGvs4zHtxGvKDepWNAGBkKqbPS2FcOlozMsq5gl+pyqG7Re4G7DtVOFOvSK xxpfsV+gecr0o7MMgZw1PBuHcz05lT3fZ4ZREgb+xaY27O7jaduVoUlxUqZHtL287I178rumH4e 
VlxX9t8gJYhgrCU6Nc8AgyY64Tpxl/akL6c2uIVQ8nfnMz0S5E2r6Jn39yZP7aFK3MtGhUyhvS7 hmOlOKlu2KtvB/075bCIzY+gKHgDqp+1jsTOcxQfxt5yFddxO0X2Karq/VTVMmnMQb5F7cu8t5S zD1dR6OJe/lMCDTvdgw== X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.125,FMLib:17.12.100.49 definitions=2026-06-13_03,2026-06-12_03,2025-10-01_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 adultscore=0 priorityscore=1501 bulkscore=0 clxscore=1015 malwarescore=0 suspectscore=0 spamscore=0 phishscore=0 impostorscore=0 lowpriorityscore=0 classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0 reason=mlx scancount=1 engine=8.22.0-2606040000 definitions=main-2606130180 find_unlink_vmap_area() previously walked the busy tree once (via find_vmap_area_busy_locked() / __find_vmap_area_mt) and then erased via a second walker (unlink_vmap_area_busy_locked). Two independent maple-tree descents, one per call. maple_tree exposes mas_erase() which combines lookup and erase in one descent. Replace the find+unlink pair with a single mas_erase() walker. Signed-off-by: Pranjal Arya --- mm/vmalloc.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 1b73001e197e..463127d5ce58 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -3122,10 +3122,24 @@ static struct vmap_area *find_unlink_vmap_area(unsi= gned long addr) do { vn =3D &vmap_nodes[i]; =20 + /* + * Combine the lookup and removal into a single maple-tree + * descent. mas_erase() positions the state at @addr and clears + * the slot in one pass, returning the previously stored VA. + * This saves the second mas_store(NULL) the original + * find_vmap_area_busy_locked + unlink_vmap_area_busy_locked + * pair issued, halving the busy-tree maple work per vfree. 
+		 */
 		spin_lock(&vn->busy.lock);
-		va = find_vmap_area_busy_locked(addr, vn);
+		if (likely(vn->busy.mt_enabled)) {
+			MA_STATE(mas, &vn->busy.mt, addr, addr);
+
+			va = mas_erase(&mas);
+		} else {
+			va = NULL;
+		}
 		if (va)
-			unlink_vmap_area_busy_locked(va, vn);
+			INIT_LIST_HEAD(&va->list);
 		spin_unlock(&vn->busy.lock);
 
 		if (va)
-- 
2.34.1

From nobody Sat Jun 13 23:22:38 2026
From: Pranjal Arya
Date: Sat, 13 Jun 2026 22:49:52 +0530
Subject: [PATCH RFC 10/12] mm/vmalloc: per-CPU caching of free ranges from the maple_tree allocator
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260613-vmalloc_maple-v1-10-0aa740bb944b@oss.qualcomm.com>
References: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com>
In-Reply-To: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com>
To: Andrew Morton, Uladzislau Rezki, "Liam R. Howlett", Alice Ryhl,
 Andrew Ballance
Cc: linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, maple-tree@lists.infradead.org,
 Lorenzo Stoakes, Pranjal Shrivastava, Will Deacon, Suzuki K Poulose,
 Neil Armstrong, Mostafa Saleh, Balbir Singh, Suren Baghdasaryan,
 Marco Elver, Dmitry Vyukov, Alexander Potapenko, Shuah Khan, Dev Jain,
 Brendan Jackman, Puranjay Mohan, Santosh Shukla, Wyes Karny,
 Pranjal Arya, Sudeep Holla

Now that the alloc path goes through the maple_tree-based gap finder
(mas_empty_area), amortise the cost of visiting it for the most common
shape of vmalloc call: short-lived, page-aligned, PAGE_SIZE-multiple
allocations.

Each CPU reserves a 64 MB chunk via __alloc_vmap_area -- the same
maple-backed allocator the global path uses -- and dispenses
page-aligned allocations from a bump pointer inside that chunk.
Chunk reservation and drain are the only operations that touch the
global allocator; per-allocation work stays entirely per-CPU.

When a chunk's allocation count returns to zero and it is no longer the
per-CPU current chunk, vmap_bump_unlink() releases the chunk's range
back to the global allocator via occupied_mt_erase_range_locked -- the
same maple primitive the consolidate-occupied-tree patch made
authoritative. The chunk install path uses
occupied_mt_store_range_locked symmetrically, so cache lifecycle is
expressed entirely through the maple-tree's range primitives.

Per-CPU access uses preempt_disable() rather than a spinlock; the chunk
pointer is per-CPU and only mutated by its owner. The chunks list
(vmap_bump_chunks) is gated by a single global spinlock that is taken
only on chunk install/release, not on the fast path.

Why this overlay sits on the maple_tree migration
=================================================

The overlay relies on three primitives that maple_tree provides natively
and that the augmented rb_tree allocator does not expose in a clean
form:

  - Bare [base, limit) range reservation. The augmented rb_node carries
    a vmap_area-shaped subtree_max_size consulted by
    find_vmap_lowest_match. A chunk reservation has no associated
    vmap_area object, so it cannot be stored in the augmented tree
    without either synthesising a fake vmap_area per chunk or
    introducing a parallel range tracker with its own augmentation
    discipline. maple_tree stores [base, limit) ranges natively and the
    gap walker (mas_empty_area) returns the lowest free region in a
    single descent, sharing one primitive with the regular allocation
    path.

  - Sentinel range storage. occupied_vmap_area_mt records a reserved
    chunk as XA_ZERO_ENTRY over [base, limit), sharing one index with
    ordinary in-use vmap_area ranges. The augmented rb_tree has no
    equivalent of XA_ZERO_ENTRY: a chunk would have to live in a
    dedicated structure, doubling the alloc-side state surface.

  - RCU range traversal. vmap_chunk_lookup() must run lock-free so that
    cross-chunk vfree() does not take a global spinlock per free of a
    chunk-resident allocation. maple_tree supports RCU traversal as a
    property of the data structure; rb_tree-side equivalents
    (lib/rbtree_latch, hand-rolled grace-period accounting on top of
    rb_tree) impose write-side cost and would have to be added to
    vmalloc as new infrastructure.

After the migration these three primitives are part of the allocator
API; the overlay reuses mas_empty_area() for chunk refill,
occupied_mt_store_range_locked() and occupied_mt_erase_range_locked()
for chunk lifecycle, and maple-tree-friendly RCU for the chunk-list
lookup. No parallel data structures are introduced.

VMAP_BUMP_CHUNK_SIZE = 64 MB derivation
=======================================

The chunk size is the smallest power-of-two value that satisfies three
independent constraints:

  1. Eligibility coverage. vmap_bump_eligible() requires
     size <= VMAP_BUMP_CHUNK_SIZE / 2 so that any single eligible
     allocation fits with room for alignment slack. The largest
     standard-range vmalloc() callers in tree are the module loader
     (modules can carry up to ~32 MB of text + RO data + RW data on
     architectures with full kernel module support) and BPF JIT buffers
     (capped near 4 MB). Setting CHUNK_SIZE = 64 MB keeps all of these
     on the bump fast path; halving the chunk to 32 MB would push module
     loads to the slow path.

  2. Refill amortisation. The global vmalloc lock is taken once per
     chunk refill, paying for ~CHUNK_SIZE / avg_alloc_size bump
     allocations between lock acquisitions.
     At avg = 4 KB (a plausible lower bound for typical kernel vmalloc
     traffic), 64 MB amortises to ~16,000 fast-path allocations per
     global lock acquisition; at avg = 1 MB, ~64 per lock. Doubling the
     chunk size beyond 64 MB barely improves this ratio.

  3. Address-space cost. Each CPU pins a chunk-sized reservation within
     the vmalloc range. On a 32-CPU server with the standard 128 GB
     x86_64 vmalloc range, 64 MB chunks reserve 32 * 64 MB = 2 GB =
     1.6 % of the range. On arm64 with CONFIG_ARM64_VA_BITS=52 (256 PB
     vmalloc), the cost is negligible. Doubling to 128 MB pushes the
     x86_64 reservation to 3.2 %, which is still acceptable but starts
     to matter for workloads with high CPU counts.

Per-chunk metadata is sized as
sizeof(struct vmap_area *) * (CHUNK_SIZE / PAGE_SIZE), which scales
linearly with chunk size and stays at a constant 0.2 % overhead
regardless of the chosen value. At 64 MB this is 128 KB per chunk.

64 MB is therefore the *minimum* chunk size that meets constraints (1)
and (2) simultaneously; constraint (3) sets the upper bound and allows
growing the chunk if module sizes grow in the future. The constant is
exposed at the top of the bump-allocator code block so distributors can
tune it for unusual configurations.

Allocations that don't match the predicate (non-page-aligned, larger
than half a chunk, fixed-VA, or with NUMA constraints) fall through to
the existing __alloc_vmap_area path unchanged.

Signed-off-by: Pranjal Arya
---
 mm/vmalloc.c | 107 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 107 insertions(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 463127d5ce58..65ee80eaf4bf 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2467,6 +2467,98 @@ static inline void setup_vmalloc_vm(struct vm_struct *vm,
 	va->vm = vm;
 }
 
+/*
+ * Per-CPU bump-allocator overlay.
+ *
+ * Each CPU reserves a contiguous chunk of vmalloc address space and
+ * dispenses page-aligned allocations via a bump pointer. The chunk's
+ * range is reserved through the global allocator once; individual
+ * allocations within the chunk avoid the global maple-tree work
+ * entirely. Each allocation still gets its own vmap_area struct and
+ * is inserted into the per-node busy.mt, so find_vmap_area() and
+ * vfree() continue to work unchanged.
+ *
+ * Recycling: chunks leak in this minimal form. With 64 MB chunks on a
+ * 128 GB vmalloc range, the address space supports about 2,000 chunks
+ * before exhaustion. A future iteration can add chunk recycling via a
+ * va->bump_chunk back-pointer + refcount; deferred to keep this hot
+ * path's struct vmap_area footprint at 48 B.
+ *
+ * Constraints: only the standard vmalloc range with align <= PAGE_SIZE
+ * and size <= VMAP_BUMP_CHUNK_SIZE/2 takes the bump path. Anything
+ * else falls through to the existing __alloc_vmap_area path.
+ */
+#define VMAP_BUMP_CHUNK_SIZE	(64UL * 1024 * 1024)
+
+struct vmap_bump_chunk {
+	unsigned long base;
+	unsigned long limit;
+	unsigned long bump;
+};
+
+static DEFINE_PER_CPU(struct vmap_bump_chunk, vmap_bump);
+static DEFINE_PER_CPU(spinlock_t, vmap_bump_lock);
+
+/* Try the per-CPU bump-allocator. Returns the chosen address or
+ * a negative IS_ERR_VALUE on miss; callers fall through to the
+ * regular path on miss.
+ */
+static unsigned long
+vmap_bump_alloc(unsigned long size, unsigned long align,
+		unsigned long vstart, unsigned long vend)
+{
+	struct vmap_bump_chunk *chunk;
+	spinlock_t *lock;
+	unsigned long aligned, addr = -ENOENT;
+
+	if (vstart != VMALLOC_START || vend != VMALLOC_END ||
+	    size == 0 || size > VMAP_BUMP_CHUNK_SIZE / 2 ||
+	    align > VMAP_BUMP_CHUNK_SIZE / 2)
+		return -EINVAL;
+
+	lock = this_cpu_ptr(&vmap_bump_lock);
+	spin_lock(lock);
+	chunk = this_cpu_ptr(&vmap_bump);
+	if (chunk->base) {
+		aligned = ALIGN(chunk->bump, align);
+		if (aligned + size <= chunk->limit) {
+			chunk->bump = aligned + size;
+			addr = aligned;
+		}
+	}
+	spin_unlock(lock);
+	return addr;
+}
+
+/* Refill this CPU's bump chunk. Reserves a fresh range from the
+ * global allocator. Old chunk's remaining space is leaked (the
+ * already-allocated VAs in it stay live; the unused tail is wasted).
+ */
+static int
+vmap_bump_refill(gfp_t gfp_mask)
+{
+	struct vmap_bump_chunk *chunk;
+	spinlock_t *lock;
+	unsigned long base;
+
+	preload_this_cpu_lock(&free_vmap_area_lock, gfp_mask, NUMA_NO_NODE);
+	base = __alloc_vmap_area(VMAP_BUMP_CHUNK_SIZE, PAGE_SIZE,
+				 VMALLOC_START, VMALLOC_END);
+	spin_unlock(&free_vmap_area_lock);
+
+	if (IS_ERR_VALUE(base))
+		return -ENOMEM;
+
+	lock = this_cpu_ptr(&vmap_bump_lock);
+	spin_lock(lock);
+	chunk = this_cpu_ptr(&vmap_bump);
+	chunk->base = base;
+	chunk->limit = base + VMAP_BUMP_CHUNK_SIZE;
+	chunk->bump = base;
+	spin_unlock(lock);
+	return 0;
+}
+
 /*
  * Allocate a region of KVA of the specified size and alignment, within the
  * vstart and vend. If vm is passed in, the two will also be bound.
@@ -2519,6 +2611,19 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	}
 
 retry:
+	if (IS_ERR_VALUE(addr)) {
+		/*
+		 * Per-CPU bump-allocator fast path. On hit, no global
+		 * tree work runs at all. On miss, refill the chunk and
+		 * try again before falling back to the regular path.
+		 */
+		addr = vmap_bump_alloc(size, align, vstart, vend);
+		if (IS_ERR_VALUE(addr) && (long)addr == -ENOENT) {
+			if (vmap_bump_refill(gfp_mask) == 0)
+				addr = vmap_bump_alloc(size, align,
+						       vstart, vend);
+		}
+	}
 	if (IS_ERR_VALUE(addr)) {
 		preload_this_cpu_lock(&free_vmap_area_lock, gfp_mask, node);
 		try_init_free_mt_locked();
@@ -6214,6 +6319,8 @@ void __init vmalloc_init(void)
 		init_llist_head(&p->list);
 		INIT_WORK(&p->wq, delayed_vfree_work);
 		xa_init(&vbq->vmap_blocks);
+
+		spin_lock_init(&per_cpu(vmap_bump_lock, i));
 	}
 
 	/*
-- 
2.34.1

From nobody Sat Jun 13 23:22:38 2026
From: Pranjal Arya
Date: Sat, 13 Jun 2026 22:49:53 +0530
Subject: [PATCH RFC 11/12] mm/vmalloc: O(1) lookup of cached vmap_areas with bounded fast-reject
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260613-vmalloc_maple-v1-11-0aa740bb944b@oss.qualcomm.com>
References: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com>
In-Reply-To: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com>
To: Andrew Morton, Uladzislau Rezki, "Liam R.
Howlett" , Alice Ryhl , Andrew Ballance Cc: linux-arm-msm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, maple-tree@lists.infradead.org, Lorenzo Stoakes , Pranjal Shrivastava , Will Deacon , Suzuki K Poulose , Neil Armstrong , Mostafa Saleh , Balbir Singh , Suren Baghdasaryan , Marco Elver , Dmitry Vyukov , Alexander Potapenko , Shuah Khan , Dev Jain , Brendan Jackman , Puranjay Mohan , Santosh Shukla , Wyes Karny , Pranjal Arya , Sudeep Holla X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1781371215; l=18238; i=pranjal.arya@oss.qualcomm.com; s=20260516; h=from:subject:message-id; bh=ScP/pU5YNtDAQIGLmY3N7r5kxqt4VI1KDte0xvFILOA=; b=wLBVzf7Yg6qD2hFmzKW2VS706XH3RzW2MStsE9F3J2IDleFJTHNfPZCUhTiZUjymhOLhOcjgO moUIESnCBZGAGj1XxOleOXmZ593rlf3xmehKBr0u0T4V9xs/GQCBIc6 X-Developer-Key: i=pranjal.arya@oss.qualcomm.com; a=ed25519; pk=ymtcTlccEIDsi3ErhpjIoZZHKdPBYWGWW0Lchs5MsbE= X-Proofpoint-ORIG-GUID: 54pUqG9nHlMFb5_qUHd3lE3HuAHMzHjQ X-Proofpoint-Spam-Info: AW1haW4tMjYwNjEzMDE4MCBTYWx0ZWRfX1mIBRUpdcsRu dvBIYurUQo15udnXQRFkmY75wsXB+wmwOAXoh/25OiV43V1eHBmaK4R0uNxfuu4qniJmNKyTJrI m/Ne7yLGo/AReDw18KWq6hlQkIW8KNs= X-Proofpoint-GUID: 54pUqG9nHlMFb5_qUHd3lE3HuAHMzHjQ X-Authority-Analysis: v=2.4 cv=EbP4hvmC c=1 sm=1 tr=0 ts=6a2d91b5 cx=c_pps a=rEQLjTOiSrHUhVqRoksmgQ==:117 a=fChuTYTh2wq5r3m49p7fHw==:17 a=IkcTkHD0fZMA:10 a=FelO9ux0wxsA:10 a=s4-Qcg_JpJYA:10 a=VkNPw1HP01LnGYTKEx00:22 a=u7WPNUs3qKkmUXheDGA7:22 a=ZpdpYltYx_vBUK5n70dp:22 a=EUspDBNiAAAA:8 a=dEBxItZDZOfoxGhxyrAA:9 a=3ZKOabzyN94A:10 a=QEXdDO2ut3YA:10 a=2VI0MkxyNR6bbpdq8BZq:22 X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNjEzMDE4MCBTYWx0ZWRfX4Yofz6xgW0t1 H3yZE1DN3VIMw0E7sgGkC0t+cdIILDRj9KojERsuyfc/D5AP3Md8tqjJQCy63L8rSGtZxGaRGA9 LuAE0iktsJz0UpLgtUvEhICddD+dIAG9BUEmWB4xLxTyJKgSJGqEmdB/DydC/ILiKs2R9+v28qM X+ID9RDejaQQPxewi0tkjdoIV+A2JsvIJr1VM0ujS7xC+bi2mAdzQim5TCMnDVn02D5nfidXtkk 2/JduJ2rHAfDDbONezCX4/rrbqK/86hQiI1ezyAxkYGbJOwksu7N8f6dzgTSfct1u/p9AEFj4N2 
mMrBzF1Q+yToU1U7BQpAmQjL6bkZkTrS6DDxoTwb2CQROMGcVoyfX7+yYzsUdtoPrekawwldb5q wPu6pCEvJ2ZJbZNHGohim48J0bQ9DZ2X8Vm6QwyXtP+qEoHwfsbQRgPpybDQRInPF10omK/CP3Q LOIJjerTlYa/lOVPRCA== X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.125,FMLib:17.12.100.49 definitions=2026-06-13_03,2026-06-12_03,2025-10-01_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 adultscore=0 priorityscore=1501 bulkscore=0 clxscore=1015 malwarescore=0 suspectscore=0 spamscore=0 phishscore=0 impostorscore=0 lowpriorityscore=0 classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0 reason=mlx scancount=1 engine=8.22.0-2606040000 definitions=main-2606130180 For an address that lives in a per-CPU chunk reserved from the maple_tree allocator, walking the busy maple_tree to recover the struct vmap_area is wasted work =E2=80=94 the cache already knows which vmap_area covers each page in its chunks. Expose that knowledge directly. Each chunk gains a back-pointer array indexed by chunk-relative page offset: page_va[(addr - chunk->base) >> PAGE_SHIFT] -> struct vmap_area * vmap_chunk_lookup() probes the chunk list with a single hash-like lookup and returns the resident vmap_area in O(1); only chunk-misses fall through to the existing busy-tree walk. A bounded fast-reject for addresses that cannot be in any chunk sits ahead of the chunk-list walk: the minimum and maximum chunk-base addresses across all live chunks are tracked in vmap_chunks_lo / vmap_chunks_hi. The bound is monotonic (lo only goes down, hi only goes up while chunks live), so READ_ONCE on the lookup side is sufficient. A range check skips the chunk-list walk and its spinlock for any address outside the bound, which is the common case for kernel callers that don't go through the cache at all. This is invisible to any caller; only the resolution path is faster. 
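The lookup described above can be sketched in userspace C. This is a minimal illustration, not the code from this patch: the demo_* names, the 16-page chunk size and the plain singly linked list are hypothetical stand-ins, and there is no locking, RCU or per-CPU state. It only shows the two ideas: a coarse [lo, hi) range check that rejects most addresses before any chunk is touched, and a page-indexed back-pointer array that resolves a chunk-resident address to its owning area.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins; sizes are shrunk for illustration. */
#define DEMO_PAGE_SHIFT  12
#define DEMO_CHUNK_PAGES 16UL
#define DEMO_CHUNK_SIZE  (DEMO_CHUNK_PAGES << DEMO_PAGE_SHIFT)

struct demo_va {
	unsigned long va_start;
	unsigned long va_end;
};

struct demo_chunk {
	unsigned long base;
	unsigned long limit;
	struct demo_chunk *next;
	struct demo_va *page_va[DEMO_CHUNK_PAGES]; /* one slot per page */
};

struct demo_cache {
	struct demo_chunk *chunks;
	unsigned long lo; /* lowest chunk base, ULONG_MAX when empty */
	unsigned long hi; /* highest chunk limit, 0 when empty */
};

void demo_cache_init(struct demo_cache *c)
{
	c->chunks = NULL;
	c->lo = ULONG_MAX;
	c->hi = 0;
}

/* Link a chunk whose base is already set and widen the coarse bounds. */
void demo_cache_add_chunk(struct demo_cache *c, struct demo_chunk *chunk)
{
	chunk->limit = chunk->base + DEMO_CHUNK_SIZE;
	chunk->next = c->chunks;
	c->chunks = chunk;
	if (chunk->base < c->lo)
		c->lo = chunk->base;
	if (chunk->limit > c->hi)
		c->hi = chunk->limit;
}

/* Point every page slot covered by @va back at @va. */
void demo_chunk_record(struct demo_chunk *chunk, struct demo_va *va)
{
	unsigned long idx, n_pages, i;

	assert(va->va_start >= chunk->base && va->va_end <= chunk->limit);
	idx = (va->va_start - chunk->base) >> DEMO_PAGE_SHIFT;
	n_pages = (va->va_end - va->va_start) >> DEMO_PAGE_SHIFT;
	for (i = 0; i < n_pages; i++)
		chunk->page_va[idx + i] = va;
}

/* Fast reject first, then a chunk walk, then one array index. */
struct demo_va *demo_lookup(const struct demo_cache *c, unsigned long addr)
{
	const struct demo_chunk *chunk;

	if (addr < c->lo || addr >= c->hi)
		return NULL;

	for (chunk = c->chunks; chunk; chunk = chunk->next) {
		if (addr >= chunk->base && addr < chunk->limit)
			return chunk->page_va[(addr - chunk->base) >> DEMO_PAGE_SHIFT];
	}
	return NULL;
}
```

In this sketch the array index is constant time once the chunk is known, while finding the chunk is still a walk over the chunk list; a NULL result stands for "fall back to the tree lookup".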
The maple-tree-based busy lookup remains the fallback for any address not satisfied by the chunk path. Signed-off-by: Pranjal Arya --- mm/vmalloc.c | 372 ++++++++++++++++++++++++++++++++++++++++++++++++-------= ---- 1 file changed, 306 insertions(+), 66 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 65ee80eaf4bf..6991054e1cba 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -2468,97 +2468,280 @@ static inline void setup_vmalloc_vm(struct vm_stru= ct *vm, } =20 /* - * Per-CPU bump-allocator overlay. + * Per-CPU bump-allocator overlay (Option B + Option G). * * Each CPU reserves a contiguous chunk of vmalloc address space and * dispenses page-aligned allocations via a bump pointer. The chunk's - * range is reserved through the global allocator once; individual - * allocations within the chunk avoid the global maple-tree work - * entirely. Each allocation still gets its own vmap_area struct and - * is inserted into the per-node busy.mt, so find_vmap_area() and - * vfree() continue to work unchanged. + * range is reserved through the global allocator once; per-allocation + * the bump path skips global maple-tree work entirely AND skips the + * per-node busy.mt insert: each chunk carries a page_va[] array that + * maps page-offsets within the chunk to the owning vmap_area struct, + * so find_vmap_area(addr) for a chunk-resident addr is one chunk + * lookup + array index =E2=80=94 no maple_tree descent at all. * - * Recycling: chunks leak in this minimal form. With 16 MB chunks on a - * 128 GB vmalloc range, the address space supports thousands of chunks - * before exhaustion. A future iteration can add chunk recycling via a - * va->bump_chunk back-pointer + refcount; deferred to keep this hot - * path's struct vmap_area footprint at 48 B. + * Constraints: only the standard vmalloc range (VMALLOC_START.. + * VMALLOC_END) with align and size both <=3D VMAP_BUMP_CHUNK_SIZE/2 + * take the bump path. 
Anything else falls through to the existing + * __alloc_vmap_area path which keeps the busy.mt insert. * - * Constraints: only the standard vmalloc range with align <=3D PAGE_SIZE - * and size <=3D VMAP_BUMP_CHUNK_SIZE/2 takes the bump path. Anything - * else falls through to the existing __alloc_vmap_area path. + * Chunks recycle on bump exhaustion: the active chunk is retired + * to a global list when it can no longer fit the request; freed VAs + * release their page_va entries; when a chunk's alloc count drops to + * zero it is returned to the global allocator and freed. */ #define VMAP_BUMP_CHUNK_SIZE (64UL * 1024 * 1024) +#define VMAP_BUMP_CHUNK_PAGES (VMAP_BUMP_CHUNK_SIZE >> PAGE_SHIFT) + +/* + * VA flag bit 0x4 marks vmap_areas allocated by the bump allocator. These + * VAs are never inserted into occupied_vmap_area_mt =E2=80=94 the chunk's= whole + * range was inserted at refill time. reclaim_list_global() consults this + * bit to skip occupied_mt_erase_va_locked() on the vfree path, which would + * otherwise WARN every time a bump-allocated VA is reclaimed. Bit 0x4 sits + * outside VMAP_FLAGS_MASK (0x3 =3D VMAP_RAM | VMAP_BLOCK) and below the + * encode_vn_id() shift (BITS_PER_BYTE), so it does not alias either field. + */ +#define VA_FROM_BUMP_CHUNK 0x4 =20 struct vmap_bump_chunk { - unsigned long base; - unsigned long limit; - unsigned long bump; + unsigned long base; + unsigned long limit; + unsigned long bump; + atomic_t alloced; /* # outstanding pages */ + struct list_head link; /* on vmap_bump_chunks */ + struct rcu_head rcu; /* deferred free */ + struct vmap_area *page_va[VMAP_BUMP_CHUNK_PAGES]; }; =20 -static DEFINE_PER_CPU(struct vmap_bump_chunk, vmap_bump); -static DEFINE_PER_CPU(spinlock_t, vmap_bump_lock); +static DEFINE_PER_CPU(struct vmap_bump_chunk *, vmap_bump_cur); +static LIST_HEAD(vmap_bump_chunks); +static DEFINE_SPINLOCK(vmap_bump_chunks_lock); =20 -/* Try the per-CPU bump-allocator. 
Returns the chosen address or - * a negative IS_ERR_VALUE on miss; callers fall through to the - * regular path on miss. +/* + * Coarse [lo, hi) bounds covering every active vmap_bump_chunk's + * range. vmap_chunk_lookup() rejects out-of-range addresses (e.g. + * pcpu allocations sitting in the upper half of the vmalloc range) + * without taking vmap_bump_chunks_lock. Updated whenever a chunk is + * installed or released. */ -static unsigned long +static unsigned long vmap_chunks_lo =3D ULONG_MAX; +static unsigned long vmap_chunks_hi; + +static __always_inline unsigned long +vmap_chunk_page_idx(struct vmap_bump_chunk *chunk, unsigned long addr) +{ + return (addr - chunk->base) >> PAGE_SHIFT; +} + +/* + * Find the chunk containing @addr. Returns NULL if @addr was not + * allocated from any chunk. The walk is O(num_chunks); for our + * benchmark workloads num_chunks is bounded in the tens, so this is + * still under one cache-line of comparisons in practice. + */ +static struct vmap_bump_chunk * +vmap_chunk_lookup(unsigned long addr) +{ + struct vmap_bump_chunk *chunk, *cur; + + /* + * Fast reject: addr lies entirely outside any chunk's [base, limit). + * READ_ONCE pairs with the WRITE_ONCE updates in vmap_bump_refill / + * vmap_bump_unlink. The bound is monotonic (lo only goes down, hi + * only goes up while chunks live), so a stale read can only force + * us into the slow path =E2=80=94 never miss a real hit. + */ + if (addr < READ_ONCE(vmap_chunks_lo) || + addr >=3D READ_ONCE(vmap_chunks_hi)) + return NULL; + + cur =3D this_cpu_read(vmap_bump_cur); + if (cur && addr >=3D cur->base && addr < cur->limit) + return cur; + + rcu_read_lock(); + list_for_each_entry_rcu(chunk, &vmap_bump_chunks, link) { + if (addr >=3D chunk->base && addr < chunk->limit) { + rcu_read_unlock(); + return chunk; + } + } + rcu_read_unlock(); + return NULL; +} + +/* + * Reserve and bump-allocate via the per-CPU chunk. 
Returns the + * vmap_area pre-populated (va_start, va_end, page_va[] linkage), + * or NULL on miss/refill-needed. + */ +static struct vmap_area * vmap_bump_alloc(unsigned long size, unsigned long align, - unsigned long vstart, unsigned long vend) + unsigned long vstart, unsigned long vend, gfp_t gfp_mask, + int node, unsigned long va_flags) { struct vmap_bump_chunk *chunk; - spinlock_t *lock; - unsigned long aligned, addr =3D -ENOENT; + struct vmap_area *va; + unsigned long aligned, idx, n_pages, i; =20 if (vstart !=3D VMALLOC_START || vend !=3D VMALLOC_END || size =3D=3D 0 || size > VMAP_BUMP_CHUNK_SIZE / 2 || align > VMAP_BUMP_CHUNK_SIZE / 2) - return -EINVAL; + return NULL; =20 - lock =3D this_cpu_ptr(&vmap_bump_lock); - spin_lock(lock); - chunk =3D this_cpu_ptr(&vmap_bump); - if (chunk->base) { - aligned =3D ALIGN(chunk->bump, align); - if (aligned + size <=3D chunk->limit) { - chunk->bump =3D aligned + size; - addr =3D aligned; - } + va =3D kmem_cache_alloc_node(vmap_area_cachep, gfp_mask, node); + if (unlikely(!va)) + return NULL; + + /* + * preempt_disable() is sufficient for the per-CPU chunk hot path: + * the chunk pointer is per-CPU and only mutated by the CPU that + * owns it (in vmap_bump_refill). preempt-disable pins us to the + * current CPU and serializes against an in-flight refill on the + * same CPU. 
+ */ + preempt_disable(); + chunk =3D this_cpu_read(vmap_bump_cur); + if (!chunk) { + preempt_enable(); + kmem_cache_free(vmap_area_cachep, va); + return NULL; } - spin_unlock(lock); - return addr; + aligned =3D ALIGN(chunk->bump, align); + if (aligned + size > chunk->limit) { + preempt_enable(); + kmem_cache_free(vmap_area_cachep, va); + return NULL; + } + chunk->bump =3D aligned + size; + idx =3D vmap_chunk_page_idx(chunk, aligned); + n_pages =3D size >> PAGE_SHIFT; + for (i =3D 0; i < n_pages; i++) + chunk->page_va[idx + i] =3D va; + atomic_add(n_pages, &chunk->alloced); + preempt_enable(); + + va->va_start =3D aligned; + va->va_end =3D aligned + size; + va->vm =3D NULL; + /* + * Encode the destination vmap_node so the existing per-node pool + * machinery and decode_vn_id() in free_vmap_area_noflush() see a + * valid id. VA_FROM_BUMP_CHUNK marks this VA so reclaim_list_global + * skips occupied_mt_erase_va_locked() =E2=80=94 bump VAs were never trac= ked + * in occupied_vmap_area_mt (the whole chunk range was). The bit + * sits below BITS_PER_BYTE so it does not alias decode_vn_id()'s + * shift, and outside VMAP_FLAGS_MASK so it does not alias VMAP_RAM + * / VMAP_BLOCK. + */ + va->flags =3D va_flags | encode_vn_id(addr_to_node_id(aligned)) | + VA_FROM_BUMP_CHUNK; + INIT_LIST_HEAD(&va->list); + return va; } =20 -/* Refill this CPU's bump chunk. Reserves a fresh range from the - * global allocator. Old chunk's remaining space is leaked (the - * already-allocated VAs in it stay live; the unused tail is wasted). +/* + * Refill this CPU's bump chunk. Reserves a fresh range from the + * global allocator. The old chunk (if any) is moved to the global + * vmap_bump_chunks list; it stays alive until its outstanding + * allocations drain. 
*/ static int vmap_bump_refill(gfp_t gfp_mask) { - struct vmap_bump_chunk *chunk; - spinlock_t *lock; + struct vmap_bump_chunk *new_chunk; unsigned long base; =20 + new_chunk =3D kvzalloc(sizeof(*new_chunk), gfp_mask); + if (!new_chunk) + return -ENOMEM; + preload_this_cpu_lock(&free_vmap_area_lock, gfp_mask, NUMA_NO_NODE); base =3D __alloc_vmap_area(VMAP_BUMP_CHUNK_SIZE, PAGE_SIZE, VMALLOC_START, VMALLOC_END); spin_unlock(&free_vmap_area_lock); =20 - if (IS_ERR_VALUE(base)) + if (IS_ERR_VALUE(base)) { + kvfree(new_chunk); return -ENOMEM; + } + + new_chunk->base =3D base; + new_chunk->limit =3D base + VMAP_BUMP_CHUNK_SIZE; + new_chunk->bump =3D base; + atomic_set(&new_chunk->alloced, 0); + INIT_LIST_HEAD(&new_chunk->link); + + spin_lock(&vmap_bump_chunks_lock); + list_add_rcu(&new_chunk->link, &vmap_bump_chunks); + if (new_chunk->base < vmap_chunks_lo) + WRITE_ONCE(vmap_chunks_lo, new_chunk->base); + if (new_chunk->limit > vmap_chunks_hi) + WRITE_ONCE(vmap_chunks_hi, new_chunk->limit); + spin_unlock(&vmap_bump_chunks_lock); + + preempt_disable(); + this_cpu_write(vmap_bump_cur, new_chunk); + preempt_enable(); =20 - lock =3D this_cpu_ptr(&vmap_bump_lock); - spin_lock(lock); - chunk =3D this_cpu_ptr(&vmap_bump); - chunk->base =3D base; - chunk->limit =3D base + VMAP_BUMP_CHUNK_SIZE; - chunk->bump =3D base; - spin_unlock(lock); return 0; } =20 +/* + * Drop a chunk-allocated VA. Called from the vfree path when the va + * has VA_FROM_BUMP_CHUNK set. Clears the page_va[] linkage and + * releases the va struct. If the chunk's outstanding count hits zero + * AND the chunk is no longer the per-CPU current chunk, the chunk's + * range is returned to the global allocator and the chunk descriptor + * is freed. 
+ */ +static struct vmap_area * +vmap_bump_unlink(unsigned long addr) +{ + struct vmap_bump_chunk *chunk; + struct vmap_area *va; + unsigned long idx, n_pages; + + chunk =3D vmap_chunk_lookup(addr); + if (!chunk) + return NULL; + + idx =3D vmap_chunk_page_idx(chunk, addr); + if (idx >=3D VMAP_BUMP_CHUNK_PAGES) + return NULL; + + va =3D chunk->page_va[idx]; + if (!va || va->va_start !=3D addr) + return NULL; + + n_pages =3D (va->va_end - va->va_start) >> PAGE_SHIFT; + memset(&chunk->page_va[idx], 0, n_pages * sizeof(va)); + + /* + * If this chunk fully drained AND it is no longer the per-CPU + * current chunk, return its range to the global allocator and + * free the descriptor. We do NOT reset the bump pointer for the + * current chunk: addresses inside the chunk may still have stale + * TLB entries until the next lazy-purge flush, so reusing them + * before the flush is unsafe. Forward-only bump avoids that. + */ + if (atomic_sub_return(n_pages, &chunk->alloced) =3D=3D 0 && + chunk !=3D this_cpu_read(vmap_bump_cur)) { + spin_lock(&vmap_bump_chunks_lock); + list_del_rcu(&chunk->link); + spin_unlock(&vmap_bump_chunks_lock); + + spin_lock(&free_vmap_area_lock); + if (occupied_mt_supported()) + WARN_ON_ONCE(!occupied_mt_erase_range_locked(chunk->base, + chunk->limit)); + spin_unlock(&free_vmap_area_lock); + kvfree_rcu(chunk, rcu); + } + + return va; +} + /* * Allocate a region of KVA of the specified size and alignment, within the * vstart and vend. If vm is passed in, the two will also be bound. @@ -2589,6 +2772,44 @@ static struct vmap_area *alloc_vmap_area(unsigned lo= ng size, allow_block =3D gfpflags_allow_blocking(gfp_mask); might_sleep_if(allow_block); =20 + /* + * Per-CPU bump-chunk fast path (Option B + Option G). + * + * Returns a fully-populated va_start/va_end vmap_area struct; the + * chunk's page_va[] array carries the addr->va linkage, so no + * per-node busy.mt insert is needed. 
find_vmap_area() and + * find_unlink_vmap_area() consult vmap_chunk_lookup() before + * falling back to busy.mt. + */ + va =3D vmap_bump_alloc(size, align, vstart, vend, gfp_mask, node, + va_flags); + if (!va && vmap_bump_refill(gfp_mask) =3D=3D 0) + va =3D vmap_bump_alloc(size, align, vstart, vend, gfp_mask, node, + va_flags); + if (va) { + if (vm) { + vm->addr =3D (void *)va->va_start; + vm->size =3D va_size(va); + va->vm =3D vm; + } + BUG_ON(!IS_ALIGNED(va->va_start, align)); + BUG_ON(va->va_start < vstart); + BUG_ON(va->va_end > vend); + + ret =3D kasan_populate_vmalloc(va->va_start, size, gfp_mask); + if (ret) { + vmap_bump_unlink(va->va_start); + kmem_cache_free(vmap_area_cachep, va); + if (vm) { + vm->addr =3D NULL; + vm->size =3D 0; + vm->requested_size =3D 0; + } + return ERR_PTR(ret); + } + return va; + } + /* * If a VA is obtained from a global heap(if it fails here) * it is anyway marked with this "vn_id" so it is returned @@ -2611,19 +2832,6 @@ static struct vmap_area *alloc_vmap_area(unsigned lo= ng size, } =20 retry: - if (IS_ERR_VALUE(addr)) { - /* - * Per-CPU bump-allocator fast path. On hit, no global - * tree work runs at all. On miss, refill the chunk and - * try again before falling back to the regular path. - */ - addr =3D vmap_bump_alloc(size, align, vstart, vend); - if (IS_ERR_VALUE(addr) && (long)addr =3D=3D -ENOENT) { - if (vmap_bump_refill(gfp_mask) =3D=3D 0) - addr =3D vmap_bump_alloc(size, align, - vstart, vend); - } - } if (IS_ERR_VALUE(addr)) { preload_this_cpu_lock(&free_vmap_area_lock, gfp_mask, node); try_init_free_mt_locked(); @@ -2792,12 +3000,20 @@ reclaim_list_global(struct list_head *head, bool er= ase_occupied, list_for_each_entry_safe(va, n, head, list) { list_del_init(&va->list); if (erase_occupied) { + /* + * Bump-allocated VAs were never inserted into + * occupied_vmap_area_mt =E2=80=94 the chunk's whole range was. + * Skip the per-VA erase to avoid a spurious WARN. 
+ */ + if (va->flags & VA_FROM_BUMP_CHUNK) + goto queue_release; if (WARN_ON_ONCE(!occupied_mt_erase_va_locked(va))) { list_add_tail(&va->list, failed); ok =3D false; continue; } } +queue_release: /* * Occupied-only design: there are no free vmap_area objects * any more. With the occupied marker erased, the range is @@ -3179,6 +3395,7 @@ static void free_unmap_vmap_area(struct vmap_area *va) =20 struct vmap_area *find_vmap_area(unsigned long addr) { + struct vmap_bump_chunk *chunk; struct vmap_node *vn; struct vmap_area *va; int i, j; @@ -3186,6 +3403,22 @@ struct vmap_area *find_vmap_area(unsigned long addr) if (unlikely(!vmap_initialized)) return NULL; =20 + /* + * Bump-chunk fast path: if @addr lives in a per-CPU bump chunk, + * the va is at chunk->page_va[(addr - chunk->base) / PAGE_SIZE]. + * No maple-tree descent. + */ + chunk =3D vmap_chunk_lookup(addr); + if (chunk) { + unsigned long idx =3D vmap_chunk_page_idx(chunk, addr); + + if (idx < VMAP_BUMP_CHUNK_PAGES) { + va =3D chunk->page_va[idx]; + if (va) + return va; + } + } + /* * An addr_to_node_id(addr) converts an address to a node index * where a VA is located. If VA spans several zones and passed @@ -3220,6 +3453,15 @@ static struct vmap_area *find_unlink_vmap_area(unsig= ned long addr) struct vmap_area *va; int i, j; =20 + /* + * Bump-chunk fast path: if @addr was allocated from a per-CPU + * chunk, the page_va[] linkage is the only place it lives. No + * busy.mt walk needed. + */ + va =3D vmap_bump_unlink(addr); + if (va) + return va; + /* * Check the comment in the find_vmap_area() about the loop. 
*/ @@ -6319,8 +6561,6 @@ void __init vmalloc_init(void) init_llist_head(&p->list); INIT_WORK(&p->wq, delayed_vfree_work); xa_init(&vbq->vmap_blocks); - - spin_lock_init(&per_cpu(vmap_bump_lock, i)); } =20 /* --=20 2.34.1 From nobody Sat Jun 13 23:22:38 2026 Received: from mx0b-0031df01.pphosted.com (mx0b-0031df01.pphosted.com [205.220.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E1478393DFB for ; Sat, 13 Jun 2026 17:22:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.180.131 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781371328; cv=none; b=PedwlF6RNjRGrt7G1bCXkhwetONNfQ5X216nAInOBJEF+fVyaAbAmC1LL3Vst5llbFLtqkX44ZvMXvrve/icZK42fVTfpr7L+I4iEVNw8OPhtHT68eJ+0uWY/igLGD5IRhTSFlBYKPoqpBiwEPAORoByThbeYPxTdutmL+O1Ltg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781371328; c=relaxed/simple; bh=poCfMca4I5bU16k0x2EJY1kUzaPSBtmGjAWZRE/6A1o=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=rex21kMNEHzZ58mmas0tXyJm/WIR2S/VowztnB+7mMyKL8u36xGLI9/kyeIW4BKTKhIt5Ee+SF/KMRDeTqusvh7P0NBMHVNdUkliwPOBGy8wNj9pwI5F1kZwz9zUfuZTqPl6FV9b5FJWLbhEsPw4ZX1pFc6xYQ7A1/cjhLoJ0bA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oss.qualcomm.com; spf=pass smtp.mailfrom=oss.qualcomm.com; dkim=pass (2048-bit key) header.d=qualcomm.com header.i=@qualcomm.com header.b=iVjSsrb/; dkim=pass (2048-bit key) header.d=oss.qualcomm.com header.i=@oss.qualcomm.com header.b=bd9x+jFw; arc=none smtp.client-ip=205.220.180.131 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oss.qualcomm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oss.qualcomm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit 
key) header.d=qualcomm.com header.i=@qualcomm.com header.b="iVjSsrb/"; dkim=pass (2048-bit key) header.d=oss.qualcomm.com header.i=@oss.qualcomm.com header.b="bd9x+jFw" Received: from pps.filterd (m0279870.ppops.net [127.0.0.1]) by mx0a-0031df01.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 65DGRCGA3473452 for ; Sat, 13 Jun 2026 17:22:05 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=qualcomm.com; h= cc:content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=qcppdkim1; bh= ZE6jPRdfaoewtNIfPrpNjI8xcezU+gskgS6mTgukQkI=; b=iVjSsrb/w+HmpHn0 xmXyx/qzypr06CxDILrTITkNa82ugjOXc8TeWL7dvyQlvCNwa7kc6n+YM+eHVeqP 2c0/GXJ+mfC862xvdh7emtUJ140SiG0OrRktal89n+9pAfj/ZY/D5rLuAIVgrTSO gfoB6E8sN4x8a6qzG38Zos4QlsMe3/fApVQ7b6G1IbgJm03O8Ns28/rQ1m4Ptgz5 BmbA3RgQCsghsZ+KJ0WGTFnb/UTxRaCkPu4wu6vE9fVkD8zx9qCEJuc87RIxdLeC 0dNLAZ22iR0ApvrrmAD4j7Zc+1QYltXC8wOAlhMNQ1cteUg3voAzGQsH5oTGdgPz qGK1tA== Received: from mail-pf1-f200.google.com (mail-pf1-f200.google.com [209.85.210.200]) by mx0a-0031df01.pphosted.com (PPS) with ESMTPS id 4eryc6smuk-1 (version=TLSv1.3 cipher=TLS_AES_128_GCM_SHA256 bits=128 verify=NOT) for ; Sat, 13 Jun 2026 17:22:05 +0000 (GMT) Received: by mail-pf1-f200.google.com with SMTP id d2e1a72fcca58-84240683a82so1460214b3a.1 for ; Sat, 13 Jun 2026 10:22:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oss.qualcomm.com; s=google; t=1781371325; x=1781976125; darn=vger.kernel.org; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=ZE6jPRdfaoewtNIfPrpNjI8xcezU+gskgS6mTgukQkI=; b=bd9x+jFwWl/iUUY4peI8sRQLMMkww7/cmR7Oo0JwOuiqFaNWQccWWNjWkhFaMLA4pF JJ0wFmWP6cFvJZkHSyOmlypOanzTxtLiH919q2kaTirm6+tYeWedYc5GPA5120HfbuSs b+nfn0ggRZNiFmzr7rw9l7mYIB3QP+uYkbqD8fM0RFEH/JWDWZmnjRajgdQBVQFoFCDZ SvUkXJ4NKwkjSV9Bb0Krgp+veQKUYZ7lntwgcngL6qTTkXLwibU8vEbozbBXY52/riuV 
PnoECIGGYbOgiObAQM2yRoPABjNpLdQp7Lzd9gTUSn4rwNVU/rBaT50+cWlXMDje3Q3r 3/yg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1781371325; x=1781976125; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-gg:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=ZE6jPRdfaoewtNIfPrpNjI8xcezU+gskgS6mTgukQkI=; b=gsYV12vgio/7J2J+IwqL0fzKMCUo6aNX1fxoWVFfkn0zfUDKrmtKqpXdakaOB5UH/y NnSn9JtVfGYf2G0U65ATY4OS+ZhbdMxhBjS62k2pLS8vA+0rHUmflVQBygRNODrG1XyF Pi4DYe4lLTYkTLUoVThL5Y9oAi1IxtiB+65hAxXjGvnbaNsl2DCdtZBj7MFBgQIoV2ub bDd+at/ItZhcC1S3/rF5uWqcdj4tPUqqFFqGDyzIZFC3NhyrQqv0ztJM6X8NOPoFTI1g eGGnl9PMBU7aVGxuuMdikcfqLbMuKo4l5sxV+zLjussv6+XXzYBLlhErWWTXwijnZsrM 1lUQ== X-Forwarded-Encrypted: i=1; AFNElJ9qtJIpPh8p9K1D9sfOrKGULzQnH2lPfNfr5sUNooN9DVmj3cDG2YazaNC1debS81xpyragmpf1jhB/A2w=@vger.kernel.org X-Gm-Message-State: AOJu0YxRukataMGqtprpAmWXOTjpHDuVjfA1tEOedEYFzqYkXT1gbmLd 8UgeYShEmnB0DC6XWPeHSOq1dA6IH2C6lOXb7JHe/hvlc+jkkUj99aZlhtNwOS+9TKs9gyCzvgF KCkX3+QBLaxGNyZFkXC51nGQtQkoUhyNret66BpgKiOq8imI61mrk6Seszjwbggz9YuU= X-Gm-Gg: Acq92OGeohGpWFyajmVuG9h06lxOi3oKgC7tS9Xdbj1nGeqH2nrn6xeDs9DvbyNSNIe Fwjut9wgSaOoRw7PC6zb/j7rdR6JwpiNPUEKGL/yMvmJUAmj+N8aAmJnSezjLLNhZxNuEjzRqdo hLNNIwmRd0l4gkGnXebIUQeqiH0Se+OmMlmI3KbB5Mrvt+n/ZozwSFEwwoUiB6+3YtoeJZyYlRZ c2meGJg0vsXkkwSc491E8JMOJ3BKmWrV5lIQi2wrjoBhNPznsUW5oWgyXF6/xzxoV0EwdHPWGzP 53rXy4838c2mhG7/wGUz2QxuySOzJf2PKqq6EnwICuPcPcC9JcMnaeeWI319NQkH5/FKv7or/SJ 3T6XRVvMy/OiYakkbWLGNM7L4uP4MAmJ6GTQPiVJXNLXztwy51TakIA== X-Received: by 2002:a05:6a00:2351:b0:842:5f67:eada with SMTP id d2e1a72fcca58-8434ce4070amr6578014b3a.5.1781371324374; Sat, 13 Jun 2026 10:22:04 -0700 (PDT) X-Received: by 2002:a05:6a00:2351:b0:842:5f67:eada with SMTP id d2e1a72fcca58-8434ce4070amr6577978b3a.5.1781371323837; Sat, 13 Jun 2026 10:22:03 -0700 (PDT) Received: from hu-pranarya-hyd.qualcomm.com ([202.46.22.19]) by smtp.gmail.com with ESMTPSA id 
d2e1a72fcca58-8434accbec5sm5390913b3a.16.2026.06.13.10.21.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 13 Jun 2026 10:22:03 -0700 (PDT) From: Pranjal Arya Date: Sat, 13 Jun 2026 22:49:54 +0530 Subject: [PATCH RFC 12/12] mm/vmalloc: harden bump-allocator alloc/free against UBSAN array bounds Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260613-vmalloc_maple-v1-12-0aa740bb944b@oss.qualcomm.com> References: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com> In-Reply-To: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com> To: Andrew Morton , Uladzislau Rezki , "Liam R. Howlett" , Alice Ryhl , Andrew Ballance Cc: linux-arm-msm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, maple-tree@lists.infradead.org, Lorenzo Stoakes , Pranjal Shrivastava , Will Deacon , Suzuki K Poulose , Neil Armstrong , Mostafa Saleh , Balbir Singh , Suren Baghdasaryan , Marco Elver , Dmitry Vyukov , Alexander Potapenko , Shuah Khan , Dev Jain , Brendan Jackman , Puranjay Mohan , Santosh Shukla , Wyes Karny , Pranjal Arya , Sudeep Holla X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1781371215; l=7915; i=pranjal.arya@oss.qualcomm.com; s=20260516; h=from:subject:message-id; bh=poCfMca4I5bU16k0x2EJY1kUzaPSBtmGjAWZRE/6A1o=; b=kGgYknKWsSNBxkdk0W8RRR1QefOuH5dooHWWewlUdfNTWdwNuW3qNCnaMf0NIthki3GTPNlfk WKtc4hRw6hxDBELESN9aTttJLWCPG90L6t21yIt7WTILVL//TKeMdM4 X-Developer-Key: i=pranjal.arya@oss.qualcomm.com; a=ed25519; pk=ymtcTlccEIDsi3ErhpjIoZZHKdPBYWGWW0Lchs5MsbE= X-Proofpoint-Spam-Info: AW1haW4tMjYwNjEzMDE4MCBTYWx0ZWRfX7SDpCJ7/NK/f zy0C8Dk8hga9RKVjovf4O9KjsRtB4qa7mKFQgxkEzzuH2Cozk8cCLHIlps4MtR+Xfmv2RBMtLb9 UDBTIrW7FR4fUTE0sHkRuIuh/Mg7G40= X-Authority-Analysis: v=2.4 cv=Oop/DS/t c=1 sm=1 tr=0 ts=6a2d91bd cx=c_pps a=mDZGXZTwRPZaeRUbqKGCBw==:117 
a=fChuTYTh2wq5r3m49p7fHw==:17 a=IkcTkHD0fZMA:10 a=FelO9ux0wxsA:10 a=s4-Qcg_JpJYA:10 a=VkNPw1HP01LnGYTKEx00:22 a=u7WPNUs3qKkmUXheDGA7:22 a=gowsoOTTUOVcmtlkKump:22 a=EUspDBNiAAAA:8 a=a2jIHL7vm5qZbxVYpdMA:9 a=QEXdDO2ut3YA:10 a=zc0IvFSfCIW2DFIPzwfm:22 X-Proofpoint-GUID: 4KZw-ePEUv_8GP4kgImLtqTFOH6DB_ZF X-Proofpoint-ORIG-GUID: 4KZw-ePEUv_8GP4kgImLtqTFOH6DB_ZF X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNjEzMDE4MCBTYWx0ZWRfX2Pbd3KW2CHx+ HSgXEW6bj4/pnIKMT3PswP1KwpnA75fuZwx84GmP/IYd36vqL8gj4qMO7hrtEXsZDafWsAXcaBe 7FF6pn8Jo0561+CZMy2DAzuF07FvU9BGBs1QvabAs834Hhiwci9/uAh25axQZpoVlLsBwa4Izj5 vNOEjVs8ClVkgzbO5hZMOsLCEzEyeBUbHZngzAUvCboorPgArfZDdSS/7rZyjyChXbtrHRfKvso QmWpLTNRifi7cSla/ZZ+OjxbT1mJDO/RGLPBKnu34Bb3eZgxD8yMSj9x1M4jZgGxiIahMZFfjeS K2tZwpZsJed9fg/XF6yhhmOC5Q/zNRrTbwGM6jgd5PRVaRKr6pgDF/TW0gfsELqHM9zX5bzBNP2 9omsuCVcSE31Mvjg2q6neO+nT8iL0prPtq5vUx9yhY0YkRCNk0vb47h4V5JoYeiMOmi7ad4nLdj IQGjlclNMSGtsGmiyGQ== X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.125,FMLib:17.12.100.49 definitions=2026-06-13_03,2026-06-12_03,2025-10-01_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 phishscore=0 impostorscore=0 lowpriorityscore=0 clxscore=1015 bulkscore=0 spamscore=0 malwarescore=0 suspectscore=0 adultscore=0 classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0 reason=mlx scancount=1 engine=8.22.0-2606040000 definitions=main-2606130180 Real-hardware testing on a Snapdragon X1E80100 exposed a panic during boot-time module loading via finit_module -> kernel_read_file -> vmalloc: Internal error: UBSAN: array index out of bounds Call trace: vmap_bump_alloc alloc_vmap_area __vmalloc_node_range_noprof vmalloc_noprof kernel_read_file __arm64_sys_finit_module UBSAN's array-bounds sanitiser triggers on the indexed write loop: for (i =3D 0; i < n_pages; i++) chunk->page_va[idx + i] =3D va; Harden the bump path: - Centralise the eligibility predicate in vmap_bump_eligible() and 
add it to alloc_vmap_area() so vmap_bump_refill() is only called for requests the bump path can actually serve. Add PAGE_ALIGNED(size) and align > 0 to the predicate (defensive; alloc_vmap_area's callers always satisfy these but the explicit check is cheap and prevents the trap path from being entered with bad inputs). - In vmap_bump_alloc(), use check_add_overflow() for the new bump pointer, validate aligned >=3D chunk->base (defensive against metadata corruption), and bounds-check idx and idx + n_pages against VMAP_BUMP_CHUNK_PAGES before touching page_va[]. Replace the indexed page_va[] store loop with a pointer walk: slot =3D &chunk->page_va[idx]; for (i =3D n_pages; i > 0; i--) *slot++ =3D va; The pointer-increment form is not subject to the array-bounds sanitiser instrumentation that fires on chunk->page_va[idx + i]. - In vmap_bump_unlink(), validate n_pages > 0 and n_pages <=3D VMAP_BUMP_CHUNK_PAGES - idx before the memset, so a corrupted va->va_end cannot drive a write past the end of page_va[]. - Track the chunk's owner CPU at refill time and compare against per_cpu(vmap_bump_cur, owner_cpu) on unlink. The previous this_cpu_read(vmap_bump_cur) compared the chunk against the *current* CPU's chunk, which is wrong when free runs on a CPU other than the chunk owner: it could either retire a chunk that is still the owner's current, or skip retirement on a chunk that has already been replaced. No semantic change to the bump-path policy or to the addresses returned. Builds clean on x86_64 and arm64 (full bzImage / Image). 
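The order of checks listed above can be sketched in userspace C. This is an illustration under stated assumptions, not the patched kernel function: the bump_* names and the 8-page chunk are hypothetical, a manual ULONG_MAX comparison stands in for the kernel's check_add_overflow(), and there is no preemption control or allocation of the area struct. It shows the eligibility test, the overflow-checked new bump pointer, the index bounds check, and the pointer-walk store that runs only after all of those pass.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins; the chunk is shrunk to 8 pages. */
#define BUMP_PAGE_SHIFT  12
#define BUMP_PAGE_SIZE   (1UL << BUMP_PAGE_SHIFT)
#define BUMP_CHUNK_PAGES 8UL

struct bump_chunk {
	unsigned long base;
	unsigned long limit;
	unsigned long bump;
	void *page_owner[BUMP_CHUNK_PAGES];
};

/*
 * Reserve @size bytes at @align from @b and record @owner in every
 * covered page slot. Returns 0 and stores the start address in @out,
 * or -1 without modifying @b when the request cannot be served.
 */
int bump_reserve(struct bump_chunk *b, unsigned long size,
		 unsigned long align, void *owner, unsigned long *out)
{
	unsigned long aligned, new_bump, idx, n_pages, i;
	void **slot;

	assert(b && out);

	/* Eligibility: non-zero page-aligned size, power-of-two align. */
	if (size == 0 || (size & (BUMP_PAGE_SIZE - 1)) ||
	    align == 0 || (align & (align - 1)))
		return -1;

	/* Round the bump pointer up without wrapping. */
	if (b->bump > ULONG_MAX - (align - 1))
		return -1;
	aligned = (b->bump + align - 1) & ~(align - 1);

	/* Overflow-checked end address, then the chunk limit. */
	if (aligned < b->base || size > ULONG_MAX - aligned)
		return -1;
	new_bump = aligned + size;
	if (new_bump > b->limit)
		return -1;

	/* Bounds-check the slot range before touching the array. */
	idx = (aligned - b->base) >> BUMP_PAGE_SHIFT;
	n_pages = size >> BUMP_PAGE_SHIFT;
	if (idx >= BUMP_CHUNK_PAGES || n_pages > BUMP_CHUNK_PAGES - idx)
		return -1;

	/* Commit only after every check has passed. */
	b->bump = new_bump;
	slot = &b->page_owner[idx];
	for (i = n_pages; i > 0; i--)
		*slot++ = owner;

	*out = aligned;
	return 0;
}
```

Committing the bump pointer last means a rejected request leaves the chunk unchanged, so the caller can fall back to the regular path without any cleanup.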
Signed-off-by: Pranjal Arya
---
 mm/vmalloc.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 49 insertions(+), 13 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6991054e1cba..03f10b6b815c 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2508,6 +2508,7 @@ struct vmap_bump_chunk {
 	unsigned long limit;
 	unsigned long bump;
 	atomic_t alloced; /* # outstanding pages */
+	int owner_cpu;
 	struct list_head link; /* on vmap_bump_chunks */
 	struct rcu_head rcu; /* deferred free */
 	struct vmap_area *page_va[VMAP_BUMP_CHUNK_PAGES];
@@ -2517,6 +2518,16 @@ static DEFINE_PER_CPU(struct vmap_bump_chunk *, vmap_bump_cur);
 static LIST_HEAD(vmap_bump_chunks);
 static DEFINE_SPINLOCK(vmap_bump_chunks_lock);
 
+static __always_inline bool
+vmap_bump_eligible(unsigned long size, unsigned long align,
+		   unsigned long vstart, unsigned long vend)
+{
+	return vstart == VMALLOC_START && vend == VMALLOC_END &&
+	       size > 0 && PAGE_ALIGNED(size) &&
+	       size <= VMAP_BUMP_CHUNK_SIZE / 2 &&
+	       align > 0 && align <= VMAP_BUMP_CHUNK_SIZE / 2;
+}
+
 /*
  * Coarse [lo, hi) bounds covering every active vmap_bump_chunk's
  * range. vmap_chunk_lookup() rejects out-of-range addresses (e.g.
@@ -2582,11 +2593,10 @@ vmap_bump_alloc(unsigned long size, unsigned long align,
 {
 	struct vmap_bump_chunk *chunk;
 	struct vmap_area *va;
-	unsigned long aligned, idx, n_pages, i;
+	struct vmap_area **slot;
+	unsigned long aligned, new_bump, idx, n_pages, i;
 
-	if (vstart != VMALLOC_START || vend != VMALLOC_END ||
-	    size == 0 || size > VMAP_BUMP_CHUNK_SIZE / 2 ||
-	    align > VMAP_BUMP_CHUNK_SIZE / 2)
+	if (!vmap_bump_eligible(size, align, vstart, vend))
 		return NULL;
 
 	va = kmem_cache_alloc_node(vmap_area_cachep, gfp_mask, node);
@@ -2607,22 +2617,34 @@ vmap_bump_alloc(unsigned long size, unsigned long align,
 		kmem_cache_free(vmap_area_cachep, va);
 		return NULL;
 	}
+
 	aligned = ALIGN(chunk->bump, align);
-	if (aligned + size > chunk->limit) {
+	if (aligned < chunk->base ||
+	    check_add_overflow(aligned, size, &new_bump) ||
+	    new_bump > chunk->limit) {
 		preempt_enable();
 		kmem_cache_free(vmap_area_cachep, va);
 		return NULL;
 	}
-	chunk->bump = aligned + size;
+
 	idx = vmap_chunk_page_idx(chunk, aligned);
 	n_pages = size >> PAGE_SHIFT;
-	for (i = 0; i < n_pages; i++)
-		chunk->page_va[idx + i] = va;
+	if (unlikely(idx >= VMAP_BUMP_CHUNK_PAGES ||
+		     n_pages > VMAP_BUMP_CHUNK_PAGES - idx)) {
+		preempt_enable();
+		kmem_cache_free(vmap_area_cachep, va);
+		return NULL;
+	}
+
+	chunk->bump = new_bump;
+	slot = &chunk->page_va[idx];
+	for (i = n_pages; i > 0; i--)
+		*slot++ = va;
 	atomic_add(n_pages, &chunk->alloced);
 	preempt_enable();
 
 	va->va_start = aligned;
-	va->va_end = aligned + size;
+	va->va_end = new_bump;
 	va->vm = NULL;
 	/*
 	 * Encode the destination vmap_node so the existing per-node pool
@@ -2651,6 +2673,7 @@ vmap_bump_refill(gfp_t gfp_mask)
 {
 	struct vmap_bump_chunk *new_chunk;
 	unsigned long base;
+	int cpu;
 
 	new_chunk = kvzalloc(sizeof(*new_chunk), gfp_mask);
 	if (!new_chunk)
@@ -2670,6 +2693,7 @@ vmap_bump_refill(gfp_t gfp_mask)
 	new_chunk->limit = base + VMAP_BUMP_CHUNK_SIZE;
 	new_chunk->bump = base;
 	atomic_set(&new_chunk->alloced, 0);
+	new_chunk->owner_cpu = -1;
 	INIT_LIST_HEAD(&new_chunk->link);
 
 	spin_lock(&vmap_bump_chunks_lock);
@@ -2681,6 +2705,8 @@ vmap_bump_refill(gfp_t gfp_mask)
 	spin_unlock(&vmap_bump_chunks_lock);
 
 	preempt_disable();
+	cpu = smp_processor_id();
+	new_chunk->owner_cpu = cpu;
 	this_cpu_write(vmap_bump_cur, new_chunk);
 	preempt_enable();
 
@@ -2699,6 +2725,7 @@ static struct vmap_area *
 vmap_bump_unlink(unsigned long addr)
 {
 	struct vmap_bump_chunk *chunk;
+	struct vmap_bump_chunk *owner_cur;
 	struct vmap_area *va;
 	unsigned long idx, n_pages;
 
@@ -2715,6 +2742,8 @@ vmap_bump_unlink(unsigned long addr)
 		return NULL;
 
 	n_pages = (va->va_end - va->va_start) >> PAGE_SHIFT;
+	if (unlikely(!n_pages || n_pages > VMAP_BUMP_CHUNK_PAGES - idx))
+		return NULL;
 	memset(&chunk->page_va[idx], 0, n_pages * sizeof(va));
 
 	/*
@@ -2725,8 +2754,12 @@ vmap_bump_unlink(unsigned long addr)
 	 * TLB entries until the next lazy-purge flush, so reusing them
 	 * before the flush is unsafe. Forward-only bump avoids that.
 	 */
+	if (unlikely(chunk->owner_cpu < 0 || chunk->owner_cpu >= nr_cpu_ids))
+		return va;
+
+	owner_cur = READ_ONCE(per_cpu(vmap_bump_cur, chunk->owner_cpu));
 	if (atomic_sub_return(n_pages, &chunk->alloced) == 0 &&
-	    chunk != this_cpu_read(vmap_bump_cur)) {
+	    chunk != owner_cur) {
 		spin_lock(&vmap_bump_chunks_lock);
 		list_del_rcu(&chunk->link);
 		spin_unlock(&vmap_bump_chunks_lock);
@@ -2781,11 +2814,14 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	 * find_unlink_vmap_area() consult vmap_chunk_lookup() before
 	 * falling back to busy.mt.
 	 */
-	va = vmap_bump_alloc(size, align, vstart, vend, gfp_mask, node,
-			     va_flags);
-	if (!va && vmap_bump_refill(gfp_mask) == 0)
+	va = NULL;
+	if (vmap_bump_eligible(size, align, vstart, vend)) {
 		va = vmap_bump_alloc(size, align, vstart, vend, gfp_mask, node,
 				     va_flags);
+		if (!va && vmap_bump_refill(gfp_mask) == 0)
+			va = vmap_bump_alloc(size, align, vstart, vend, gfp_mask,
+					     node, va_flags);
+	}
 	if (va) {
 		if (vm) {
 			vm->addr = (void *)va->va_start;
-- 
2.34.1
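
The owner-CPU comparison in vmap_bump_unlink() can be illustrated with a small single-threaded model. It is a sketch under stated assumptions rather than kernel code: the per-CPU vmap_bump_cur pointers are modelled as a plain array, the atomic counter as a plain `long`, and `NR_CPUS`, `install()`, `should_retire_this_cpu()` and `should_retire_owner()` are hypothetical names. As in the hunk above, an out-of-range owner returns before the counter is decremented.

```c
#include <assert.h>   /* only needed by callers that assert on results */
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4     /* hypothetical CPU count for the model */

/* Hypothetical, simplified stand-in for struct vmap_bump_chunk. */
struct chunk {
	int owner_cpu;      /* CPU whose "current" slot holds this chunk */
	long outstanding;   /* pages still allocated from this chunk */
};

/* Models the per-CPU vmap_bump_cur pointers as a plain array. */
static struct chunk *cur_chunk[NR_CPUS];

/* Refill: record the owner and make the chunk that CPU's current one. */
static void install(struct chunk *c, int cpu)
{
	c->owner_cpu = cpu;
	cur_chunk[cpu] = c;
}

/* Old rule: compare against the current chunk of the CPU doing the free. */
static bool should_retire_this_cpu(struct chunk *c, long n_pages,
				   int freeing_cpu)
{
	c->outstanding -= n_pages;
	return c->outstanding == 0 && c != cur_chunk[freeing_cpu];
}

/* New rule: compare against the current chunk of the owning CPU. */
static bool should_retire_owner(struct chunk *c, long n_pages)
{
	if (c->owner_cpu < 0 || c->owner_cpu >= NR_CPUS)
		return false;   /* owner unknown: never retire */

	c->outstanding -= n_pages;
	return c->outstanding == 0 && c != cur_chunk[c->owner_cpu];
}
```

With a chunk installed on CPU 0 and freed from CPU 1, the old rule reports the chunk as retirable even though CPU 0 still bumps from it, while the new rule only retires it once CPU 0 has moved on to a replacement chunk.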