From: "Uladzislau Rezki (Sony)"
To: linux-mm@kvack.org, Andrew Morton
Cc: LKML, Baoquan He, Lorenzo Stoakes, Christoph Hellwig, Matthew Wilcox,
    "Liam R. Howlett", Dave Chinner, "Paul E. McKenney", Joel Fernandes,
    Uladzislau Rezki, Oleksiy Avramchenko, Christoph Hellwig
Subject: [PATCH v2 1/9] mm: vmalloc: Add va_alloc() helper
Date: Tue, 29 Aug 2023 10:11:34 +0200
Message-Id: <20230829081142.3619-2-urezki@gmail.com>
In-Reply-To: <20230829081142.3619-1-urezki@gmail.com>
References: <20230829081142.3619-1-urezki@gmail.com>

Currently the __alloc_vmap_area() function contains open-coded logic
that finds and adjusts a VA based on the allocation request. Introduce
a va_alloc() helper that only adjusts a found VA. It will be used in at
least two more places later in this series.
There is no functional change as a result of this patch.

Reviewed-by: Christoph Hellwig
Reviewed-by: Lorenzo Stoakes
Signed-off-by: Uladzislau Rezki (Sony)
Reviewed-by: Baoquan He
---
 mm/vmalloc.c | 41 ++++++++++++++++++++++++++++-------------
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 93cf99aba335..00afc1ee4756 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1481,6 +1481,32 @@ adjust_va_to_fit_type(struct rb_root *root, struct list_head *head,
 	return 0;
 }
 
+static unsigned long
+va_alloc(struct vmap_area *va,
+		struct rb_root *root, struct list_head *head,
+		unsigned long size, unsigned long align,
+		unsigned long vstart, unsigned long vend)
+{
+	unsigned long nva_start_addr;
+	int ret;
+
+	if (va->va_start > vstart)
+		nva_start_addr = ALIGN(va->va_start, align);
+	else
+		nva_start_addr = ALIGN(vstart, align);
+
+	/* Check the "vend" restriction. */
+	if (nva_start_addr + size > vend)
+		return vend;
+
+	/* Update the free vmap_area. */
+	ret = adjust_va_to_fit_type(root, head, va, nva_start_addr, size);
+	if (WARN_ON_ONCE(ret))
+		return vend;
+
+	return nva_start_addr;
+}
+
 /*
  * Returns a start address of the newly allocated area, if success.
  * Otherwise a vend is returned that indicates failure.
@@ -1493,7 +1519,6 @@ __alloc_vmap_area(struct rb_root *root, struct list_head *head,
 	bool adjust_search_size = true;
 	unsigned long nva_start_addr;
 	struct vmap_area *va;
-	int ret;
 
 	/*
 	 * Do not adjust when:
@@ -1511,18 +1536,8 @@ __alloc_vmap_area(struct rb_root *root, struct list_head *head,
 	if (unlikely(!va))
 		return vend;
 
-	if (va->va_start > vstart)
-		nva_start_addr = ALIGN(va->va_start, align);
-	else
-		nva_start_addr = ALIGN(vstart, align);
-
-	/* Check the "vend" restriction. */
-	if (nva_start_addr + size > vend)
-		return vend;
-
-	/* Update the free vmap_area. */
-	ret = adjust_va_to_fit_type(root, head, va, nva_start_addr, size);
-	if (WARN_ON_ONCE(ret))
+	nva_start_addr = va_alloc(va, root, head, size, align, vstart, vend);
+	if (nva_start_addr == vend)
 		return vend;
 
 #if DEBUG_AUGMENT_LOWEST_MATCH_CHECK
-- 
2.30.2

From: "Uladzislau Rezki (Sony)"
To: linux-mm@kvack.org, Andrew Morton
Cc: LKML, Baoquan He, Lorenzo Stoakes, Christoph Hellwig, Matthew Wilcox,
    "Liam R. Howlett", Dave Chinner, "Paul E. McKenney", Joel Fernandes,
    Uladzislau Rezki, Oleksiy Avramchenko, Christoph Hellwig
Subject: [PATCH v2 2/9] mm: vmalloc: Rename adjust_va_to_fit_type() function
Date: Tue, 29 Aug 2023 10:11:35 +0200
Message-Id: <20230829081142.3619-3-urezki@gmail.com>
In-Reply-To: <20230829081142.3619-1-urezki@gmail.com>
References: <20230829081142.3619-1-urezki@gmail.com>

Rename the adjust_va_to_fit_type() function to va_clip(), which is
shorter and more expressive.

There is no functional change as a result of this patch.

Reviewed-by: Christoph Hellwig
Reviewed-by: Lorenzo Stoakes
Signed-off-by: Uladzislau Rezki (Sony)
Reviewed-by: Baoquan He
---
 mm/vmalloc.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 00afc1ee4756..09e315f8ea34 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1382,9 +1382,9 @@ classify_va_fit_type(struct vmap_area *va,
 }
 
 static __always_inline int
-adjust_va_to_fit_type(struct rb_root *root, struct list_head *head,
-		      struct vmap_area *va, unsigned long nva_start_addr,
-		      unsigned long size)
+va_clip(struct rb_root *root, struct list_head *head,
+		struct vmap_area *va, unsigned long nva_start_addr,
+		unsigned long size)
 {
 	struct vmap_area *lva = NULL;
 	enum fit_type type = classify_va_fit_type(va, nva_start_addr, size);
@@ -1500,7 +1500,7 @@ va_alloc(struct vmap_area *va,
 		return vend;
 
 	/* Update the free vmap_area. */
-	ret = adjust_va_to_fit_type(root, head, va, nva_start_addr, size);
+	ret = va_clip(root, head, va, nva_start_addr, size);
 	if (WARN_ON_ONCE(ret))
 		return vend;
 
@@ -4151,9 +4151,8 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
 			/* It is a BUG(), but trigger recovery instead. */
 			goto recovery;
 
-		ret = adjust_va_to_fit_type(&free_vmap_area_root,
-					    &free_vmap_area_list,
-					    va, start, size);
+		ret = va_clip(&free_vmap_area_root,
+				&free_vmap_area_list, va, start, size);
 		if (WARN_ON_ONCE(unlikely(ret)))
 			/* It is a BUG(), but trigger recovery instead. */
 			goto recovery;
-- 
2.30.2

From: "Uladzislau Rezki (Sony)"
To: linux-mm@kvack.org, Andrew Morton
Cc: LKML, Baoquan He, Lorenzo Stoakes, Christoph Hellwig, Matthew Wilcox,
    "Liam R. Howlett", Dave Chinner, "Paul E. McKenney", Joel Fernandes,
    Uladzislau Rezki, Oleksiy Avramchenko, Christoph Hellwig
Subject: [PATCH v2 3/9] mm: vmalloc: Move vmap_init_free_space() down in vmalloc.c
Date: Tue, 29 Aug 2023 10:11:36 +0200
Message-Id: <20230829081142.3619-4-urezki@gmail.com>
In-Reply-To: <20230829081142.3619-1-urezki@gmail.com>
References: <20230829081142.3619-1-urezki@gmail.com>

vmap_init_free_space() sets up the free vmap space and is considered
part of the initialization phase. Since the main entry point,
vmalloc_init(), has been moved down in vmalloc.c, it makes sense to
follow that pattern.

There is no functional change as a result of this patch.

Reviewed-by: Christoph Hellwig
Reviewed-by: Lorenzo Stoakes
Signed-off-by: Uladzislau Rezki (Sony)
Reviewed-by: Baoquan He
---
 mm/vmalloc.c | 82 ++++++++++++++++++++++++++--------------------------
 1 file changed, 41 insertions(+), 41 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 09e315f8ea34..b7deacca1483 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2512,47 +2512,6 @@ void __init vm_area_register_early(struct vm_struct *vm, size_t align)
 	kasan_populate_early_vm_area_shadow(vm->addr, vm->size);
 }
 
-static void vmap_init_free_space(void)
-{
-	unsigned long vmap_start = 1;
-	const unsigned long vmap_end = ULONG_MAX;
-	struct vmap_area *busy, *free;
-
-	/*
-	 *     B     F     B     B     B     F
-	 * -|-----|.....|-----|-----|-----|.....|-
-	 *  |           The KVA space           |
-	 *  |<--------------------------------->|
-	 */
-	list_for_each_entry(busy, &vmap_area_list, list) {
-		if (busy->va_start - vmap_start > 0) {
-			free = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
-			if (!WARN_ON_ONCE(!free)) {
-				free->va_start = vmap_start;
-				free->va_end = busy->va_start;
-
-				insert_vmap_area_augment(free, NULL,
-					&free_vmap_area_root,
-					&free_vmap_area_list);
-			}
-		}
-
-		vmap_start = busy->va_end;
-	}
-
-	if (vmap_end - vmap_start > 0) {
-		free = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
-		if (!WARN_ON_ONCE(!free)) {
-			free->va_start = vmap_start;
-			free->va_end = vmap_end;
-
-			insert_vmap_area_augment(free, NULL,
-				&free_vmap_area_root,
-				&free_vmap_area_list);
-		}
-	}
-}
-
 static inline void setup_vmalloc_vm_locked(struct vm_struct *vm,
 	struct vmap_area *va, unsigned long flags, const void *caller)
 {
@@ -4443,6 +4402,47 @@ module_init(proc_vmalloc_init);
 
 #endif
 
+static void vmap_init_free_space(void)
+{
+	unsigned long vmap_start = 1;
+	const unsigned long vmap_end = ULONG_MAX;
+	struct vmap_area *busy, *free;
+
+	/*
+	 *     B     F     B     B     B     F
+	 * -|-----|.....|-----|-----|-----|.....|-
+	 *  |           The KVA space           |
+	 *  |<--------------------------------->|
+	 */
+	list_for_each_entry(busy, &vmap_area_list, list) {
+		if (busy->va_start - vmap_start > 0) {
+			free = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
+			if (!WARN_ON_ONCE(!free)) {
+				free->va_start = vmap_start;
+				free->va_end = busy->va_start;
+
+				insert_vmap_area_augment(free, NULL,
+					&free_vmap_area_root,
+					&free_vmap_area_list);
+			}
+		}
+
+		vmap_start = busy->va_end;
+	}
+
+	if (vmap_end - vmap_start > 0) {
+		free = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
+		if (!WARN_ON_ONCE(!free)) {
+			free->va_start = vmap_start;
+			free->va_end = vmap_end;
+
+			insert_vmap_area_augment(free, NULL,
+				&free_vmap_area_root,
+				&free_vmap_area_list);
+		}
+	}
+}
+
 void __init vmalloc_init(void)
 {
 	struct vmap_area *va;
-- 
2.30.2
S234095AbjH2IMM (ORCPT ); Tue, 29 Aug 2023 04:12:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49308 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234043AbjH2ILw (ORCPT ); Tue, 29 Aug 2023 04:11:52 -0400 Received: from mail-lf1-x132.google.com (mail-lf1-x132.google.com [IPv6:2a00:1450:4864:20::132]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 27903BF for ; Tue, 29 Aug 2023 01:11:49 -0700 (PDT) Received: by mail-lf1-x132.google.com with SMTP id 2adb3069b0e04-4fe27849e6aso6374413e87.1 for ; Tue, 29 Aug 2023 01:11:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1693296707; x=1693901507; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=7IWUAKCJIzOp/gxV9EkyFZD0b+FcbujfqgJIfgCDsRU=; b=drYl4gUd+Jtb9pJbnkmWDw0WW9znwqfm/dN9edYWOz5h87Zk4m0CZviGo0tqS209ji 7VllomAwTwIXfIpMe+6KrvXr6F1IRj+WCNZvkLczNr61sBgvSNMuVN/kJ8mEWH2X6nX0 uuj1sm2KsNDMVcam+3JEmn7cWzUahEwLgLL2nwnedEfakWQpVGG+Q7S+18rT44j8Dzpq ATERh9jcoFaEnq1Ed/WsI1swE7aFwgUQoLhOvI01UFm7uQIkzqXeEF5RNOuqplOGcqhV OI6BaeR2dQa3xX9JA5JQKEioDYYkPMYwPRQU0RLQg5irN7C6Y2ey+azm0BbbyJmONyR1 ZjEw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693296707; x=1693901507; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=7IWUAKCJIzOp/gxV9EkyFZD0b+FcbujfqgJIfgCDsRU=; b=mGsxDBH1f6nQeR91qH0qPjS09cPEgq4DsW2uxmDxK2e2LUkDXGIG48Op3B+RQqO1zZ OAY3EeKcKuADsxXm+kJ3WyhPKvZN/eTxhVoxPDRhDBhVfyJks/Ute/EpfUayJC/Hl9v2 jKEIhS+y9ykhaGL966oxAPdTpq09TVbDTZxPdRN0jZYJOEfJATJf64Ym3jiuKPmS7YR+ 88fppQfbKiMVO4vMHyxSqhxKkNimhKBsNSEUX+DKdxuikRy2IQQVbQhvkXePfzfSDqy3 +OqcwGqgR1sRh8j4n+B9aBMfALYvq1BUlNt6f4hOy3cpBibAMjfs2GAtlv14BLbAI5f7 ux7w== X-Gm-Message-State: 
AOJu0YyyU/MdMzUDcMLJCiXVJ9qQdUkhiDLIQU+25syTivAnKC2Uu5nX Ufy8AgpBkRqZ0tgDeDYHFr8= X-Google-Smtp-Source: AGHT+IF4shimJXURLpT/BCINR6HV4dj6RneDkH6uQA661sVN+/c+lRcPx1Hj3E6H1bDmojzFC3+KaQ== X-Received: by 2002:a19:6451:0:b0:4fd:bc33:e508 with SMTP id b17-20020a196451000000b004fdbc33e508mr18034118lfj.49.1693296707014; Tue, 29 Aug 2023 01:11:47 -0700 (PDT) Received: from pc638.lan ([155.137.26.201]) by smtp.gmail.com with ESMTPSA id f25-20020a19ae19000000b004fbad341442sm1868026lfc.97.2023.08.29.01.11.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Aug 2023 01:11:46 -0700 (PDT) From: "Uladzislau Rezki (Sony)" To: linux-mm@kvack.org, Andrew Morton Cc: LKML , Baoquan He , Lorenzo Stoakes , Christoph Hellwig , Matthew Wilcox , "Liam R . Howlett" , Dave Chinner , "Paul E . McKenney" , Joel Fernandes , Uladzislau Rezki , Oleksiy Avramchenko Subject: [PATCH v2 4/9] mm: vmalloc: Remove global vmap_area_root rb-tree Date: Tue, 29 Aug 2023 10:11:37 +0200 Message-Id: <20230829081142.3619-5-urezki@gmail.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20230829081142.3619-1-urezki@gmail.com> References: <20230829081142.3619-1-urezki@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Store allocated objects in a separate nodes. A va->va_start address is converted into a correct node where it should be placed and resided. An addr_to_node() function is used to do a proper address conversion to determine a node that contains a VA. Such approach balances VAs across nodes as a result an access becomes scalable. Number of nodes in a system depends on number of CPUs divided by two. The density factor in this case is 1/2. Please note: 1. As of now allocated VAs are bound to a node-0. It means the patch does not give any difference comparing with a current behavior; 2. 
The global vmap_area_lock, vmap_area_root are removed as there is no need in it anymore. The vmap_area_list is still kept and is _empty_. It is exported for a kexec only; 3. The vmallocinfo and vread() have to be reworked to be able to handle multiple nodes. Signed-off-by: Uladzislau Rezki (Sony) Reviewed-by: Baoquan He --- mm/vmalloc.c | 209 +++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 161 insertions(+), 48 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index b7deacca1483..ae0368c314ff 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -728,11 +728,9 @@ EXPORT_SYMBOL(vmalloc_to_pfn); #define DEBUG_AUGMENT_LOWEST_MATCH_CHECK 0 =20 =20 -static DEFINE_SPINLOCK(vmap_area_lock); static DEFINE_SPINLOCK(free_vmap_area_lock); /* Export for kexec only */ LIST_HEAD(vmap_area_list); -static struct rb_root vmap_area_root =3D RB_ROOT; static bool vmap_initialized __read_mostly; =20 static struct rb_root purge_vmap_area_root =3D RB_ROOT; @@ -772,6 +770,38 @@ static struct rb_root free_vmap_area_root =3D RB_ROOT; */ static DEFINE_PER_CPU(struct vmap_area *, ne_fit_preload_node); =20 +/* + * An effective vmap-node logic. Users make use of nodes instead + * of a global heap. It allows to balance an access and mitigate + * contention. + */ +struct rb_list { + struct rb_root root; + struct list_head head; + spinlock_t lock; +}; + +struct vmap_node { + /* Bookkeeping data of this node. 
*/ + struct rb_list busy; +}; + +static struct vmap_node *nodes, snode; +static __read_mostly unsigned int nr_nodes =3D 1; +static __read_mostly unsigned int node_size =3D 1; + +static inline unsigned int +addr_to_node_id(unsigned long addr) +{ + return (addr / node_size) % nr_nodes; +} + +static inline struct vmap_node * +addr_to_node(unsigned long addr) +{ + return &nodes[addr_to_node_id(addr)]; +} + static __always_inline unsigned long va_size(struct vmap_area *va) { @@ -803,10 +833,11 @@ unsigned long vmalloc_nr_pages(void) } =20 /* Look up the first VA which satisfies addr < va_end, NULL if none. */ -static struct vmap_area *find_vmap_area_exceed_addr(unsigned long addr) +static struct vmap_area * +find_vmap_area_exceed_addr(unsigned long addr, struct rb_root *root) { struct vmap_area *va =3D NULL; - struct rb_node *n =3D vmap_area_root.rb_node; + struct rb_node *n =3D root->rb_node; =20 addr =3D (unsigned long)kasan_reset_tag((void *)addr); =20 @@ -1552,12 +1583,14 @@ __alloc_vmap_area(struct rb_root *root, struct list= _head *head, */ static void free_vmap_area(struct vmap_area *va) { + struct vmap_node *vn =3D addr_to_node(va->va_start); + /* * Remove from the busy tree/list. */ - spin_lock(&vmap_area_lock); - unlink_va(va, &vmap_area_root); - spin_unlock(&vmap_area_lock); + spin_lock(&vn->busy.lock); + unlink_va(va, &vn->busy.root); + spin_unlock(&vn->busy.lock); =20 /* * Insert/Merge it back to the free tree/list. 
@@ -1600,6 +1633,7 @@ static struct vmap_area *alloc_vmap_area(unsigned lon= g size, int node, gfp_t gfp_mask, unsigned long va_flags) { + struct vmap_node *vn; struct vmap_area *va; unsigned long freed; unsigned long addr; @@ -1645,9 +1679,11 @@ static struct vmap_area *alloc_vmap_area(unsigned lo= ng size, va->vm =3D NULL; va->flags =3D va_flags; =20 - spin_lock(&vmap_area_lock); - insert_vmap_area(va, &vmap_area_root, &vmap_area_list); - spin_unlock(&vmap_area_lock); + vn =3D addr_to_node(va->va_start); + + spin_lock(&vn->busy.lock); + insert_vmap_area(va, &vn->busy.root, &vn->busy.head); + spin_unlock(&vn->busy.lock); =20 BUG_ON(!IS_ALIGNED(va->va_start, align)); BUG_ON(va->va_start < vstart); @@ -1871,26 +1907,61 @@ static void free_unmap_vmap_area(struct vmap_area *= va) =20 struct vmap_area *find_vmap_area(unsigned long addr) { + struct vmap_node *vn; struct vmap_area *va; + int i, j; + + /* + * An addr_to_node_id(addr) converts an address to a node index + * where a VA is located. If VA spans several zones and passed + * addr is not the same as va->va_start, what is not common, we + * may need to scan an extra nodes. See an example: + * + * <--va--> + * -|-----|-----|-----|-----|- + * 1 2 0 1 + * + * VA resides in node 1 whereas it spans 1 and 2. If passed + * addr is within a second node we should do extra work. We + * should mention that it is rare and is a corner case from + * the other hand it has to be covered. 
+ */ + i =3D j =3D addr_to_node_id(addr); + do { + vn =3D &nodes[i]; =20 - spin_lock(&vmap_area_lock); - va =3D __find_vmap_area(addr, &vmap_area_root); - spin_unlock(&vmap_area_lock); + spin_lock(&vn->busy.lock); + va =3D __find_vmap_area(addr, &vn->busy.root); + spin_unlock(&vn->busy.lock); =20 - return va; + if (va) + return va; + } while ((i =3D (i + 1) % nr_nodes) !=3D j); + + return NULL; } =20 static struct vmap_area *find_unlink_vmap_area(unsigned long addr) { + struct vmap_node *vn; struct vmap_area *va; + int i, j; =20 - spin_lock(&vmap_area_lock); - va =3D __find_vmap_area(addr, &vmap_area_root); - if (va) - unlink_va(va, &vmap_area_root); - spin_unlock(&vmap_area_lock); + i =3D j =3D addr_to_node_id(addr); + do { + vn =3D &nodes[i]; =20 - return va; + spin_lock(&vn->busy.lock); + va =3D __find_vmap_area(addr, &vn->busy.root); + if (va) + unlink_va(va, &vn->busy.root); + spin_unlock(&vn->busy.lock); + + if (va) + return va; + } while ((i =3D (i + 1) % nr_nodes) !=3D j); + + return NULL; } =20 /*** Per cpu kva allocator ***/ @@ -2092,6 +2163,7 @@ static void *new_vmap_block(unsigned int order, gfp_t= gfp_mask) =20 static void free_vmap_block(struct vmap_block *vb) { + struct vmap_node *vn; struct vmap_block *tmp; struct xarray *xa; =20 @@ -2099,9 +2171,10 @@ static void free_vmap_block(struct vmap_block *vb) tmp =3D xa_erase(xa, addr_to_vb_idx(vb->va->va_start)); BUG_ON(tmp !=3D vb); =20 - spin_lock(&vmap_area_lock); - unlink_va(vb->va, &vmap_area_root); - spin_unlock(&vmap_area_lock); + vn =3D addr_to_node(vb->va->va_start); + spin_lock(&vn->busy.lock); + unlink_va(vb->va, &vn->busy.root); + spin_unlock(&vn->busy.lock); =20 free_vmap_area_noflush(vb->va); kfree_rcu(vb, rcu_head); @@ -2525,9 +2598,11 @@ static inline void setup_vmalloc_vm_locked(struct vm= _struct *vm, static void setup_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va, unsigned long flags, const void *caller) { - spin_lock(&vmap_area_lock); + struct vmap_node *vn =3D 
addr_to_node(va->va_start);
+
+	spin_lock(&vn->busy.lock);
 	setup_vmalloc_vm_locked(vm, va, flags, caller);
-	spin_unlock(&vmap_area_lock);
+	spin_unlock(&vn->busy.lock);
 }
 
 static void clear_vm_uninitialized_flag(struct vm_struct *vm)
@@ -3711,6 +3786,7 @@ static size_t vmap_ram_vread_iter(struct iov_iter *iter, const char *addr,
  */
 long vread_iter(struct iov_iter *iter, const char *addr, size_t count)
 {
+	struct vmap_node *vn;
 	struct vmap_area *va;
 	struct vm_struct *vm;
 	char *vaddr;
@@ -3724,8 +3800,11 @@ long vread_iter(struct iov_iter *iter, const char *addr, size_t count)
 
 	remains = count;
 
-	spin_lock(&vmap_area_lock);
-	va = find_vmap_area_exceed_addr((unsigned long)addr);
+	/* Hooked to node_0 so far. */
+	vn = addr_to_node(0);
+	spin_lock(&vn->busy.lock);
+
+	va = find_vmap_area_exceed_addr((unsigned long)addr, &vn->busy.root);
 	if (!va)
 		goto finished_zero;
 
@@ -3733,7 +3812,7 @@ long vread_iter(struct iov_iter *iter, const char *addr, size_t count)
 	if ((unsigned long)addr + remains <= va->va_start)
 		goto finished_zero;
 
-	list_for_each_entry_from(va, &vmap_area_list, list) {
+	list_for_each_entry_from(va, &vn->busy.head, list) {
 		size_t copied;
 
 		if (remains == 0)
@@ -3792,12 +3871,12 @@ long vread_iter(struct iov_iter *iter, const char *addr, size_t count)
 	}
 
 finished_zero:
-	spin_unlock(&vmap_area_lock);
+	spin_unlock(&vn->busy.lock);
 	/* zero-fill memory holes */
 	return count - remains + zero_iter(iter, remains);
 finished:
 	/* Nothing remains, or We couldn't copy/zero everything.
 	 */
-	spin_unlock(&vmap_area_lock);
+	spin_unlock(&vn->busy.lock);
 
 	return count - remains;
 }
@@ -4131,14 +4210,15 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
 	}
 
 	/* insert all vm's */
-	spin_lock(&vmap_area_lock);
 	for (area = 0; area < nr_vms; area++) {
-		insert_vmap_area(vas[area], &vmap_area_root, &vmap_area_list);
+		struct vmap_node *vn = addr_to_node(vas[area]->va_start);
 
+		spin_lock(&vn->busy.lock);
+		insert_vmap_area(vas[area], &vn->busy.root, &vn->busy.head);
 		setup_vmalloc_vm_locked(vms[area], vas[area], VM_ALLOC,
 				 pcpu_get_vm_areas);
+		spin_unlock(&vn->busy.lock);
 	}
-	spin_unlock(&vmap_area_lock);
 
 	/*
 	 * Mark allocated areas as accessible. Do it now as a best-effort
@@ -4261,25 +4341,26 @@ bool vmalloc_dump_obj(void *object)
 
 #ifdef CONFIG_PROC_FS
 static void *s_start(struct seq_file *m, loff_t *pos)
-	__acquires(&vmap_purge_lock)
-	__acquires(&vmap_area_lock)
 {
+	struct vmap_node *vn = addr_to_node(0);
+
 	mutex_lock(&vmap_purge_lock);
-	spin_lock(&vmap_area_lock);
+	spin_lock(&vn->busy.lock);
 
-	return seq_list_start(&vmap_area_list, *pos);
+	return seq_list_start(&vn->busy.head, *pos);
 }
 
 static void *s_next(struct seq_file *m, void *p, loff_t *pos)
 {
-	return seq_list_next(p, &vmap_area_list, pos);
+	struct vmap_node *vn = addr_to_node(0);
+	return seq_list_next(p, &vn->busy.head, pos);
 }
 
 static void s_stop(struct seq_file *m, void *p)
-	__releases(&vmap_area_lock)
-	__releases(&vmap_purge_lock)
 {
-	spin_unlock(&vmap_area_lock);
+	struct vmap_node *vn = addr_to_node(0);
+
+	spin_unlock(&vn->busy.lock);
 	mutex_unlock(&vmap_purge_lock);
 }
 
@@ -4322,9 +4403,11 @@ static void show_purge_info(struct seq_file *m)
 
 static int s_show(struct seq_file *m, void *p)
 {
+	struct vmap_node *vn;
 	struct vmap_area *va;
 	struct vm_struct *v;
 
+	vn = addr_to_node(0);
 	va = list_entry(p, struct vmap_area, list);
 
 	if (!va->vm) {
@@ -4375,7 +4458,7 @@ static int s_show(struct seq_file *m, void *p)
 	 * As a final step, dump "unpurged" areas.
 	 */
final:
-	if (list_is_last(&va->list, &vmap_area_list))
+	if (list_is_last(&va->list, &vn->busy.head))
 		show_purge_info(m);
 
 	return 0;
@@ -4406,7 +4489,8 @@ static void vmap_init_free_space(void)
 {
 	unsigned long vmap_start = 1;
 	const unsigned long vmap_end = ULONG_MAX;
-	struct vmap_area *busy, *free;
+	struct vmap_area *free;
+	struct vm_struct *busy;
 
 	/*
 	 *     B     F     B     B     B     F
 	 * -|-----|.....|-----|-----|-----|.....|-
 	 *  |           The KVA space           |
 	 *  |<--------------------------------->|
 	 */
-	list_for_each_entry(busy, &vmap_area_list, list) {
-		if (busy->va_start - vmap_start > 0) {
+	for (busy = vmlist; busy; busy = busy->next) {
+		if (busy->addr - vmap_start > 0) {
 			free = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
 			if (!WARN_ON_ONCE(!free)) {
 				free->va_start = vmap_start;
-				free->va_end = busy->va_start;
+				free->va_end = (unsigned long) busy->addr;
 
 				insert_vmap_area_augment(free, NULL,
 						&free_vmap_area_root,
@@ -4427,7 +4511,7 @@ static void vmap_init_free_space(void)
 			}
 		}
 
-		vmap_start = busy->va_end;
+		vmap_start = (unsigned long) busy->addr + busy->size;
 	}
 
 	if (vmap_end - vmap_start > 0) {
@@ -4443,9 +4527,31 @@ static void vmap_init_free_space(void)
 	}
 }
 
+static void vmap_init_nodes(void)
+{
+	struct vmap_node *vn;
+	int i;
+
+	nodes = &snode;
+
+	if (nr_nodes > 1) {
+		vn = kmalloc_array(nr_nodes, sizeof(*vn), GFP_NOWAIT);
+		if (vn)
+			nodes = vn;
+	}
+
+	for (i = 0; i < nr_nodes; i++) {
+		vn = &nodes[i];
+		vn->busy.root = RB_ROOT;
+		INIT_LIST_HEAD(&vn->busy.head);
+		spin_lock_init(&vn->busy.lock);
+	}
+}
+
 void __init vmalloc_init(void)
 {
 	struct vmap_area *va;
+	struct vmap_node *vn;
 	struct vm_struct *tmp;
 	int i;
 
@@ -4467,6 +4573,11 @@ void __init vmalloc_init(void)
 		xa_init(&vbq->vmap_blocks);
 	}
 
+	/*
+	 * Setup nodes before importing vmlist.
+	 */
+	vmap_init_nodes();
+
 	/* Import existing vmlist entries.
 */
 	for (tmp = vmlist; tmp; tmp = tmp->next) {
 		va = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
@@ -4476,7 +4587,9 @@ void __init vmalloc_init(void)
 		va->va_start = (unsigned long)tmp->addr;
 		va->va_end = va->va_start + tmp->size;
 		va->vm = tmp;
-		insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
+
+		vn = addr_to_node(va->va_start);
+		insert_vmap_area(va, &vn->busy.root, &vn->busy.head);
 	}
 
 	/*
-- 
2.30.2

From nobody Fri Dec 19 14:21:08 2025
From: "Uladzislau Rezki (Sony)"
To: linux-mm@kvack.org, Andrew Morton
Cc: LKML, Baoquan He, Lorenzo Stoakes, Christoph Hellwig, Matthew Wilcox,
 "Liam R. Howlett", Dave Chinner, "Paul E. McKenney", Joel Fernandes,
 Uladzislau Rezki, Oleksiy Avramchenko
Subject: [PATCH v2 5/9] mm: vmalloc: Remove global purge_vmap_area_root rb-tree
Date: Tue, 29 Aug 2023 10:11:38 +0200
Message-Id: <20230829081142.3619-6-urezki@gmail.com>
In-Reply-To: <20230829081142.3619-1-urezki@gmail.com>

Similar to busy VAs, a lazily-freed area is stored in the node it
belongs to. Such an approach does not require any global locking
primitive; instead, access becomes scalable, which mitigates
contention.

This patch removes the global purge lock, the global purge tree and
the global purge list.

Signed-off-by: Uladzislau Rezki (Sony)
Reviewed-by: Baoquan He
---
 mm/vmalloc.c | 135 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 82 insertions(+), 53 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ae0368c314ff..5a8a9c1370b6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -733,10 +733,6 @@ static DEFINE_SPINLOCK(free_vmap_area_lock);
 LIST_HEAD(vmap_area_list);
 static bool vmap_initialized __read_mostly;
 
-static struct rb_root purge_vmap_area_root = RB_ROOT;
-static LIST_HEAD(purge_vmap_area_list);
-static DEFINE_SPINLOCK(purge_vmap_area_lock);
-
 /*
  * This kmem_cache is used for vmap_area objects. Instead of
  * allocating from slab we reuse an object from this cache to
@@ -784,6 +780,12 @@ struct rb_list {
 struct vmap_node {
 	/* Bookkeeping data of this node. */
 	struct rb_list busy;
+	struct rb_list lazy;
+
+	/*
+	 * Ready-to-free areas.
+	 */
+	struct list_head purge_list;
 };
 
 static struct vmap_node *nodes, snode;
@@ -1768,40 +1770,22 @@ static DEFINE_MUTEX(vmap_purge_lock);
 
 /* for per-CPU blocks */
 static void purge_fragmented_blocks_allcpus(void);
+static cpumask_t purge_nodes;
 
 /*
  * Purges all lazily-freed vmap areas.
  */
-static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
+static unsigned long
+purge_vmap_node(struct vmap_node *vn)
 {
-	unsigned long resched_threshold;
-	unsigned int num_purged_areas = 0;
-	struct list_head local_purge_list;
+	unsigned long num_purged_areas = 0;
 	struct vmap_area *va, *n_va;
 
-	lockdep_assert_held(&vmap_purge_lock);
-
-	spin_lock(&purge_vmap_area_lock);
-	purge_vmap_area_root = RB_ROOT;
-	list_replace_init(&purge_vmap_area_list, &local_purge_list);
-	spin_unlock(&purge_vmap_area_lock);
-
-	if (unlikely(list_empty(&local_purge_list)))
-		goto out;
-
-	start = min(start,
-		list_first_entry(&local_purge_list,
-			struct vmap_area, list)->va_start);
-
-	end = max(end,
-		list_last_entry(&local_purge_list,
-			struct vmap_area, list)->va_end);
-
-	flush_tlb_kernel_range(start, end);
-	resched_threshold = lazy_max_pages() << 1;
+	if (list_empty(&vn->purge_list))
+		return 0;
 
 	spin_lock(&free_vmap_area_lock);
-	list_for_each_entry_safe(va, n_va, &local_purge_list, list) {
+	list_for_each_entry_safe(va, n_va, &vn->purge_list, list) {
 		unsigned long nr = (va->va_end - va->va_start) >> PAGE_SHIFT;
 		unsigned long orig_start = va->va_start;
 		unsigned long orig_end = va->va_end;
@@ -1823,13 +1807,55 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 
 		atomic_long_sub(nr, &vmap_lazy_nr);
 		num_purged_areas++;
-
-		if (atomic_long_read(&vmap_lazy_nr) < resched_threshold)
-			cond_resched_lock(&free_vmap_area_lock);
 	}
 	spin_unlock(&free_vmap_area_lock);
 
-out:
+	return num_purged_areas;
+}
+
+/*
+ * Purges all lazily-freed vmap areas.
+ */
+static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
+{
+	unsigned long num_purged_areas = 0;
+	struct vmap_node *vn;
+	int i;
+
+	lockdep_assert_held(&vmap_purge_lock);
+	purge_nodes = CPU_MASK_NONE;
+
+	for (i = 0; i < nr_nodes; i++) {
+		vn = &nodes[i];
+
+		INIT_LIST_HEAD(&vn->purge_list);
+
+		if (RB_EMPTY_ROOT(&vn->lazy.root))
+			continue;
+
+		spin_lock(&vn->lazy.lock);
+		WRITE_ONCE(vn->lazy.root.rb_node, NULL);
+		list_replace_init(&vn->lazy.head, &vn->purge_list);
+		spin_unlock(&vn->lazy.lock);
+
+		start = min(start, list_first_entry(&vn->purge_list,
+			struct vmap_area, list)->va_start);
+
+		end = max(end, list_last_entry(&vn->purge_list,
+			struct vmap_area, list)->va_end);
+
+		cpumask_set_cpu(i, &purge_nodes);
+	}
+
+	if (cpumask_weight(&purge_nodes) > 0) {
+		flush_tlb_kernel_range(start, end);
+
+		for_each_cpu(i, &purge_nodes) {
+			vn = &nodes[i];
+			num_purged_areas += purge_vmap_node(vn);
+		}
+	}
+
	trace_purge_vmap_area_lazy(start, end, num_purged_areas);
 	return num_purged_areas > 0;
 }
@@ -1848,16 +1874,9 @@ static void reclaim_and_purge_vmap_areas(void)
 
 static void drain_vmap_area_work(struct work_struct *work)
 {
-	unsigned long nr_lazy;
-
-	do {
-		mutex_lock(&vmap_purge_lock);
-		__purge_vmap_area_lazy(ULONG_MAX, 0);
-		mutex_unlock(&vmap_purge_lock);
-
-		/* Recheck if further work is required. */
-		nr_lazy = atomic_long_read(&vmap_lazy_nr);
-	} while (nr_lazy > lazy_max_pages());
+	mutex_lock(&vmap_purge_lock);
+	__purge_vmap_area_lazy(ULONG_MAX, 0);
+	mutex_unlock(&vmap_purge_lock);
 }
 
 /*
@@ -1867,6 +1886,7 @@ static void drain_vmap_area_work(struct work_struct *work)
 */
static void free_vmap_area_noflush(struct vmap_area *va)
 {
+	struct vmap_node *vn = addr_to_node(va->va_start);
 	unsigned long nr_lazy_max = lazy_max_pages();
 	unsigned long va_start = va->va_start;
 	unsigned long nr_lazy;
@@ -1880,10 +1900,9 @@ static void free_vmap_area_noflush(struct vmap_area *va)
 	/*
 	 * Merge or place it to the purge tree/list.
 	 */
-	spin_lock(&purge_vmap_area_lock);
-	merge_or_add_vmap_area(va,
-		&purge_vmap_area_root, &purge_vmap_area_list);
-	spin_unlock(&purge_vmap_area_lock);
+	spin_lock(&vn->lazy.lock);
+	merge_or_add_vmap_area(va, &vn->lazy.root, &vn->lazy.head);
+	spin_unlock(&vn->lazy.lock);
 
 	trace_free_vmap_area_noflush(va_start, nr_lazy, nr_lazy_max);
 
@@ -4390,15 +4409,21 @@ static void show_numa_info(struct seq_file *m, struct vm_struct *v)
 
 static void show_purge_info(struct seq_file *m)
 {
+	struct vmap_node *vn;
 	struct vmap_area *va;
+	int i;
 
-	spin_lock(&purge_vmap_area_lock);
-	list_for_each_entry(va, &purge_vmap_area_list, list) {
-		seq_printf(m, "0x%pK-0x%pK %7ld unpurged vm_area\n",
-			(void *)va->va_start, (void *)va->va_end,
-			va->va_end - va->va_start);
+	for (i = 0; i < nr_nodes; i++) {
+		vn = &nodes[i];
+
+		spin_lock(&vn->lazy.lock);
+		list_for_each_entry(va, &vn->lazy.head, list) {
+			seq_printf(m, "0x%pK-0x%pK %7ld unpurged vm_area\n",
+				(void *)va->va_start, (void *)va->va_end,
+				va->va_end - va->va_start);
+		}
+		spin_unlock(&vn->lazy.lock);
 	}
-	spin_unlock(&purge_vmap_area_lock);
 }
 
 static int s_show(struct seq_file *m, void *p)
@@ -4545,6 +4570,10 @@ static void vmap_init_nodes(void)
 		vn->busy.root = RB_ROOT;
 		INIT_LIST_HEAD(&vn->busy.head);
 		spin_lock_init(&vn->busy.lock);
+
+		vn->lazy.root = RB_ROOT;
+		INIT_LIST_HEAD(&vn->lazy.head);
+		spin_lock_init(&vn->lazy.lock);
 	}
 }
 
-- 
2.30.2

From nobody Fri Dec 19 14:21:08 2025
From: "Uladzislau Rezki (Sony)"
To: linux-mm@kvack.org, Andrew Morton
Cc: LKML, Baoquan He, Lorenzo Stoakes, Christoph Hellwig, Matthew Wilcox,
 "Liam R. Howlett", Dave Chinner, "Paul E. McKenney", Joel Fernandes,
 Uladzislau Rezki, Oleksiy Avramchenko
Subject: [PATCH v2 6/9] mm: vmalloc: Offload free_vmap_area_lock lock
Date: Tue, 29 Aug 2023 10:11:39 +0200
Message-Id: <20230829081142.3619-7-urezki@gmail.com>
In-Reply-To: <20230829081142.3619-1-urezki@gmail.com>

Concurrent access to the global vmap space is a bottleneck. High
contention can be simulated by running a vmalloc test suite.

To address it, introduce effective vmap-node logic. Each node behaves
as an independent entity. When a node is accessed, it serves a request
directly (if possible); it can also fetch a new block from the global
heap into its internals if no space or only low capacity is left.

This technique reduces the pressure on the global vmap lock.

Signed-off-by: Uladzislau Rezki (Sony)
---
 mm/vmalloc.c | 316 +++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 279 insertions(+), 37 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5a8a9c1370b6..4fd4915c532d 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -779,6 +779,7 @@ struct rb_list {
 
 struct vmap_node {
 	/* Bookkeeping data of this node. */
+	struct rb_list free;
 	struct rb_list busy;
 	struct rb_list lazy;
 
@@ -786,6 +787,13 @@ struct vmap_node {
 	 * Ready-to-free areas.
 	 */
 	struct list_head purge_list;
+	struct work_struct purge_work;
+	unsigned long nr_purged;
+
+	/*
+	 * Control that only one user can pre-fetch this node.
+	 */
+	atomic_t fill_in_progress;
 };
 
 static struct vmap_node *nodes, snode;
@@ -804,6 +812,32 @@ addr_to_node(unsigned long addr)
 	return &nodes[addr_to_node_id(addr)];
 }
 
+static inline struct vmap_node *
+id_to_node(int id)
+{
+	return &nodes[id % nr_nodes];
+}
+
+static inline int
+this_node_id(void)
+{
+	return raw_smp_processor_id() % nr_nodes;
+}
+
+static inline unsigned long
+encode_vn_id(int node_id)
+{
+	/* Can store U8_MAX [0:254] nodes. */
+	return (node_id + 1) << BITS_PER_BYTE;
+}
+
+static inline int
+decode_vn_id(unsigned long val)
+{
+	/* Can store U8_MAX [0:254] nodes. */
+	return (val >> BITS_PER_BYTE) - 1;
+}
+
 static __always_inline unsigned long
 va_size(struct vmap_area *va)
 {
@@ -1586,6 +1620,7 @@ __alloc_vmap_area(struct rb_root *root, struct list_head *head,
 static void free_vmap_area(struct vmap_area *va)
 {
 	struct vmap_node *vn = addr_to_node(va->va_start);
+	int vn_id = decode_vn_id(va->flags);
 
 	/*
 	 * Remove from the busy tree/list.
@@ -1594,12 +1629,19 @@ static void free_vmap_area(struct vmap_area *va)
 	unlink_va(va, &vn->busy.root);
 	spin_unlock(&vn->busy.lock);
 
-	/*
-	 * Insert/Merge it back to the free tree/list.
-	 */
-	spin_lock(&free_vmap_area_lock);
-	merge_or_add_vmap_area_augment(va, &free_vmap_area_root, &free_vmap_area_list);
-	spin_unlock(&free_vmap_area_lock);
+	if (vn_id >= 0) {
+		vn = id_to_node(vn_id);
+
+		/* Belongs to this node. */
+		spin_lock(&vn->free.lock);
+		merge_or_add_vmap_area_augment(va, &vn->free.root, &vn->free.head);
+		spin_unlock(&vn->free.lock);
+	} else {
+		/* Goes to global. */
+		spin_lock(&free_vmap_area_lock);
+		merge_or_add_vmap_area_augment(va, &free_vmap_area_root, &free_vmap_area_list);
+		spin_unlock(&free_vmap_area_lock);
+	}
 }
 
 static inline void
@@ -1625,6 +1667,134 @@ preload_this_cpu_lock(spinlock_t *lock, gfp_t gfp_mask, int node)
 	kmem_cache_free(vmap_area_cachep, va);
 }
 
+static unsigned long
+node_alloc_fill(struct vmap_node *vn,
+		unsigned long size, unsigned long align,
+		gfp_t gfp_mask, int node)
+{
+	struct vmap_area *va;
+	unsigned long addr;
+
+	va = kmem_cache_alloc_node(vmap_area_cachep, gfp_mask, node);
+	if (unlikely(!va))
+		return VMALLOC_END;
+
+	/*
+	 * Please note, an allocated block is not aligned to its size.
+	 * Therefore it can span several zones what means addr_to_node()
+	 * can point to two different nodes:
+	 *      <----->
+	 * -|-----|-----|-----|-----|-
+	 *     1     2     0     1
+	 *
+	 * an alignment would just increase fragmentation thus more heap
+	 * consumption what we would like to avoid.
+	 */
+	spin_lock(&free_vmap_area_lock);
+	addr = __alloc_vmap_area(&free_vmap_area_root, &free_vmap_area_list,
+		node_size, 1, VMALLOC_START, VMALLOC_END);
+	spin_unlock(&free_vmap_area_lock);
+
+	if (addr == VMALLOC_END) {
+		kmem_cache_free(vmap_area_cachep, va);
+		return VMALLOC_END;
+	}
+
+	/*
+	 * Statement and condition of the problem:
+	 *
+	 * a) where to free allocated areas from a node:
+	 *   - directly to a global heap;
+	 *   - to a node that we got a VA from;
+	 *   - what is a condition to return allocated areas
+	 *     to a global heap then;
+	 * b) how to properly handle left small free fragments
+	 *    of a node in order to mitigate a fragmentation.
+	 *
+	 * How to address described points:
+	 * When a new block is allocated(from a global heap) we shrink
+	 * it deliberately by one page from both sides and place it to
+	 * this node to serve a request.
+	 *
+	 * Why we shrink. We would like to distinguish VAs which were
+	 * obtained from a node and a global heap. This is for a free
+	 * path. A va->flags contains a node-id it belongs to. No VAs
+	 * merging is possible between each other unless they are part
+	 * of same block.
+	 *
+	 * A free-path in its turn can detect a correct node where a
+	 * VA has to be returned. Thus as a block is freed entirely,
+	 * its size becomes(merging): node_size - (2 * PAGE_SIZE) it
+	 * recovers its edges, thus is released to a global heap for
+	 * reuse elsewhere. In partly freed case, VAs go back to the
+	 * node not bothering a global vmap space.
+	 *
+	 *  1               2              3
+	 * |<------------>|<------------>|<------------>|
+	 * |..<-------->..|..<-------->..|..<-------->..|
+	 */
+	va->va_start = addr + PAGE_SIZE;
+	va->va_end = (addr + node_size) - PAGE_SIZE;
+
+	spin_lock(&vn->free.lock);
+	/* Never merges. See explanation above. */
+	insert_vmap_area_augment(va, NULL, &vn->free.root, &vn->free.head);
+	addr = va_alloc(va, &vn->free.root, &vn->free.head,
+		size, align, VMALLOC_START, VMALLOC_END);
+	spin_unlock(&vn->free.lock);
+
+	return addr;
+}
+
+static unsigned long
+node_alloc(int vn_id, unsigned long size, unsigned long align,
+		unsigned long vstart, unsigned long vend,
+		gfp_t gfp_mask, int node)
+{
+	struct vmap_node *vn = id_to_node(vn_id);
+	unsigned long extra = align > PAGE_SIZE ? align : 0;
+	bool do_alloc_fill = false;
+	unsigned long addr;
+
+	/*
+	 * Fallback to a global heap if not vmalloc.
+	 */
+	if (vstart != VMALLOC_START || vend != VMALLOC_END)
+		return vend;
+
+	/*
+	 * A maximum allocation limit is 1/4 of capacity. This
+	 * is done in order to prevent a fast depleting of zone
+	 * by few requests.
+	 */
+	if (size + extra > (node_size >> 2))
+		return vend;
+
+	spin_lock(&vn->free.lock);
+	addr = __alloc_vmap_area(&vn->free.root, &vn->free.head,
+		size, align, vstart, vend);
+
+	if (addr == vend) {
+		/*
+		 * Set the fetch flag under the critical section.
+		 * This guarantees that only one user is eligible
+		 * to perform a pre-fetch. A reset operation can
+		 * be concurrent.
+		 */
+		if (!atomic_xchg(&vn->fill_in_progress, 1))
+			do_alloc_fill = true;
+	}
+	spin_unlock(&vn->free.lock);
+
+	/* Only if fails a previous attempt. */
+	if (do_alloc_fill) {
+		addr = node_alloc_fill(vn, size, align, gfp_mask, node);
+		atomic_set(&vn->fill_in_progress, 0);
+	}
+
+	return addr;
+}
+
 /*
 * Allocate a region of KVA of the specified size and alignment, within the
 * vstart and vend.
@@ -1640,7 +1810,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	unsigned long freed;
 	unsigned long addr;
 	int purged = 0;
-	int ret;
+	int ret, vn_id;
 
 	if (unlikely(!size || offset_in_page(size) || !is_power_of_2(align)))
 		return ERR_PTR(-EINVAL);
@@ -1661,11 +1831,17 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	 */
 	kmemleak_scan_area(&va->rb_node, SIZE_MAX, gfp_mask);
 
+	vn_id = this_node_id();
+	addr = node_alloc(vn_id, size, align, vstart, vend, gfp_mask, node);
+	va->flags = (addr != vend) ? encode_vn_id(vn_id) : 0;
+
retry:
-	preload_this_cpu_lock(&free_vmap_area_lock, gfp_mask, node);
-	addr = __alloc_vmap_area(&free_vmap_area_root, &free_vmap_area_list,
-		size, align, vstart, vend);
-	spin_unlock(&free_vmap_area_lock);
+	if (addr == vend) {
+		preload_this_cpu_lock(&free_vmap_area_lock, gfp_mask, node);
+		addr = __alloc_vmap_area(&free_vmap_area_root, &free_vmap_area_list,
+			size, align, vstart, vend);
+		spin_unlock(&free_vmap_area_lock);
+	}
 
 	trace_alloc_vmap_area(addr, size, align, vstart, vend, addr == vend);
 
@@ -1679,7 +1855,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	va->va_start = addr;
 	va->va_end = addr + size;
 	va->vm = NULL;
-	va->flags = va_flags;
+	va->flags |= va_flags;
 
 	vn = addr_to_node(va->va_start);
 
@@ -1772,31 +1948,58 @@ static DEFINE_MUTEX(vmap_purge_lock);
 static void purge_fragmented_blocks_allcpus(void);
 static cpumask_t purge_nodes;
 
-/*
- * Purges all lazily-freed vmap areas.
- */
-static unsigned long
-purge_vmap_node(struct vmap_node *vn)
+static void
+reclaim_list_global(struct list_head *head)
+{
+	struct vmap_area *va, *n;
+
+	if (list_empty(head))
+		return;
+
+	spin_lock(&free_vmap_area_lock);
+	list_for_each_entry_safe(va, n, head, list)
+		merge_or_add_vmap_area_augment(va,
+			&free_vmap_area_root, &free_vmap_area_list);
+	spin_unlock(&free_vmap_area_lock);
+}
+
+static void purge_vmap_node(struct work_struct *work)
 {
-	unsigned long num_purged_areas = 0;
+	struct vmap_node *vn = container_of(work,
+		struct vmap_node, purge_work);
 	struct vmap_area *va, *n_va;
+	LIST_HEAD(global);
+
+	vn->nr_purged = 0;
 
 	if (list_empty(&vn->purge_list))
-		return 0;
+		return;
 
-	spin_lock(&free_vmap_area_lock);
+	spin_lock(&vn->free.lock);
 	list_for_each_entry_safe(va, n_va, &vn->purge_list, list) {
 		unsigned long nr = (va->va_end - va->va_start) >> PAGE_SHIFT;
 		unsigned long orig_start = va->va_start;
 		unsigned long orig_end = va->va_end;
+		int vn_id = decode_vn_id(va->flags);
 
-		/*
-		 * Finally insert or merge lazily-freed area. It is
-		 * detached and there is no need to "unlink" it from
-		 * anything.
-		 */
-		va = merge_or_add_vmap_area_augment(va, &free_vmap_area_root,
-			&free_vmap_area_list);
+		list_del_init(&va->list);
+
+		if (vn_id >= 0) {
+			if (va_size(va) != node_size - (2 * PAGE_SIZE))
+				va = merge_or_add_vmap_area_augment(va, &vn->free.root, &vn->free.head);
+
+			if (va_size(va) == node_size - (2 * PAGE_SIZE)) {
+				if (!list_empty(&va->list))
+					unlink_va_augment(va, &vn->free.root);
+
+				/* Restore the block size. */
+				va->va_start -= PAGE_SIZE;
+				va->va_end += PAGE_SIZE;
+				list_add(&va->list, &global);
+			}
+		} else {
+			list_add(&va->list, &global);
+		}
 
 		if (!va)
 			continue;
@@ -1806,11 +2009,10 @@ purge_vmap_node(struct vmap_node *vn)
 			va->va_start, va->va_end);
 
 		atomic_long_sub(nr, &vmap_lazy_nr);
-		num_purged_areas++;
+		vn->nr_purged++;
 	}
-	spin_unlock(&free_vmap_area_lock);
-
-	return num_purged_areas;
+	spin_unlock(&vn->free.lock);
+	reclaim_list_global(&global);
 }
 
 /*
@@ -1818,11 +2020,17 @@ purge_vmap_node(struct vmap_node *vn)
 */
static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 {
-	unsigned long num_purged_areas = 0;
+	unsigned long nr_purged_areas = 0;
+	unsigned int nr_purge_helpers;
+	unsigned int nr_purge_nodes;
 	struct vmap_node *vn;
 	int i;
 
 	lockdep_assert_held(&vmap_purge_lock);
+
+	/*
+	 * Use cpumask to mark which node has to be processed.
+	 */
 	purge_nodes = CPU_MASK_NONE;
 
 	for (i = 0; i < nr_nodes; i++) {
@@ -1847,17 +2055,45 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 		cpumask_set_cpu(i, &purge_nodes);
 	}
 
-	if (cpumask_weight(&purge_nodes) > 0) {
+	nr_purge_nodes = cpumask_weight(&purge_nodes);
+	if (nr_purge_nodes > 0) {
 		flush_tlb_kernel_range(start, end);
 
+		/* One extra worker is per a lazy_max_pages() full set minus one. */
+		nr_purge_helpers = atomic_long_read(&vmap_lazy_nr) / lazy_max_pages();
+		nr_purge_helpers = clamp(nr_purge_helpers, 1U, nr_purge_nodes) - 1;
+
+		for_each_cpu(i, &purge_nodes) {
+			vn = &nodes[i];
+
+			if (nr_purge_helpers > 0) {
+				INIT_WORK(&vn->purge_work, purge_vmap_node);
+
+				if (cpumask_test_cpu(i, cpu_online_mask))
+					schedule_work_on(i, &vn->purge_work);
+				else
+					schedule_work(&vn->purge_work);
+
+				nr_purge_helpers--;
+			} else {
+				vn->purge_work.func = NULL;
+				purge_vmap_node(&vn->purge_work);
+				nr_purged_areas += vn->nr_purged;
+			}
+		}
+
 		for_each_cpu(i, &purge_nodes) {
 			vn = &nodes[i];
-			num_purged_areas += purge_vmap_node(vn);
+
+			if (vn->purge_work.func) {
+				flush_work(&vn->purge_work);
+				nr_purged_areas += vn->nr_purged;
+			}
 		}
 	}
 
-	trace_purge_vmap_area_lazy(start, end, num_purged_areas);
-	return num_purged_areas > 0;
+	trace_purge_vmap_area_lazy(start, end, nr_purged_areas);
+	return nr_purged_areas > 0;
 }
 
 /*
@@ -1886,9 +2122,11 @@ static void drain_vmap_area_work(struct work_struct *work)
 */
static void free_vmap_area_noflush(struct vmap_area *va)
 {
-	struct vmap_node *vn = addr_to_node(va->va_start);
 	unsigned long nr_lazy_max = lazy_max_pages();
 	unsigned long va_start = va->va_start;
+	int vn_id = decode_vn_id(va->flags);
+	struct vmap_node *vn = vn_id >= 0 ?
id_to_node(vn_id) :
+		addr_to_node(va->va_start);
 	unsigned long nr_lazy;
 
 	if (WARN_ON_ONCE(!list_empty(&va->list)))
@@ -4574,6 +4812,10 @@ static void vmap_init_nodes(void)
 		vn->lazy.root = RB_ROOT;
 		INIT_LIST_HEAD(&vn->lazy.head);
 		spin_lock_init(&vn->lazy.lock);
+
+		vn->free.root = RB_ROOT;
+		INIT_LIST_HEAD(&vn->free.head);
+		spin_lock_init(&vn->free.lock);
 	}
 }
 
-- 
2.30.2

From nobody Fri Dec 19 14:21:08 2025
From: "Uladzislau Rezki (Sony)"
To: linux-mm@kvack.org, Andrew Morton
Cc: LKML, Baoquan He, Lorenzo Stoakes, Christoph Hellwig, Matthew Wilcox,
 "Liam R. Howlett", Dave Chinner, "Paul E. McKenney", Joel Fernandes,
 Uladzislau Rezki, Oleksiy Avramchenko
Subject: [PATCH v2 7/9] mm: vmalloc: Support multiple nodes in vread_iter
Date: Tue, 29 Aug 2023 10:11:40 +0200
Message-Id: <20230829081142.3619-8-urezki@gmail.com>
In-Reply-To: <20230829081142.3619-1-urezki@gmail.com>

Extend vread_iter() to perform a sequential read of VAs that are
spread among multiple nodes, so that a data read over /dev/kmem
correctly reflects the vmalloc memory layout.

Signed-off-by: Uladzislau Rezki (Sony)
Reviewed-by: Baoquan He
---
 mm/vmalloc.c | 67 +++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 53 insertions(+), 14 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 4fd4915c532d..968144c16237 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -870,7 +870,7 @@ unsigned long vmalloc_nr_pages(void)
 
 /* Look up the first VA which satisfies addr < va_end, NULL if none. */
 static struct vmap_area *
-find_vmap_area_exceed_addr(unsigned long addr, struct rb_root *root)
+__find_vmap_area_exceed_addr(unsigned long addr, struct rb_root *root)
 {
 	struct vmap_area *va = NULL;
 	struct rb_node *n = root->rb_node;
@@ -894,6 +894,41 @@ find_vmap_area_exceed_addr(unsigned long addr, struct rb_root *root)
 	return va;
 }
 
+/*
+ * Returns a node where a first VA, that satisfies addr < va_end, resides.
+ * If success, a node is locked. A user is responsible to unlock it when a
+ * VA is no longer needed to be accessed.
+ *
+ * Returns NULL if nothing found.
+ */ +static struct vmap_node * +find_vmap_area_exceed_addr_lock(unsigned long addr, struct vmap_area **va) +{ + struct vmap_node *vn, *va_node =3D NULL; + struct vmap_area *va_lowest; + int i; + + for (i =3D 0; i < nr_nodes; i++) { + vn =3D &nodes[i]; + + spin_lock(&vn->busy.lock); + va_lowest =3D __find_vmap_area_exceed_addr(addr, &vn->busy.root); + if (va_lowest) { + if (!va_node || va_lowest->va_start < (*va)->va_start) { + if (va_node) + spin_unlock(&va_node->busy.lock); + + *va =3D va_lowest; + va_node =3D vn; + continue; + } + } + spin_unlock(&vn->busy.lock); + } + + return va_node; +} + static struct vmap_area *__find_vmap_area(unsigned long addr, struct rb_ro= ot *root) { struct rb_node *n =3D root->rb_node; @@ -4048,6 +4083,7 @@ long vread_iter(struct iov_iter *iter, const char *ad= dr, size_t count) struct vm_struct *vm; char *vaddr; size_t n, size, flags, remains; + unsigned long next; =20 addr =3D kasan_reset_tag(addr); =20 @@ -4057,19 +4093,15 @@ long vread_iter(struct iov_iter *iter, const char *= addr, size_t count) =20 remains =3D count; =20 - /* Hooked to node_0 so far. 
*/ - vn =3D addr_to_node(0); - spin_lock(&vn->busy.lock); - - va =3D find_vmap_area_exceed_addr((unsigned long)addr, &vn->busy.root); - if (!va) + vn =3D find_vmap_area_exceed_addr_lock((unsigned long) addr, &va); + if (!vn) goto finished_zero; =20 /* no intersects with alive vmap_area */ if ((unsigned long)addr + remains <=3D va->va_start) goto finished_zero; =20 - list_for_each_entry_from(va, &vn->busy.head, list) { + do { size_t copied; =20 if (remains =3D=3D 0) @@ -4084,10 +4116,10 @@ long vread_iter(struct iov_iter *iter, const char *= addr, size_t count) WARN_ON(flags =3D=3D VMAP_BLOCK); =20 if (!vm && !flags) - continue; + goto next_va; =20 if (vm && (vm->flags & VM_UNINITIALIZED)) - continue; + goto next_va; =20 /* Pair with smp_wmb() in clear_vm_uninitialized_flag() */ smp_rmb(); @@ -4096,7 +4128,7 @@ long vread_iter(struct iov_iter *iter, const char *ad= dr, size_t count) size =3D vm ? get_vm_area_size(vm) : va_size(va); =20 if (addr >=3D vaddr + size) - continue; + goto next_va; =20 if (addr < vaddr) { size_t to_zero =3D min_t(size_t, vaddr - addr, remains); @@ -4125,15 +4157,22 @@ long vread_iter(struct iov_iter *iter, const char *= addr, size_t count) =20 if (copied !=3D n) goto finished; - } + + next_va: + next =3D va->va_end; + spin_unlock(&vn->busy.lock); + } while ((vn =3D find_vmap_area_exceed_addr_lock(next, &va))); =20 finished_zero: - spin_unlock(&vn->busy.lock); + if (vn) + spin_unlock(&vn->busy.lock); + /* zero-fill memory holes */ return count - remains + zero_iter(iter, remains); finished: /* Nothing remains, or We couldn't copy/zero everything. 
*/ - spin_unlock(&vn->busy.lock); + if (vn) + spin_unlock(&vn->busy.lock); =20 return count - remains; } --=20 2.30.2 From nobody Fri Dec 19 14:21:08 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 82C91C83F25 for ; Tue, 29 Aug 2023 08:12:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234122AbjH2IMT (ORCPT ); Tue, 29 Aug 2023 04:12:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49332 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234046AbjH2ILz (ORCPT ); Tue, 29 Aug 2023 04:11:55 -0400 Received: from mail-lj1-x231.google.com (mail-lj1-x231.google.com [IPv6:2a00:1450:4864:20::231]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1B40CBF for ; Tue, 29 Aug 2023 01:11:52 -0700 (PDT) Received: by mail-lj1-x231.google.com with SMTP id 38308e7fff4ca-2bbad32bc79so60894141fa.0 for ; Tue, 29 Aug 2023 01:11:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1693296710; x=1693901510; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=MQ79dcWK5DB43dAqCTqh1rVk+BckiUBvnWUU5cp8hRQ=; b=cJ578Rd8+pWkOxRraeWNzQ5Bjr4Xs8q21UHKNjn+pX9Pg9G1/d9RORSdyDavIEvqBu nZqw56Z/5bnZI+y6Pn8izwNVYNvZcPwy3SBuXJ0bmP150S3WqUUP2lvho86gZrTawaVa HN7hgQO88ZBN0j9YGepuZcSNMNKPJwd71vjjuyYtZgzfhFuO+xQAKqY/x0VT3pyjkI7i s33hleKJsNJYWtMKjIls/pp0qevocX7mEFRTjGXttgA1o21oQIAaRkMgOxo/Driz4GKG btE1nDRRXR/pCI90OWPdmQrGAGRn61cqJakmfIk3Jii9kpF3lXw5NDcqS79oKHhKU9MQ qvTQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693296710; x=1693901510; h=content-transfer-encoding:mime-version:references:in-reply-to 
From nobody Fri Dec 19 14:21:08 2025
From: "Uladzislau Rezki (Sony)"
To: linux-mm@kvack.org, Andrew Morton
Cc: LKML, Baoquan He, Lorenzo Stoakes, Christoph Hellwig, Matthew Wilcox,
 "Liam R. Howlett", Dave Chinner, "Paul E. McKenney", Joel Fernandes,
 Uladzislau Rezki, Oleksiy Avramchenko
Subject: [PATCH v2 8/9] mm: vmalloc: Support multiple nodes in vmallocinfo
Date: Tue, 29 Aug 2023 10:11:41 +0200
Message-Id: <20230829081142.3619-9-urezki@gmail.com>
In-Reply-To: <20230829081142.3619-1-urezki@gmail.com>
References: <20230829081142.3619-1-urezki@gmail.com>

Allocated areas are now spread among nodes, so the scan has to be
performed per node in order to dump all existing VAs.

Signed-off-by: Uladzislau Rezki (Sony)
Reviewed-by: Baoquan He
---
 mm/vmalloc.c | 120 ++++++++++++++++++++------------------------------
 1 file changed, 47 insertions(+), 73 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 968144c16237..9cce012aecdb 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4636,30 +4636,6 @@ bool vmalloc_dump_obj(void *object)
 #endif
 
 #ifdef CONFIG_PROC_FS
-static void *s_start(struct seq_file *m, loff_t *pos)
-{
-	struct vmap_node *vn = addr_to_node(0);
-
-	mutex_lock(&vmap_purge_lock);
-	spin_lock(&vn->busy.lock);
-
-	return seq_list_start(&vn->busy.head, *pos);
-}
-
-static void *s_next(struct seq_file *m, void *p, loff_t *pos)
-{
-	struct vmap_node *vn = addr_to_node(0);
-	return seq_list_next(p, &vn->busy.head, pos);
-}
-
-static void s_stop(struct seq_file *m, void *p)
-{
-	struct vmap_node *vn = addr_to_node(0);
-
-	spin_unlock(&vn->busy.lock);
-	mutex_unlock(&vmap_purge_lock);
-}
-
 static void show_numa_info(struct seq_file *m, struct vm_struct *v)
 {
 	if (IS_ENABLED(CONFIG_NUMA)) {
@@ -4703,84 +4679,82 @@ static void show_purge_info(struct seq_file *m)
 	}
 }
 
-static int s_show(struct seq_file *m, void *p)
+static int vmalloc_info_show(struct seq_file *m, void *p)
 {
 	struct vmap_node *vn;
 	struct vmap_area *va;
 	struct vm_struct *v;
+	int i;
 
-	vn = addr_to_node(0);
-	va = list_entry(p, struct vmap_area, list);
+	for (i = 0; i < nr_nodes; i++) {
+		vn = &nodes[i];
 
-	if (!va->vm) {
-		if (va->flags & VMAP_RAM)
-			seq_printf(m, "0x%pK-0x%pK %7ld vm_map_ram\n",
-				(void *)va->va_start, (void *)va->va_end,
-				va->va_end - va->va_start);
+		spin_lock(&vn->busy.lock);
+		list_for_each_entry(va, &vn->busy.head, list) {
+			if (!va->vm) {
+				if (va->flags & VMAP_RAM)
+					seq_printf(m, "0x%pK-0x%pK %7ld vm_map_ram\n",
+						(void *)va->va_start, (void *)va->va_end,
+						va->va_end - va->va_start);
 
-		goto final;
-	}
+				continue;
+			}
 
-	v = va->vm;
+			v = va->vm;
 
-	seq_printf(m, "0x%pK-0x%pK %7ld",
-		v->addr, v->addr + v->size, v->size);
+			seq_printf(m, "0x%pK-0x%pK %7ld",
+				v->addr, v->addr + v->size, v->size);
 
-	if (v->caller)
-		seq_printf(m, " %pS", v->caller);
+			if (v->caller)
+				seq_printf(m, " %pS", v->caller);
 
-	if (v->nr_pages)
-		seq_printf(m, " pages=%d", v->nr_pages);
+			if (v->nr_pages)
+				seq_printf(m, " pages=%d", v->nr_pages);
 
-	if (v->phys_addr)
-		seq_printf(m, " phys=%pa", &v->phys_addr);
+			if (v->phys_addr)
+				seq_printf(m, " phys=%pa", &v->phys_addr);
 
-	if (v->flags & VM_IOREMAP)
-		seq_puts(m, " ioremap");
+			if (v->flags & VM_IOREMAP)
+				seq_puts(m, " ioremap");
 
-	if (v->flags & VM_ALLOC)
-		seq_puts(m, " vmalloc");
+			if (v->flags & VM_ALLOC)
+				seq_puts(m, " vmalloc");
 
-	if (v->flags & VM_MAP)
-		seq_puts(m, " vmap");
+			if (v->flags & VM_MAP)
+				seq_puts(m, " vmap");
 
-	if (v->flags & VM_USERMAP)
-		seq_puts(m, " user");
+			if (v->flags & VM_USERMAP)
+				seq_puts(m, " user");
 
-	if (v->flags & VM_DMA_COHERENT)
-		seq_puts(m, " dma-coherent");
+			if (v->flags & VM_DMA_COHERENT)
+				seq_puts(m, " dma-coherent");
 
-	if (is_vmalloc_addr(v->pages))
-		seq_puts(m, " vpages");
+			if (is_vmalloc_addr(v->pages))
+				seq_puts(m, " vpages");
 
-	show_numa_info(m, v);
-	seq_putc(m, '\n');
+			show_numa_info(m, v);
+			seq_putc(m, '\n');
+		}
+
+		spin_unlock(&vn->busy.lock);
+	}
 
 	/*
 	 * As a final step, dump "unpurged" areas.
 	 */
-final:
-	if (list_is_last(&va->list, &vn->busy.head))
-		show_purge_info(m);
-
+	show_purge_info(m);
 	return 0;
 }
 
-static const struct seq_operations vmalloc_op = {
-	.start = s_start,
-	.next = s_next,
-	.stop = s_stop,
-	.show = s_show,
-};
-
 static int __init proc_vmalloc_init(void)
 {
+	void *priv_data = NULL;
+
 	if (IS_ENABLED(CONFIG_NUMA))
-		proc_create_seq_private("vmallocinfo", 0400, NULL,
-				&vmalloc_op,
-				nr_node_ids * sizeof(unsigned int), NULL);
-	else
-		proc_create_seq("vmallocinfo", 0400, NULL, &vmalloc_op);
+		priv_data = kmalloc(nr_node_ids * sizeof(unsigned int), GFP_KERNEL);
+
+	proc_create_single_data("vmallocinfo",
+		0400, NULL, vmalloc_info_show, priv_data);
+
 	return 0;
 }
 module_init(proc_vmalloc_init);
-- 
2.30.2
From nobody Fri Dec 19 14:21:08 2025
From: "Uladzislau Rezki (Sony)"
To: linux-mm@kvack.org, Andrew Morton
Cc: LKML, Baoquan He, Lorenzo Stoakes, Christoph Hellwig, Matthew Wilcox,
 "Liam R. Howlett", Dave Chinner, "Paul E. McKenney", Joel Fernandes,
 Uladzislau Rezki, Oleksiy Avramchenko
Subject: [PATCH v2 9/9] mm: vmalloc: Set nr_nodes/node_size based on CPU-cores
Date: Tue, 29 Aug 2023 10:11:42 +0200
Message-Id: <20230829081142.3619-10-urezki@gmail.com>
In-Reply-To: <20230829081142.3619-1-urezki@gmail.com>
References: <20230829081142.3619-1-urezki@gmail.com>

The density ratio is set to 2, i.e. two users per node: for example,
a system with 6 cores gets a "nr_nodes" of 3. The "node_size" also
scales with the number of physical cores; a high-threshold limit is
hard-coded and set to SZ_4M.

On 32-bit and single/dual-core systems, access to the global vmap
heap is not spread out; such small systems do not suffer from lock
contention because of their limited number of CPU-cores.

Test on AMD Ryzen Threadripper 3970X 32-Core Processor:
sudo ./test_vmalloc.sh run_test_mask=127 nr_threads=64

 94.17%  0.90%  [kernel]   [k] _raw_spin_lock
 93.27% 93.05%  [kernel]   [k] native_queued_spin_lock_slowpath
 74.69%  0.25%  [kernel]   [k] __vmalloc_node_range
 72.64%  0.01%  [kernel]   [k] __get_vm_area_node
 72.04%  0.89%  [kernel]   [k] alloc_vmap_area
 42.17%  0.00%  [kernel]   [k] vmalloc
 32.53%  0.00%  [kernel]   [k] __vmalloc_node
 24.91%  0.25%  [kernel]   [k] vfree
 24.32%  0.01%  [kernel]   [k] remove_vm_area
 22.63%  0.21%  [kernel]   [k] find_unlink_vmap_area
 15.51%  0.00%  [unknown]  [k] 0xffffffffc09a74ac
 14.35%  0.00%  [kernel]   [k] ret_from_fork_asm
 14.35%  0.00%  [kernel]   [k] ret_from_fork
 14.35%  0.00%  [kernel]   [k] kthread

vs

 74.32%  2.42%  [kernel]   [k] __vmalloc_node_range
 69.58%  0.01%  [kernel]   [k] vmalloc
 54.21%  1.17%  [kernel]   [k] __alloc_pages_bulk
 48.13% 47.91%  [kernel]   [k] clear_page_orig
 43.60%  0.01%  [unknown]  [k] 0xffffffffc082f16f
 32.06%  0.00%  [kernel]   [k] ret_from_fork_asm
 32.06%  0.00%  [kernel]   [k] ret_from_fork
 32.06%  0.00%  [kernel]   [k] kthread
 31.30%  0.00%  [unknown]  [k] 0xffffffffc082f889
 22.98%  4.16%  [kernel]   [k] vfree
 14.36%  0.28%  [kernel]   [k] __get_vm_area_node
 13.43%  3.35%  [kernel]   [k] alloc_vmap_area
 10.86%  0.04%  [kernel]   [k] remove_vm_area
  8.89%  2.75%  [kernel]   [k] _raw_spin_lock
  7.19%  0.00%  [unknown]  [k] 0xffffffffc082fba3
  6.65%  1.37%  [kernel]   [k] free_unref_page
  6.13%  6.11%  [kernel]   [k] native_queued_spin_lock_slowpath

This confirms that the native_queued_spin_lock_slowpath bottleneck
is negligible in the patched version. The throughput is ~15x higher:

urezki@pc638:~$ time sudo ./test_vmalloc.sh run_test_mask=127 nr_threads=64
Run the test with following parameters: run_test_mask=127 nr_threads=64
Done.
Check the kernel ring buffer to see the summary.

real	24m3.305s
user	0m0.361s
sys	0m0.013s
urezki@pc638:~$

urezki@pc638:~$ time sudo ./test_vmalloc.sh run_test_mask=127 nr_threads=64
Run the test with following parameters: run_test_mask=127 nr_threads=64
Done.
Check the kernel ring buffer to see the summary.

real	1m28.382s
user	0m0.014s
sys	0m0.026s
urezki@pc638:~$

Signed-off-by: Uladzislau Rezki (Sony)
Reviewed-by: Baoquan He
---
 mm/vmalloc.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 9cce012aecdb..08990f630c21 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -796,6 +796,9 @@ struct vmap_node {
 	atomic_t fill_in_progress;
 };
 
+#define MAX_NODES	U8_MAX
+#define MAX_NODE_SIZE	SZ_4M
+
 static struct vmap_node *nodes, snode;
 static __read_mostly unsigned int nr_nodes = 1;
 static __read_mostly unsigned int node_size = 1;
@@ -4803,11 +4806,24 @@ static void vmap_init_free_space(void)
 	}
 }
 
+static unsigned int calculate_nr_nodes(void)
+{
+	unsigned int nr_cpus;
+
+	nr_cpus = num_present_cpus();
+	if (nr_cpus <= 1)
+		nr_cpus = num_possible_cpus();
+
+	/* Density factor. Two users per a node. */
+	return clamp_t(unsigned int, nr_cpus >> 1, 1, MAX_NODES);
+}
+
 static void vmap_init_nodes(void)
 {
 	struct vmap_node *vn;
 	int i;
 
+	nr_nodes = calculate_nr_nodes();
 	nodes = &snode;
 
 	if (nr_nodes > 1) {
@@ -4830,6 +4846,16 @@ static void vmap_init_nodes(void)
 		INIT_LIST_HEAD(&vn->free.head);
 		spin_lock_init(&vn->free.lock);
 	}
+
+	/*
+	 * Scale a node size to number of CPUs. Each power of two
+	 * value doubles a node size. A high-threshold limit is set
+	 * to 4M.
+	 */
+#if BITS_PER_LONG == 64
+	if (nr_nodes > 1)
+		node_size = min(SZ_64K << fls(num_possible_cpus()), SZ_4M);
+#endif
 }
 
 void __init vmalloc_init(void)
-- 
2.30.2