From: Nicholas Piggin
To: Paul Menzel
Cc: Nicholas Piggin, x86@kernel.org, Song Liu, "Edgecombe, Rick P",
    "Torvalds, Linus", akpm@linux-foundation.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 1/2] mm/vmalloc: huge vmalloc backing pages should be split rather than compound
Date: Fri, 22 Apr 2022 16:01:05 +1000
Message-Id: <20220422060107.781512-2-npiggin@gmail.com>
In-Reply-To: <20220422060107.781512-1-npiggin@gmail.com>
References: <20220422060107.781512-1-npiggin@gmail.com>

Huge vmalloc higher-order backing pages were allocated with __GFP_COMP
in order to allow the sub-pages to be refcounted by callers such as
"remap_vmalloc_page [sic]" (remap_vmalloc_range). However, a similar
problem exists for other struct page fields callers use: for example,
fb_deferred_io_fault() takes a vmalloc'ed page and not only refcounts
it but also uses ->lru, ->mapping, and ->index. None of that is
compatible with compound sub-pages.

The correct approach is to use split high-order pages for the huge
vmalloc backing. These allow callers to treat them exactly the same
way as individually-allocated order-0 pages.

Signed-off-by: Nicholas Piggin
---
 mm/vmalloc.c | 36 +++++++++++++++++++++---------------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 07da85ae825b..cadfbb5155ea 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2653,15 +2653,18 @@ static void __vunmap(const void *addr, int deallocate_pages)
 	vm_remove_mappings(area, deallocate_pages);
 
 	if (deallocate_pages) {
-		unsigned int page_order = vm_area_page_order(area);
-		int i, step = 1U << page_order;
+		int i;
 
-		for (i = 0; i < area->nr_pages; i += step) {
+		for (i = 0; i < area->nr_pages; i++) {
 			struct page *page = area->pages[i];
 
 			BUG_ON(!page);
-			mod_memcg_page_state(page, MEMCG_VMALLOC, -step);
-			__free_pages(page, page_order);
+			mod_memcg_page_state(page, MEMCG_VMALLOC, -1);
+			/*
+			 * High-order allocs for huge vmallocs are split, so
+			 * can be freed as an array of order-0 allocations
+			 */
+			__free_pages(page, 0);
 			cond_resched();
 		}
 		atomic_long_sub(area->nr_pages, &nr_vmalloc_pages);
@@ -2914,12 +2917,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 			if (nr != nr_pages_request)
 				break;
 		}
-	} else
-		/*
-		 * Compound pages required for remap_vmalloc_page if
-		 * high-order pages.
-		 */
-		gfp |= __GFP_COMP;
+	}
 
 	/* High-order pages or fallback path if "bulk" fails. */
 
@@ -2933,6 +2931,15 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 		page = alloc_pages_node(nid, gfp, order);
 		if (unlikely(!page))
 			break;
+		/*
+		 * Higher order allocations must be able to be treated as
+		 * independent small pages by callers (as they can with
+		 * small-page vmallocs). Some drivers do their own refcounting
+		 * on vmalloc_to_page() pages, some use page->mapping,
+		 * page->lru, etc.
+		 */
+		if (order)
+			split_page(page, order);
 
 		/*
 		 * Careful, we allocate and map page-order pages, but
@@ -2992,11 +2999,10 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 
 	atomic_long_add(area->nr_pages, &nr_vmalloc_pages);
 	if (gfp_mask & __GFP_ACCOUNT) {
-		int i, step = 1U << page_order;
+		int i;
 
-		for (i = 0; i < area->nr_pages; i += step)
-			mod_memcg_page_state(area->pages[i], MEMCG_VMALLOC,
-					     step);
+		for (i = 0; i < area->nr_pages; i++)
+			mod_memcg_page_state(area->pages[i], MEMCG_VMALLOC, 1);
 	}
 
 	/*
-- 
2.35.1
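
To make the incompatibility concrete, here is a minimal sketch (a
hypothetical helper, not taken from any in-tree driver) of the
vmalloc_to_page() usage pattern the commit message describes. It only
behaves like ordinary order-0 page handling when the huge vmalloc backing
pages have been split, as the patch above now does; with __GFP_COMP
backing, the sub-pages are compound tail pages and the per-page refcount
and field accesses below are not valid.

	#include <linux/mm.h>
	#include <linux/vmalloc.h>

	/* Hypothetical sketch, modelled on fb_deferred_io_fault()-style use. */
	static struct page *pin_vmalloc_page(void *buf, unsigned long offset)
	{
		/* buf came from vmalloc(); look up the backing page */
		struct page *page = vmalloc_to_page(buf + offset);

		get_page(page);				/* per-sub-page refcount */
		page->index = offset >> PAGE_SHIFT;	/* per-sub-page field use */
		return page;
	}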
From: Nicholas Piggin
To: Paul Menzel
Cc: Nicholas Piggin, x86@kernel.org, Song Liu, "Edgecombe, Rick P",
    "Torvalds, Linus", akpm@linux-foundation.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 2/2] Revert "vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP"
Date: Fri, 22 Apr 2022 16:01:06 +1000
Message-Id: <20220422060107.781512-3-npiggin@gmail.com>
In-Reply-To: <20220422060107.781512-1-npiggin@gmail.com>
References: <20220422060107.781512-1-npiggin@gmail.com>

This reverts commit 559089e0a93d44280ec3ab478830af319c56dbe3.

The previous commit fixes huge vmalloc for drivers that use the struct
pages returned by vmalloc_to_page(), so huge vmalloc no longer needs to
be opt-in via VM_ALLOW_HUGE_VMAP; restore the opt-out VM_NO_HUGE_VMAP
semantics.

Signed-off-by: Nicholas Piggin
---
 arch/Kconfig                 |  6 ++++--
 arch/powerpc/kernel/module.c |  2 +-
 arch/s390/kvm/pv.c           |  7 ++++++-
 include/linux/vmalloc.h      |  4 ++--
 mm/vmalloc.c                 | 17 +++++++----------
 5 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 31c4fdc4a4ba..29b0167c088b 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -854,8 +854,10 @@ config HAVE_ARCH_HUGE_VMAP
 
 #
 #  Archs that select this would be capable of PMD-sized vmaps (i.e.,
-#  arch_vmap_pmd_supported() returns true). The VM_ALLOW_HUGE_VMAP flag
-#  must be used to enable allocations to use hugepages.
+#  arch_vmap_pmd_supported() returns true), and they must make no assumptions
+#  that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag
+#  can be used to prohibit arch-specific allocations from using hugepages to
+#  help with this (e.g., modules may require it).
 #
 config HAVE_ARCH_HUGE_VMALLOC
 	depends on HAVE_ARCH_HUGE_VMAP
diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index 97a76a8619fb..40a583e9d3c7 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -101,7 +101,7 @@ __module_alloc(unsigned long size, unsigned long start, unsigned long end, bool
 	 * too.
 	 */
 	return __vmalloc_node_range(size, 1, start, end, gfp, prot,
-				    VM_FLUSH_RESET_PERMS,
+				    VM_FLUSH_RESET_PERMS | VM_NO_HUGE_VMAP,
 				    NUMA_NO_NODE, __builtin_return_address(0));
 }
 
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index cc7c9599f43e..7f7c0d6af2ce 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -137,7 +137,12 @@ static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
 	/* Allocate variable storage */
 	vlen = ALIGN(virt * ((npages * PAGE_SIZE) / HPAGE_SIZE), PAGE_SIZE);
 	vlen += uv_info.guest_virt_base_stor_len;
-	kvm->arch.pv.stor_var = vzalloc(vlen);
+	/*
+	 * The Create Secure Configuration Ultravisor Call does not support
+	 * using large pages for the virtual memory area.
+	 * This is a hardware limitation.
+	 */
+	kvm->arch.pv.stor_var = vmalloc_no_huge(vlen);
 	if (!kvm->arch.pv.stor_var)
 		goto out_err;
 	return 0;
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index b159c2789961..3b1df7da402d 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -26,7 +26,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_KASAN		0x00000080	/* has allocated kasan shadow memory */
 #define VM_FLUSH_RESET_PERMS	0x00000100	/* reset direct map and flush TLB on unmap, can't be freed in atomic context */
 #define VM_MAP_PUT_PAGES	0x00000200	/* put pages and free array in vfree */
-#define VM_ALLOW_HUGE_VMAP	0x00000400	/* Allow for huge pages on archs with HAVE_ARCH_HUGE_VMALLOC */
+#define VM_NO_HUGE_VMAP		0x00000400	/* force PAGE_SIZE pte mapping */
 
 #if (defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)) && \
 	!defined(CONFIG_KASAN_VMALLOC)
@@ -153,7 +153,7 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
 			const void *caller) __alloc_size(1);
 void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
 		int node, const void *caller) __alloc_size(1);
-void *vmalloc_huge(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
+void *vmalloc_no_huge(unsigned long size) __alloc_size(1);
 
 extern void *__vmalloc_array(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
 extern void *vmalloc_array(size_t n, size_t size) __alloc_size(1, 2);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index cadfbb5155ea..09470361dc03 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3101,7 +3101,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 		return NULL;
 	}
 
-	if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP)) {
+	if (vmap_allow_huge && !(vm_flags & VM_NO_HUGE_VMAP)) {
 		unsigned long size_per_node;
 
 		/*
@@ -3268,24 +3268,21 @@ void *vmalloc(unsigned long size)
 EXPORT_SYMBOL(vmalloc);
 
 /**
- * vmalloc_huge - allocate virtually contiguous memory, allow huge pages
- * @size:      allocation size
- * @gfp_mask:  flags for the page level allocator
+ * vmalloc_no_huge - allocate virtually contiguous memory using small pages
+ * @size:    allocation size
  *
- * Allocate enough pages to cover @size from the page level
+ * Allocate enough non-huge pages to cover @size from the page level
 * allocator and map them into contiguous kernel virtual space.
- * If @size is greater than or equal to PMD_SIZE, allow using
- * huge pages for the memory
 *
 * Return: pointer to the allocated memory or %NULL on error
 */
-void *vmalloc_huge(unsigned long size, gfp_t gfp_mask)
+void *vmalloc_no_huge(unsigned long size)
 {
 	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
-				    gfp_mask, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
+				    GFP_KERNEL, PAGE_KERNEL, VM_NO_HUGE_VMAP,
 				    NUMA_NO_NODE, __builtin_return_address(0));
 }
-EXPORT_SYMBOL_GPL(vmalloc_huge);
+EXPORT_SYMBOL(vmalloc_no_huge);
 
 /**
  * vzalloc - allocate virtually contiguous memory with zero fill
-- 
2.35.1
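
To illustrate the opt-out semantics this revert restores, here is a
minimal, hypothetical caller (the helper name is invented, not from the
patch) that must not be backed by huge mappings and therefore uses
vmalloc_no_huge() rather than plain vmalloc():

	#include <linux/vmalloc.h>

	/* Hypothetical buffer whose mapping must use PAGE_SIZE ptes. */
	static void *alloc_small_pte_buffer(unsigned long size)
	{
		/*
		 * After this revert, plain vmalloc() may again be backed by
		 * huge mappings on HAVE_ARCH_HUGE_VMALLOC architectures, so
		 * a caller with a PAGE_SIZE-pte requirement (as in the s390
		 * pv.c hunk above) asks for small pages explicitly.
		 */
		return vmalloc_no_huge(size);
	}

Equivalently, a __vmalloc_node_range() caller passes VM_NO_HUGE_VMAP in
its vm_flags, as the powerpc module_alloc() hunk above does.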