From nobody Tue Apr 7 20:06:14 2026
From: "Lorenzo Stoakes (Oracle)"
To: Andrew Morton
Cc: "Liam R. Howlett", Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
 Michal Hocko, Jann Horn, Pedro Falcato, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Jianzhou Zhao, Oscar Salvador
Subject: [PATCH 1/3] mm/mremap: correct invalid map count check
Date: Wed, 11 Mar 2026 17:24:36 +0000
Message-ID: <73e218c67dcd197c5331840fb011e2c17155bfb0.1773249037.git.ljs@kernel.org>

When moving a VMA during mremap(), we currently check whether the
operation might violate the vm.max_map_count sysctl limit. This check
was introduced in the mists of time, prior to 2.6.12.

At that point, as now, the move_vma() operation would copy the VMA
(+1 mapping if not merged), then potentially split the source VMA upon
unmap.

Prior to commit 659ace584e7a ("mmap: don't return ENOMEM when mapcount
is temporarily exceeded in munmap()"), a VMA split would check whether
mm->map_count >= sysctl_max_map_count before it ran.

On unmap of the source VMA, if we are moving only part of a VMA, we
might split the VMA twice. This means that, on each invocation of
split_vma() (as it then was), we would check mm->map_count >=
sysctl_max_map_count - first with a map count elevated by one, then
again with a map count elevated by two, ending up with a map count
elevated by three. The map count is then reduced once the unmap
completes.
At the start of move_vma(), there was a check that has remained
throughout mremap()'s history of mm->map_count >= sysctl_max_map_count
- 3 (which implies mm->map_count + 4 > sysctl_max_map_count - that is,
we must have headroom for 4 additional mappings).

After mm->map_count is elevated by 3, it is decremented by one once the
unmap completes. The mmap write lock is held throughout, so nothing
else will observe mm->map_count > sysctl_max_map_count.

It appears this check was always incorrect - it should have been either
'mm->map_count > sysctl_max_map_count - 3' or 'mm->map_count >=
sysctl_max_map_count - 2'.

After commit 659ace584e7a ("mmap: don't return ENOMEM when mapcount is
temporarily exceeded in munmap()"), the map count check on split was
eliminated in the newly introduced __split_vma(), which the unmap path
uses; that path instead checks whether mm->map_count >=
sysctl_max_map_count. This is valid since, net, an unmap can only
increase the map count by 1 (split both sides, unmap the middle).

Since we only copy a VMA and (if MREMAP_DONTUNMAP is not set) unmap the
source afterwards, the maximum number of additional mappings that will
actually be subject to any check is 2. Therefore, update the check to
assert this corrected value.

Additionally, update the check introduced by commit ea2c3f6f5545
("mm,mremap: bail out earlier in mremap_to under map pressure") to
account for this. While we're here, clean up the comment preceding it.

Signed-off-by: Lorenzo Stoakes (Oracle)
Reviewed-by: Pedro Falcato
---
 mm/mremap.c | 28 ++++++++++++----------------
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 2be876a70cc0..e8c3021dd841 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -1041,10 +1041,11 @@ static unsigned long prep_move_vma(struct vma_remap_struct *vrm)
 	vm_flags_t dummy = vma->vm_flags;
 
 	/*
-	 * We'd prefer to avoid failure later on in do_munmap:
-	 * which may split one vma into three before unmapping.
+	 * We'd prefer to avoid failure later on in do_munmap: we copy a VMA,
+	 * which may not merge, then (if MREMAP_DONTUNMAP is not set) unmap the
+	 * source, which may split, causing a net increase of 2 mappings.
 	 */
-	if (current->mm->map_count >= sysctl_max_map_count - 3)
+	if (current->mm->map_count + 2 > sysctl_max_map_count)
 		return -ENOMEM;
 
 	if (vma->vm_ops && vma->vm_ops->may_split) {
@@ -1804,20 +1805,15 @@ static unsigned long check_mremap_params(struct vma_remap_struct *vrm)
 		return -EINVAL;
 
 	/*
-	 * move_vma() need us to stay 4 maps below the threshold, otherwise
-	 * it will bail out at the very beginning.
-	 * That is a problem if we have already unmapped the regions here
-	 * (new_addr, and old_addr), because userspace will not know the
-	 * state of the vma's after it gets -ENOMEM.
-	 * So, to avoid such scenario we can pre-compute if the whole
-	 * operation has high chances to success map-wise.
-	 * Worst-scenario case is when both vma's (new_addr and old_addr) get
-	 * split in 3 before unmapping it.
-	 * That means 2 more maps (1 for each) to the ones we already hold.
-	 * Check whether current map count plus 2 still leads us to 4 maps below
-	 * the threshold, otherwise return -ENOMEM here to be more safe.
+	 * We may unmap twice before invoking move_vma(), that is if new_len <
+	 * old_len (shrinking), and in the MREMAP_FIXED case, unmapping part of
+	 * a VMA located at the destination.
+	 *
+	 * In the worst case, both unmappings will cause splits, resulting in a
+	 * net increased map count of 2. In move_vma() we check for headroom of
+	 * 2 additional mappings, so check early to avoid bailing out then.
 	 */
-	if ((current->mm->map_count + 2) >= sysctl_max_map_count - 3)
+	if (current->mm->map_count + 4 > sysctl_max_map_count)
 		return -ENOMEM;
 
 	return 0;
-- 
2.53.0

From nobody Tue Apr 7 20:06:14 2026
From: "Lorenzo Stoakes (Oracle)"
To: Andrew Morton
Cc: "Liam R. Howlett", Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
 Michal Hocko, Jann Horn, Pedro Falcato, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Jianzhou Zhao, Oscar Salvador
Subject: [PATCH 2/3] mm: abstract reading sysctl_max_map_count, and READ_ONCE()
Date: Wed, 11 Mar 2026 17:24:37 +0000
Message-ID: <0715259eb37cbdfde4f9e5db92a20ec7110a1ce5.1773249037.git.ljs@kernel.org>

Concurrent reads and writes of sysctl_max_map_count are possible, so we
should use READ_ONCE() and WRITE_ONCE(). The sysctl procfs logic
already performs a WRITE_ONCE() on update, so abstract the read side
with a get_sysctl_max_map_count() helper.

While we're here, also move the declaration to mm/internal.h and add
the getter there, since only mm code interacts with this field - there
is no need for anybody else to have access.

Finally, update the VMA userland tests to reflect the change.
Signed-off-by: Lorenzo Stoakes (Oracle)
Reviewed-by: Pedro Falcato
---
 include/linux/mm.h                 | 2 --
 mm/internal.h                      | 6 ++++++
 mm/mmap.c                          | 2 +-
 mm/mremap.c                        | 4 ++--
 mm/nommu.c                         | 2 +-
 mm/vma.c                           | 6 +++---
 tools/testing/vma/include/custom.h | 3 ---
 tools/testing/vma/include/dup.h    | 9 +++++++++
 tools/testing/vma/main.c           | 2 ++
 9 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 4c4fd55fc823..1168374e2219 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -207,8 +207,6 @@ static inline void __mm_zero_struct_page(struct page *page)
 #define MAPCOUNT_ELF_CORE_MARGIN	(5)
 #define DEFAULT_MAX_MAP_COUNT	(USHRT_MAX - MAPCOUNT_ELF_CORE_MARGIN)
 
-extern int sysctl_max_map_count;
-
 extern unsigned long sysctl_user_reserve_kbytes;
 extern unsigned long sysctl_admin_reserve_kbytes;
 
diff --git a/mm/internal.h b/mm/internal.h
index 95b583e7e4f7..68bc509757c9 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1848,4 +1848,10 @@ static inline int pmdp_test_and_clear_young_notify(struct vm_area_struct *vma,
 
 #endif /* CONFIG_MMU_NOTIFIER */
 
+extern int sysctl_max_map_count;
+static inline int get_sysctl_max_map_count(void)
+{
+	return READ_ONCE(sysctl_max_map_count);
+}
+
 #endif /* __MM_INTERNAL_H */
diff --git a/mm/mmap.c b/mm/mmap.c
index 843160946aa5..79544d893411 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -375,7 +375,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 		return -EOVERFLOW;
 
 	/* Too many mappings? */
-	if (mm->map_count > sysctl_max_map_count)
+	if (mm->map_count > get_sysctl_max_map_count())
 		return -ENOMEM;
 
 	/*
diff --git a/mm/mremap.c b/mm/mremap.c
index e8c3021dd841..ba6c690f6c1b 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -1045,7 +1045,7 @@ static unsigned long prep_move_vma(struct vma_remap_struct *vrm)
 	 * which may not merge, then (if MREMAP_DONTUNMAP is not set) unmap the
 	 * source, which may split, causing a net increase of 2 mappings.
*/ - if (current->mm->map_count + 2 > sysctl_max_map_count) + if (current->mm->map_count + 2 > get_sysctl_max_map_count()) return -ENOMEM; =20 if (vma->vm_ops && vma->vm_ops->may_split) { @@ -1813,7 +1813,7 @@ static unsigned long check_mremap_params(struct vma_r= emap_struct *vrm) * net increased map count of 2. In move_vma() we check for headroom of * 2 additional mappings, so check early to avoid bailing out then. */ - if (current->mm->map_count + 4 > sysctl_max_map_count) + if (current->mm->map_count + 4 > get_sysctl_max_map_count()) return -ENOMEM; =20 return 0; diff --git a/mm/nommu.c b/mm/nommu.c index c3a23b082adb..ed3934bc2de4 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -1317,7 +1317,7 @@ static int split_vma(struct vma_iterator *vmi, struct= vm_area_struct *vma, return -ENOMEM; =20 mm =3D vma->vm_mm; - if (mm->map_count >=3D sysctl_max_map_count) + if (mm->map_count >=3D get_sysctl_max_map_count()) return -ENOMEM; =20 region =3D kmem_cache_alloc(vm_region_jar, GFP_KERNEL); diff --git a/mm/vma.c b/mm/vma.c index be64f781a3aa..07882d2040b1 100644 --- a/mm/vma.c +++ b/mm/vma.c @@ -590,7 +590,7 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_st= ruct *vma, static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma, unsigned long addr, int new_below) { - if (vma->vm_mm->map_count >=3D sysctl_max_map_count) + if (vma->vm_mm->map_count >=3D get_sysctl_max_map_count()) return -ENOMEM; =20 return __split_vma(vmi, vma, addr, new_below); @@ -1394,7 +1394,7 @@ static int vms_gather_munmap_vmas(struct vma_munmap_s= truct *vms, * its limit temporarily, to help free resources as expected. 
 	 */
 	if (vms->end < vms->vma->vm_end &&
-	    vms->vma->vm_mm->map_count >= sysctl_max_map_count) {
+	    vms->vma->vm_mm->map_count >= get_sysctl_max_map_count()) {
 		error = -ENOMEM;
 		goto map_count_exceeded;
 	}
@@ -2870,7 +2870,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	if (!may_expand_vm(mm, vm_flags, len >> PAGE_SHIFT))
 		return -ENOMEM;
 
-	if (mm->map_count > sysctl_max_map_count)
+	if (mm->map_count > get_sysctl_max_map_count())
 		return -ENOMEM;
 
 	if (security_vm_enough_memory_mm(mm, len >> PAGE_SHIFT))
diff --git a/tools/testing/vma/include/custom.h b/tools/testing/vma/include/custom.h
index 833ff4d7f799..adabd732ad3a 100644
--- a/tools/testing/vma/include/custom.h
+++ b/tools/testing/vma/include/custom.h
@@ -21,9 +21,6 @@ extern unsigned long dac_mmap_min_addr;
 #define VM_BUG_ON(_expr) (BUG_ON(_expr))
 #define VM_BUG_ON_VMA(_expr, _vma) (BUG_ON(_expr))
 
-/* We hardcode this for now. */
-#define sysctl_max_map_count 0x1000000UL
-
 #define TASK_SIZE ((1ul << 47)-PAGE_SIZE)
 
 /*
diff --git a/tools/testing/vma/include/dup.h b/tools/testing/vma/include/dup.h
index 5eb313beb43d..8865ffe046d8 100644
--- a/tools/testing/vma/include/dup.h
+++ b/tools/testing/vma/include/dup.h
@@ -419,6 +419,9 @@ struct vma_iterator {
 
 #define EMPTY_VMA_FLAGS ((vma_flags_t){ })
 
+#define MAPCOUNT_ELF_CORE_MARGIN	(5)
+#define DEFAULT_MAX_MAP_COUNT	(USHRT_MAX - MAPCOUNT_ELF_CORE_MARGIN)
+
 /* What action should be taken after an .mmap_prepare call is complete? */
 enum mmap_action_type {
 	MMAP_NOTHING,	/* Mapping is complete, no further action. */
@@ -1342,3 +1345,9 @@ static inline void vma_set_file(struct vm_area_struct *vma, struct file *file)
 	swap(vma->vm_file, file);
 	fput(file);
 }
+
+extern int sysctl_max_map_count;
+static inline int get_sysctl_max_map_count(void)
+{
+	return READ_ONCE(sysctl_max_map_count);
+}
diff --git a/tools/testing/vma/main.c b/tools/testing/vma/main.c
index 49b09e97a51f..18338f5d29e0 100644
--- a/tools/testing/vma/main.c
+++ b/tools/testing/vma/main.c
@@ -14,6 +14,8 @@
 #include "tests/mmap.c"
 #include "tests/vma.c"
 
+int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT;
+
 /* Helper functions which utilise static kernel functions. */
 
 struct vm_area_struct *merge_existing(struct vma_merge_struct *vmg)
-- 
2.53.0

From nobody Tue Apr 7 20:06:14 2026
From: "Lorenzo Stoakes (Oracle)"
To: Andrew Morton
Cc: "Liam R. Howlett", Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
 Michal Hocko, Jann Horn, Pedro Falcato, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Jianzhou Zhao, Oscar Salvador
Subject: [PATCH 3/3] mm/mremap: check map count under mmap write lock and abstract
Date: Wed, 11 Mar 2026 17:24:38 +0000
Message-ID: <18be0b48eaa8e8804eb745974ee729c3ade0c687.1773249037.git.ljs@kernel.org>

We currently check the map count in check_mremap_params(), prior to
obtaining the mmap write lock, which means accesses to
current->mm->map_count might race with that field being updated.

Resolve this by checking the field only after the mmap write lock is
held. Additionally, abstract the check into a helper function with
extensive ASCII documentation of what is going on.
Reported-by: Jianzhou Zhao
Closes: https://lore.kernel.org/all/1a7d4c26.6b46.19cdbe7eaf0.Coremail.luckd0g@163.com/
Signed-off-by: Lorenzo Stoakes (Oracle)
Reviewed-by: Pedro Falcato
---
 mm/mremap.c | 88 +++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 75 insertions(+), 13 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index ba6c690f6c1b..ee46bbb031e6 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -1028,6 +1028,75 @@ static void vrm_stat_account(struct vma_remap_struct *vrm,
 		mm->locked_vm += pages;
 }
 
+static bool __check_map_count_against_split(struct mm_struct *mm,
+		bool before_unmaps)
+{
+	const int sys_map_count = get_sysctl_max_map_count();
+	int map_count = mm->map_count;
+
+	mmap_assert_write_locked(mm);
+
+	/*
+	 * At the point of shrinking the VMA, if new_len < old_len, we unmap
+	 * thusly in the worst case:
+	 *
+	 *      old_addr+old_len                     old_addr+old_len
+	 * |---------------.----.---------|    |---------------|    |---------|
+	 * |               .    .         | -> |      +1       | -1 |   +1    |
+	 * |---------------.----.---------|    |---------------|    |---------|
+	 *      old_addr+new_len                     old_addr+new_len
+	 *
+	 * At the point of removing the portion of an existing VMA to make space
+	 * for the moved VMA if MREMAP_FIXED, we unmap thusly in the worst case:
+	 *
+	 *      new_addr        new_addr+new_len     new_addr   new_addr+new_len
+	 * |----.---------------.---------|    |----|                |---------|
+	 * |    .               .         | -> | +1 |       -1       |   +1    |
+	 * |----.---------------.---------|    |----|                |---------|
+	 *
+	 * Therefore, before we consider the move at all, we have to account
+	 * for 2 additional VMAs possibly being created upon these unmappings.
+	 */
+	if (before_unmaps)
+		map_count += 2;
+
+	/*
+	 * At the point of MOVING the VMA:
+	 *
+	 * We start by copying a VMA, which creates an additional VMA if no
+	 * merge occurs, then if not MREMAP_DONTUNMAP, we unmap the source VMA.
+	 * In the worst case we might then observe:
+	 *
+	 *      new_addr   new_addr+new_len     new_addr        new_addr+new_len
+	 * |----|                |---------|    |----|---------------|---------|
+	 * |    |                |         | -> |    |      +1       |         |
+	 * |----|                |---------|    |----|---------------|---------|
+	 *
+	 *      old_addr        old_addr+old_len
+	 * |----.---------------.---------|    |----|                |---------|
+	 * |    .               .         | -> | +1 |       -1       |   +1    |
+	 * |----.---------------.---------|    |----|                |---------|
+	 *
+	 * Therefore we must check to ensure we have headroom of 2 additional
+	 * VMAs.
+	 */
+	return map_count + 2 <= sys_map_count;
+}
+
+/* Do we violate the map count limit if we split VMAs when moving the VMA? */
+static bool check_map_count_against_split(void)
+{
+	return __check_map_count_against_split(current->mm,
+			/*before_unmaps=*/false);
+}
+
+/* Do we violate the map count limit if we split VMAs prior to early unmaps? */
+static bool check_map_count_against_split_early(void)
+{
+	return __check_map_count_against_split(current->mm,
+			/*before_unmaps=*/true);
+}
+
 /*
  * Perform checks before attempting to write a VMA prior to it being
  * moved.
@@ -1045,7 +1114,7 @@ static unsigned long prep_move_vma(struct vma_remap_struct *vrm)
 	 * which may not merge, then (if MREMAP_DONTUNMAP is not set) unmap the
 	 * source, which may split, causing a net increase of 2 mappings.
 	 */
-	if (current->mm->map_count + 2 > get_sysctl_max_map_count())
+	if (!check_map_count_against_split())
 		return -ENOMEM;
 
 	if (vma->vm_ops && vma->vm_ops->may_split) {
@@ -1804,18 +1873,6 @@ static unsigned long check_mremap_params(struct vma_remap_struct *vrm)
 	if (vrm_overlaps(vrm))
 		return -EINVAL;
 
-	/*
-	 * We may unmap twice before invoking move_vma(), that is if new_len <
-	 * old_len (shrinking), and in the MREMAP_FIXED case, unmapping part of
-	 * a VMA located at the destination.
-	 *
-	 * In the worst case, both unmappings will cause splits, resulting in a
-	 * net increased map count of 2. In move_vma() we check for headroom of
-	 * 2 additional mappings, so check early to avoid bailing out then.
-	 */
-	if (current->mm->map_count + 4 > get_sysctl_max_map_count())
-		return -ENOMEM;
-
 	return 0;
 }
 
@@ -1925,6 +1982,11 @@ static unsigned long do_mremap(struct vma_remap_struct *vrm)
 		return -EINTR;
 	vrm->mmap_locked = true;
 
+	if (!check_map_count_against_split_early()) {
+		mmap_write_unlock(mm);
+		return -ENOMEM;
+	}
+
 	if (vrm_move_only(vrm)) {
 		res = remap_move(vrm);
 	} else {
-- 
2.53.0