From: Lorenzo Stoakes
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Andrew Morton
Cc: Baoquan He, Uladzislau Rezki, Matthew Wilcox, David Hildenbrand, Liu Shixin, Jiri Olsa, Lorenzo Stoakes
Subject: [PATCH 1/4] fs/proc/kcore: Avoid bounce buffer for ktext data
Date: Sun, 19 Mar 2023 00:20:09 +0000
Message-Id: <2ed992d6604965fd9eea05fed4473ddf54540989.1679183626.git.lstoakes@gmail.com>

Commit df04abfd181a ("fs/proc/kcore.c: Add bounce buffer for ktext data")
introduced the use of a bounce buffer to retrieve kernel text data for
/proc/kcore in order to avoid failures arising from hardened user copies
enabled by CONFIG_HARDENED_USERCOPY in check_kernel_text_object().

We can avoid doing this if instead of copy_to_user() we use
_copy_to_user() which bypasses the hardening check. This is more efficient
than using a bounce buffer and simplifies the code.

We do so as part of an overall effort to eliminate bounce buffer usage in
the function with an eye to converting it to an iterator read.

Signed-off-by: Lorenzo Stoakes
---
 fs/proc/kcore.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 71157ee35c1a..556f310d6aa4 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -541,19 +541,12 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		case KCORE_VMEMMAP:
 		case KCORE_TEXT:
 			/*
-			 * Using bounce buffer to bypass the
-			 * hardened user copy kernel text checks.
+			 * We use _copy_to_user() to bypass usermode hardening
+			 * which would otherwise prevent this operation.
 			 */
-			if (copy_from_kernel_nofault(buf, (void *)start, tsz)) {
-				if (clear_user(buffer, tsz)) {
-					ret = -EFAULT;
-					goto out;
-				}
-			} else {
-				if (copy_to_user(buffer, buf, tsz)) {
-					ret = -EFAULT;
-					goto out;
-				}
+			if (_copy_to_user(buffer, (char *)start, tsz)) {
+				ret = -EFAULT;
+				goto out;
 			}
 			break;
 		default:
-- 
2.39.2
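
(For context, a rough sketch of the uaccess behaviour this patch relies on; the
helper below is illustrative only and not part of the patch. copy_to_user()
runs the hardened usercopy object checks, which reject source objects lying in
kernel text when CONFIG_HARDENED_USERCOPY is enabled, whereas _copy_to_user()
skips only those object checks while still performing the usual access_ok()
validation of the destination.)

static int ktext_copy_sketch(void __user *ubuf, const void *kaddr, size_t len)
{
	/*
	 * copy_to_user(ubuf, kaddr, len) would refuse a kernel-text source
	 * under CONFIG_HARDENED_USERCOPY, which is why the old code staged
	 * the data through a bounce buffer first.
	 */
	if (_copy_to_user(ubuf, kaddr, len))	/* skips only the object checks */
		return -EFAULT;

	return 0;
}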

From: Lorenzo Stoakes
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Andrew Morton
Cc: Baoquan He, Uladzislau Rezki, Matthew Wilcox, David Hildenbrand, Liu Shixin, Jiri Olsa, Lorenzo Stoakes
Subject: [PATCH 2/4] mm: vmalloc: use rwsem, mutex for vmap_area_lock and vmap_block->lock
Date: Sun, 19 Mar 2023 00:20:10 +0000
Message-Id: <6c7f1ac0aeb55faaa46a09108d3999e4595870d9.1679183626.git.lstoakes@gmail.com>

vmalloc() is, by design, not permitted to be used in atomic context and
already contains components which may sleep, so avoiding spin locks is not
a problem from the perspective of atomic context.

The global vmap_area_lock is held when the red/black tree rooted in
vmap_area_root is accessed, and is thus rather long-held and under
potentially high contention. It is likely to be under contention for reads
rather than writes, so replace it with a rwsem.

Each individual vmap_block->lock is likely to be held for less time but
under low contention, so a mutex is not an outrageous choice here.

A subset of test_vmalloc.sh performance results:-

  fix_size_alloc_test             0.40%
  full_fit_alloc_test             2.08%
  long_busy_list_alloc_test       0.34%
  random_size_alloc_test         -0.25%
  random_size_align_alloc_test    0.06%
  ...
  all tests cycles                0.2%

This represents a tiny reduction in performance that sits barely above
noise. The reason for making this change is to build a basis for vread()
to be usable asynchronously, thus eliminating the need for a bounce buffer
when copying data to userland in read_kcore() and allowing that to be
converted to an iterator form.

Signed-off-by: Lorenzo Stoakes
---
 mm/vmalloc.c | 77 +++++++++++++++++++++++++++-------------------------
 1 file changed, 40 insertions(+), 37 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 978194dc2bb8..c24b27664a97 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -40,6 +40,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -725,7 +726,7 @@ EXPORT_SYMBOL(vmalloc_to_pfn);
 #define DEBUG_AUGMENT_LOWEST_MATCH_CHECK 0
 
 
-static DEFINE_SPINLOCK(vmap_area_lock);
+static DECLARE_RWSEM(vmap_area_lock);
 static DEFINE_SPINLOCK(free_vmap_area_lock);
 /* Export for kexec only */
 LIST_HEAD(vmap_area_list);
@@ -1537,9 +1538,9 @@ static void free_vmap_area(struct vmap_area *va)
 	/*
 	 * Remove from the busy tree/list.
 	 */
-	spin_lock(&vmap_area_lock);
+	down_write(&vmap_area_lock);
 	unlink_va(va, &vmap_area_root);
-	spin_unlock(&vmap_area_lock);
+	up_write(&vmap_area_lock);
 
 	/*
 	 * Insert/Merge it back to the free tree/list.
@@ -1627,9 +1628,9 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	va->vm = NULL;
 	va->flags = va_flags;
 
-	spin_lock(&vmap_area_lock);
+	down_write(&vmap_area_lock);
 	insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
-	spin_unlock(&vmap_area_lock);
+	up_write(&vmap_area_lock);
 
 	BUG_ON(!IS_ALIGNED(va->va_start, align));
 	BUG_ON(va->va_start < vstart);
@@ -1854,9 +1855,9 @@ struct vmap_area *find_vmap_area(unsigned long addr)
 {
 	struct vmap_area *va;
 
-	spin_lock(&vmap_area_lock);
+	down_read(&vmap_area_lock);
 	va = __find_vmap_area(addr, &vmap_area_root);
-	spin_unlock(&vmap_area_lock);
+	up_read(&vmap_area_lock);
 
 	return va;
 }
@@ -1865,11 +1866,11 @@ static struct vmap_area *find_unlink_vmap_area(unsigned long addr)
 {
 	struct vmap_area *va;
 
-	spin_lock(&vmap_area_lock);
+	down_write(&vmap_area_lock);
 	va = __find_vmap_area(addr, &vmap_area_root);
 	if (va)
 		unlink_va(va, &vmap_area_root);
-	spin_unlock(&vmap_area_lock);
+	up_write(&vmap_area_lock);
 
 	return va;
 }
@@ -1914,7 +1915,7 @@ struct vmap_block_queue {
 };
 
 struct vmap_block {
-	spinlock_t lock;
+	struct mutex lock;
 	struct vmap_area *va;
 	unsigned long free, dirty;
 	DECLARE_BITMAP(used_map, VMAP_BBMAP_BITS);
@@ -1991,7 +1992,7 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
 	}
 
 	vaddr = vmap_block_vaddr(va->va_start, 0);
-	spin_lock_init(&vb->lock);
+	mutex_init(&vb->lock);
 	vb->va = va;
 	/* At least something should be left free */
 	BUG_ON(VMAP_BBMAP_BITS <= (1UL << order));
@@ -2026,9 +2027,9 @@ static void free_vmap_block(struct vmap_block *vb)
 	tmp = xa_erase(&vmap_blocks, addr_to_vb_idx(vb->va->va_start));
 	BUG_ON(tmp != vb);
 
-	spin_lock(&vmap_area_lock);
+	down_write(&vmap_area_lock);
 	unlink_va(vb->va, &vmap_area_root);
-	spin_unlock(&vmap_area_lock);
+	up_write(&vmap_area_lock);
 
 	free_vmap_area_noflush(vb->va);
 	kfree_rcu(vb, rcu_head);
@@ -2047,7 +2048,7 @@ static void purge_fragmented_blocks(int cpu)
 		if (!(vb->free + vb->dirty == VMAP_BBMAP_BITS && vb->dirty != VMAP_BBMAP_BITS))
 			continue;
 
-		spin_lock(&vb->lock);
+		mutex_lock(&vb->lock);
 		if (vb->free + vb->dirty == VMAP_BBMAP_BITS && vb->dirty != VMAP_BBMAP_BITS) {
 			vb->free = 0; /* prevent further allocs after releasing lock */
 			vb->dirty = VMAP_BBMAP_BITS; /* prevent purging it again */
@@ -2056,10 +2057,10 @@ static void purge_fragmented_blocks(int cpu)
 			spin_lock(&vbq->lock);
 			list_del_rcu(&vb->free_list);
 			spin_unlock(&vbq->lock);
-			spin_unlock(&vb->lock);
+			mutex_unlock(&vb->lock);
 			list_add_tail(&vb->purge, &purge);
 		} else
-			spin_unlock(&vb->lock);
+			mutex_unlock(&vb->lock);
 	}
 	rcu_read_unlock();
 
@@ -2101,9 +2102,9 @@ static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
 	list_for_each_entry_rcu(vb, &vbq->free, free_list) {
 		unsigned long pages_off;
 
-		spin_lock(&vb->lock);
+		mutex_lock(&vb->lock);
 		if (vb->free < (1UL << order)) {
-			spin_unlock(&vb->lock);
+			mutex_unlock(&vb->lock);
 			continue;
 		}
 
@@ -2117,7 +2118,7 @@ static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
 			spin_unlock(&vbq->lock);
 		}
 
-		spin_unlock(&vb->lock);
+		mutex_unlock(&vb->lock);
 		break;
 	}
 
@@ -2144,16 +2145,16 @@ static void vb_free(unsigned long addr, unsigned long size)
 	order = get_order(size);
 	offset = (addr & (VMAP_BLOCK_SIZE - 1)) >> PAGE_SHIFT;
 	vb = xa_load(&vmap_blocks, addr_to_vb_idx(addr));
-	spin_lock(&vb->lock);
+	mutex_lock(&vb->lock);
 	bitmap_clear(vb->used_map, offset, (1UL << order));
-	spin_unlock(&vb->lock);
+	mutex_unlock(&vb->lock);
 
 	vunmap_range_noflush(addr, addr + size);
 
 	if (debug_pagealloc_enabled_static())
 		flush_tlb_kernel_range(addr, addr + size);
 
-	spin_lock(&vb->lock);
+	mutex_lock(&vb->lock);
 
 	/* Expand dirty range */
 	vb->dirty_min = min(vb->dirty_min, offset);
@@ -2162,10 +2163,10 @@ static void vb_free(unsigned long addr, unsigned long size)
 	vb->dirty += 1UL << order;
 	if (vb->dirty == VMAP_BBMAP_BITS) {
 		BUG_ON(vb->free);
-		spin_unlock(&vb->lock);
+		mutex_unlock(&vb->lock);
 		free_vmap_block(vb);
 	} else
-		spin_unlock(&vb->lock);
+		mutex_unlock(&vb->lock);
 }
 
 static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush)
@@ -2183,7 +2184,7 @@ static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(vb, &vbq->free, free_list) {
-		spin_lock(&vb->lock);
+		mutex_lock(&vb->lock);
 		if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
 			unsigned long va_start = vb->va->va_start;
 			unsigned long s, e;
@@ -2196,7 +2197,7 @@ static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush)
 
 			flush = 1;
 		}
-		spin_unlock(&vb->lock);
+		mutex_unlock(&vb->lock);
 	}
 	rcu_read_unlock();
 }
@@ -2451,9 +2452,9 @@ static inline void setup_vmalloc_vm_locked(struct vm_struct *vm,
 static void setup_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va,
 			      unsigned long flags, const void *caller)
 {
-	spin_lock(&vmap_area_lock);
+	down_write(&vmap_area_lock);
 	setup_vmalloc_vm_locked(vm, va, flags, caller);
-	spin_unlock(&vmap_area_lock);
+	up_write(&vmap_area_lock);
 }
 
 static void clear_vm_uninitialized_flag(struct vm_struct *vm)
@@ -3507,9 +3508,9 @@ static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags
 	if (!vb)
 		goto finished;
 
-	spin_lock(&vb->lock);
+	mutex_lock(&vb->lock);
 	if (bitmap_empty(vb->used_map, VMAP_BBMAP_BITS)) {
-		spin_unlock(&vb->lock);
+		mutex_unlock(&vb->lock);
 		goto finished;
 	}
 	for_each_set_bitrange(rs, re, vb->used_map, VMAP_BBMAP_BITS) {
@@ -3536,7 +3537,7 @@ static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags
 		count -= n;
 	}
 unlock:
-	spin_unlock(&vb->lock);
+	mutex_unlock(&vb->lock);
 
 finished:
 	/* zero-fill the left dirty or free regions */
@@ -3576,13 +3577,15 @@ long vread(char *buf, char *addr, unsigned long count)
 	unsigned long buflen = count;
 	unsigned long n, size, flags;
 
+	might_sleep();
+
 	addr = kasan_reset_tag(addr);
 
 	/* Don't allow overflow */
 	if ((unsigned long) addr + count < count)
 		count = -(unsigned long) addr;
 
-	spin_lock(&vmap_area_lock);
+	down_read(&vmap_area_lock);
 	va = find_vmap_area_exceed_addr((unsigned long)addr);
 	if (!va)
 		goto finished;
@@ -3639,7 +3642,7 @@ long vread(char *buf, char *addr, unsigned long count)
 		count -= n;
 	}
 finished:
-	spin_unlock(&vmap_area_lock);
+	up_read(&vmap_area_lock);
 
 	if (buf == buf_start)
 		return 0;
@@ -3980,14 +3983,14 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
 	}
 
 	/* insert all vm's */
-	spin_lock(&vmap_area_lock);
+	down_write(&vmap_area_lock);
 	for (area = 0; area < nr_vms; area++) {
 		insert_vmap_area(vas[area], &vmap_area_root, &vmap_area_list);
 
 		setup_vmalloc_vm_locked(vms[area], vas[area], VM_ALLOC,
 				 pcpu_get_vm_areas);
 	}
-	spin_unlock(&vmap_area_lock);
+	up_write(&vmap_area_lock);
 
 	/*
 	 * Mark allocated areas as accessible. Do it now as a best-effort
@@ -4114,7 +4117,7 @@ static void *s_start(struct seq_file *m, loff_t *pos)
 	__acquires(&vmap_area_lock)
 {
 	mutex_lock(&vmap_purge_lock);
-	spin_lock(&vmap_area_lock);
+	down_read(&vmap_area_lock);
 
 	return seq_list_start(&vmap_area_list, *pos);
 }
@@ -4128,7 +4131,7 @@ static void s_stop(struct seq_file *m, void *p)
 	__releases(&vmap_area_lock)
 	__releases(&vmap_purge_lock)
 {
-	spin_unlock(&vmap_area_lock);
+	up_read(&vmap_area_lock);
 	mutex_unlock(&vmap_purge_lock);
 }
 
-- 
2.39.2
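
(For context, a rough sketch of the locking pattern this patch moves to; the
names below are illustrative and not from the patch. Lookups take the
semaphore shared, tree modifications take it exclusive, and both paths may now
sleep, which is what later allows vread() to copy directly to a user-backed
iterator.)

static DECLARE_RWSEM(example_area_lock);

static void example_lookup(void)
{
	down_read(&example_area_lock);	/* shared: concurrent lookups allowed */
	/* ... walk the busy vmap_area tree ... */
	up_read(&example_area_lock);
}

static void example_insert(void)
{
	down_write(&example_area_lock);	/* exclusive: modifies the tree/list */
	/* ... insert or unlink a vmap_area ... */
	up_write(&example_area_lock);
}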

From: Lorenzo Stoakes
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Andrew Morton
Cc: Baoquan He, Uladzislau Rezki, Matthew Wilcox, David Hildenbrand, Liu Shixin, Jiri Olsa, Lorenzo Stoakes
Subject: [PATCH 3/4] fs/proc/kcore: convert read_kcore() to read_kcore_iter()
Date: Sun, 19 Mar 2023 00:20:11 +0000
Message-Id: <32f8fad50500d0cd0927a66638c5890533725d30.1679183626.git.lstoakes@gmail.com>

Now that we have eliminated spinlocks from the vread() case, convert
read_kcore() to read_kcore_iter().

For the time being we still use a bounce buffer for vread(); however, in
the next patch we will convert this to interact directly with the iterator
and eliminate the bounce buffer altogether.

Signed-off-by: Lorenzo Stoakes
---
 fs/proc/kcore.c | 58 ++++++++++++++++++++++++-------------------------
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 556f310d6aa4..25e0eeb8d498 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -24,7 +24,7 @@
 #include
 #include
 #include
-#include
+#include
 #include
 #include
 #include
@@ -308,9 +308,12 @@ static void append_kcore_note(char *notes, size_t *i, const char *name,
 }
 
 static ssize_t
-read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
+read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
 {
+	struct file *file = iocb->ki_filp;
 	char *buf = file->private_data;
+	loff_t *ppos = &iocb->ki_pos;
+
 	size_t phdrs_offset, notes_offset, data_offset;
 	size_t page_offline_frozen = 1;
 	size_t phdrs_len, notes_len;
@@ -318,6 +321,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 	size_t tsz;
 	int nphdr;
 	unsigned long start;
+	size_t buflen = iov_iter_count(iter);
 	size_t orig_buflen = buflen;
 	int ret = 0;
 
@@ -333,7 +337,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 	notes_offset = phdrs_offset + phdrs_len;
 
 	/* ELF file header. */
-	if (buflen && *fpos < sizeof(struct elfhdr)) {
+	if (buflen && *ppos < sizeof(struct elfhdr)) {
 		struct elfhdr ehdr = {
 			.e_ident = {
 				[EI_MAG0] = ELFMAG0,
@@ -355,19 +359,18 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 			.e_phnum = nphdr,
 		};
 
-		tsz = min_t(size_t, buflen, sizeof(struct elfhdr) - *fpos);
-		if (copy_to_user(buffer, (char *)&ehdr + *fpos, tsz)) {
+		tsz = min_t(size_t, buflen, sizeof(struct elfhdr) - *ppos);
+		if (copy_to_iter((char *)&ehdr + *ppos, tsz, iter) != tsz) {
 			ret = -EFAULT;
 			goto out;
 		}
 
-		buffer += tsz;
 		buflen -= tsz;
-		*fpos += tsz;
+		*ppos += tsz;
 	}
 
 	/* ELF program headers. */
-	if (buflen && *fpos < phdrs_offset + phdrs_len) {
+	if (buflen && *ppos < phdrs_offset + phdrs_len) {
 		struct elf_phdr *phdrs, *phdr;
 
 		phdrs = kzalloc(phdrs_len, GFP_KERNEL);
@@ -397,22 +400,21 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 			phdr++;
 		}
 
-		tsz = min_t(size_t, buflen, phdrs_offset + phdrs_len - *fpos);
-		if (copy_to_user(buffer, (char *)phdrs + *fpos - phdrs_offset,
-				 tsz)) {
+		tsz = min_t(size_t, buflen, phdrs_offset + phdrs_len - *ppos);
+		if (copy_to_iter((char *)phdrs + *ppos - phdrs_offset, tsz,
+				 iter) != tsz) {
 			kfree(phdrs);
 			ret = -EFAULT;
 			goto out;
 		}
 		kfree(phdrs);
 
-		buffer += tsz;
 		buflen -= tsz;
-		*fpos += tsz;
+		*ppos += tsz;
 	}
 
 	/* ELF note segment. */
-	if (buflen && *fpos < notes_offset + notes_len) {
+	if (buflen && *ppos < notes_offset + notes_len) {
 		struct elf_prstatus prstatus = {};
 		struct elf_prpsinfo prpsinfo = {
 			.pr_sname = 'R',
@@ -447,24 +449,23 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 			       vmcoreinfo_data,
 			       min(vmcoreinfo_size, notes_len - i));
 
-		tsz = min_t(size_t, buflen, notes_offset + notes_len - *fpos);
-		if (copy_to_user(buffer, notes + *fpos - notes_offset, tsz)) {
+		tsz = min_t(size_t, buflen, notes_offset + notes_len - *ppos);
+		if (copy_to_iter(notes + *ppos - notes_offset, tsz, iter) != tsz) {
 			kfree(notes);
 			ret = -EFAULT;
 			goto out;
 		}
 		kfree(notes);
 
-		buffer += tsz;
 		buflen -= tsz;
-		*fpos += tsz;
+		*ppos += tsz;
 	}
 
 	/*
 	 * Check to see if our file offset matches with any of
 	 * the addresses in the elf_phdr on our list.
 	 */
-	start = kc_offset_to_vaddr(*fpos - data_offset);
+	start = kc_offset_to_vaddr(*ppos - data_offset);
 	if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen)
 		tsz = buflen;
 
@@ -497,7 +498,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 	}
 
 	if (!m) {
-		if (clear_user(buffer, tsz)) {
+		if (iov_iter_zero(tsz, iter) != tsz) {
 			ret = -EFAULT;
 			goto out;
 		}
@@ -508,14 +509,14 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		case KCORE_VMALLOC:
 			vread(buf, (char *)start, tsz);
 			/* we have to zero-fill user buffer even if no read */
-			if (copy_to_user(buffer, buf, tsz)) {
+			if (copy_to_iter(buf, tsz, iter) != tsz) {
 				ret = -EFAULT;
 				goto out;
 			}
 			break;
 		case KCORE_USER:
 			/* User page is handled prior to normal kernel page: */
-			if (copy_to_user(buffer, (char *)start, tsz)) {
+			if (copy_to_iter((char *)start, tsz, iter) != tsz) {
 				ret = -EFAULT;
 				goto out;
 			}
@@ -531,7 +532,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 			 */
 			if (!page || PageOffline(page) ||
 			    is_page_hwpoison(page) || !pfn_is_ram(pfn)) {
-				if (clear_user(buffer, tsz)) {
+				if (iov_iter_zero(tsz, iter) != tsz) {
 					ret = -EFAULT;
 					goto out;
 				}
@@ -541,25 +542,24 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		case KCORE_VMEMMAP:
 		case KCORE_TEXT:
 			/*
-			 * We use _copy_to_user() to bypass usermode hardening
+			 * We use _copy_to_iter() to bypass usermode hardening
 			 * which would otherwise prevent this operation.
 			 */
-			if (_copy_to_user(buffer, (char *)start, tsz)) {
+			if (_copy_to_iter((char *)start, tsz, iter) != tsz) {
 				ret = -EFAULT;
 				goto out;
 			}
 			break;
 		default:
 			pr_warn_once("Unhandled KCORE type: %d\n", m->type);
-			if (clear_user(buffer, tsz)) {
+			if (iov_iter_zero(tsz, iter) != tsz) {
 				ret = -EFAULT;
 				goto out;
 			}
 		}
 skip:
 		buflen -= tsz;
-		*fpos += tsz;
-		buffer += tsz;
+		*ppos += tsz;
 		start += tsz;
 		tsz = (buflen > PAGE_SIZE ? PAGE_SIZE : buflen);
 	}
@@ -603,7 +603,7 @@ static int release_kcore(struct inode *inode, struct file *file)
 }
 
 static const struct proc_ops kcore_proc_ops = {
-	.proc_read	= read_kcore,
+	.proc_read_iter	= read_kcore_iter,
 	.proc_open	= open_kcore,
 	.proc_release	= release_kcore,
 	.proc_lseek	= default_llseek,
-- 
2.39.2
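
(For context, a rough sketch of the iov_iter conventions the conversion relies
on; the handler below is illustrative only. copy_to_iter() and iov_iter_zero()
return the number of bytes actually copied or zeroed, so success is "== len",
whereas copy_to_user() and clear_user() return the number of bytes *not*
copied, where success is "== 0" -- hence the shape of the new checks above.)

static ssize_t example_read_iter(struct kiocb *iocb, struct iov_iter *iter)
{
	static const char src[] = "example";
	size_t n = min_t(size_t, iov_iter_count(iter), sizeof(src) - 1);

	/* copy_to_iter() returns bytes copied; anything short of n is an error */
	if (copy_to_iter(src, n, iter) != n)
		return -EFAULT;

	iocb->ki_pos += n;
	return n;
}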

From: Lorenzo Stoakes
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Andrew Morton
Cc: Baoquan He, Uladzislau Rezki, Matthew Wilcox, David Hildenbrand, Liu Shixin, Jiri Olsa, Lorenzo Stoakes
Subject: [PATCH 4/4] mm: vmalloc: convert vread() to vread_iter()
Date: Sun, 19 Mar 2023 00:20:12 +0000
Message-Id: <119871ea9507eac7be5d91db38acdb03981e049e.1679183626.git.lstoakes@gmail.com>

Having previously laid the foundation for converting vread() to an
iterator function, pull the trigger and do so.

This patch attempts to provide minimal refactoring and to reflect the
existing logic as best we can, with the exception of aligned_vread_iter()
which drops the use of the deprecated kmap_atomic() in favour of
kmap_local_page().

All existing logic to zero portions of memory not read remains, and there
should be no functional difference other than a performance improvement in
/proc/kcore access to vmalloc regions.

Now that we have done away with the need for a bounce buffer in
read_kcore_iter(), we dispense with the one allocated there altogether.

Signed-off-by: Lorenzo Stoakes
---
 fs/proc/kcore.c         |  21 +--------
 include/linux/vmalloc.h |   3 +-
 mm/vmalloc.c            | 101 +++++++++++++++++++++-------------------
 3 files changed, 57 insertions(+), 68 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 25e0eeb8d498..8a07f04c9203 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -307,13 +307,9 @@ static void append_kcore_note(char *notes, size_t *i, const char *name,
 	*i = ALIGN(*i + descsz, 4);
 }
 
-static ssize_t
-read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
+static ssize_t read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
 {
-	struct file *file = iocb->ki_filp;
-	char *buf = file->private_data;
 	loff_t *ppos = &iocb->ki_pos;
-
 	size_t phdrs_offset, notes_offset, data_offset;
 	size_t page_offline_frozen = 1;
 	size_t phdrs_len, notes_len;
@@ -507,9 +503,7 @@ read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
 
 		switch (m->type) {
 		case KCORE_VMALLOC:
-			vread(buf, (char *)start, tsz);
-			/* we have to zero-fill user buffer even if no read */
-			if (copy_to_iter(buf, tsz, iter) != tsz) {
+			if (vread_iter((char *)start, tsz, iter) != tsz) {
 				ret = -EFAULT;
 				goto out;
 			}
@@ -582,10 +576,6 @@ static int open_kcore(struct inode *inode, struct file *filp)
 	if (ret)
 		return ret;
 
-	filp->private_data = kmalloc(PAGE_SIZE, GFP_KERNEL);
-	if (!filp->private_data)
-		return -ENOMEM;
-
 	if (kcore_need_update)
 		kcore_update_ram();
 	if (i_size_read(inode) != proc_root_kcore->size) {
@@ -596,16 +586,9 @@ static int open_kcore(struct inode *inode, struct file *filp)
 	return 0;
 }
 
-static int release_kcore(struct inode *inode, struct file *file)
-{
-	kfree(file->private_data);
-	return 0;
-}
-
 static const struct proc_ops kcore_proc_ops = {
 	.proc_read_iter	= read_kcore_iter,
 	.proc_open	= open_kcore,
-	.proc_release	= release_kcore,
 	.proc_lseek	= default_llseek,
 };
 
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 69250efa03d1..f70ebdf21f22 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -9,6 +9,7 @@
 #include	/* pgprot_t */
 #include
 #include
+#include
 
 #include
 
@@ -251,7 +252,7 @@ static inline void set_vm_flush_reset_perms(void *addr)
 #endif
 
 /* for /proc/kcore */
-extern long vread(char *buf, char *addr, unsigned long count);
+extern long vread_iter(char *addr, size_t count, struct iov_iter *iter);
 
 /*
  *	Internals.  Don't use..
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index c24b27664a97..3a32754266dc 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -37,7 +37,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -3446,20 +3445,20 @@ EXPORT_SYMBOL(vmalloc_32_user);
  * small helper routine , copy contents to buf from addr.
  * If the page is not present, fill zero.
  */
-
-static int aligned_vread(char *buf, char *addr, unsigned long count)
+static void aligned_vread_iter(char *addr, size_t count,
+			       struct iov_iter *iter)
 {
-	struct page *p;
-	int copied = 0;
+	struct page *page;
 
-	while (count) {
+	while (count > 0) {
 		unsigned long offset, length;
+		size_t copied = 0;
 
 		offset = offset_in_page(addr);
 		length = PAGE_SIZE - offset;
 		if (length > count)
 			length = count;
-		p = vmalloc_to_page(addr);
+		page = vmalloc_to_page(addr);
 		/*
 		 * To do safe access to this _mapped_ area, we need
 		 * lock. But adding lock here means that we need to add
@@ -3467,23 +3466,24 @@ static int aligned_vread(char *buf, char *addr, unsigned long count)
 		 * interface, rarely used. Instead of that, we'll use
 		 * kmap() and get small overhead in this access function.
 		 */
-		if (p) {
+		if (page) {
 			/* We can expect USER0 is not used -- see vread() */
-			void *map = kmap_atomic(p);
-			memcpy(buf, map + offset, length);
-			kunmap_atomic(map);
-		} else
-			memset(buf, 0, length);
+			void *map = kmap_local_page(page);
+
+			copied = copy_to_iter(map + offset, length, iter);
+			kunmap_local(map);
+		}
+
+		if (copied < length)
+			iov_iter_zero(length - copied, iter);
 
 		addr += length;
-		buf += length;
-		copied += length;
 		count -= length;
 	}
-	return copied;
 }
 
-static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags)
+static void vmap_ram_vread_iter(char *addr, int count, unsigned long flags,
+				struct iov_iter *iter)
 {
 	char *start;
 	struct vmap_block *vb;
@@ -3496,7 +3496,7 @@ static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags
 	 * handle it here.
 	 */
 	if (!(flags & VMAP_BLOCK)) {
-		aligned_vread(buf, addr, count);
+		aligned_vread_iter(addr, count, iter);
 		return;
 	}
 
@@ -3517,22 +3517,24 @@ static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags
 		if (!count)
 			break;
 		start = vmap_block_vaddr(vb->va->va_start, rs);
-		while (addr < start) {
+
+		if (addr < start) {
+			size_t to_zero = min_t(size_t, start - addr, count);
+
+			iov_iter_zero(to_zero, iter);
+			addr += to_zero;
+			count -= (int)to_zero;
 			if (count == 0)
 				goto unlock;
-			*buf = '\0';
-			buf++;
-			addr++;
-			count--;
 		}
+
 		/*it could start reading from the middle of used region*/
 		offset = offset_in_page(addr);
 		n = ((re - rs + 1) << PAGE_SHIFT) - offset;
 		if (n > count)
 			n = count;
-		aligned_vread(buf, start+offset, n);
+		aligned_vread_iter(start + offset, n, iter);
 
-		buf += n;
 		addr += n;
 		count -= n;
 	}
@@ -3541,15 +3543,15 @@ static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags
 
 finished:
 	/* zero-fill the left dirty or free regions */
-	if (count)
-		memset(buf, 0, count);
+	if (count > 0)
+		iov_iter_zero(count, iter);
 }
 
 /**
- * vread() - read vmalloc area in a safe way.
- * @buf:	buffer for reading data
- * @addr:	vm address.
- * @count:	number of bytes to be read.
+ * vread_iter() - read vmalloc area in a safe way to an iterator.
+ * @addr:	vm address.
+ * @count:	number of bytes to be read.
+ * @iter:	the iterator to which data should be written.
  *
  * This function checks that addr is a valid vmalloc'ed area, and
  * copy data from that area to a given buffer. If the given memory range
@@ -3569,13 +3571,13 @@ static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags
  * (same number as @count) or %0 if [addr...addr+count) doesn't
  * include any intersection with valid vmalloc area
  */
-long vread(char *buf, char *addr, unsigned long count)
+long vread_iter(char *addr, size_t count, struct iov_iter *iter)
 {
 	struct vmap_area *va;
 	struct vm_struct *vm;
-	char *vaddr, *buf_start = buf;
-	unsigned long buflen = count;
-	unsigned long n, size, flags;
+	char *vaddr;
+	size_t buflen = count;
+	size_t n, size, flags;
 
 	might_sleep();
 
@@ -3595,7 +3597,7 @@ long vread(char *buf, char *addr, unsigned long count)
 		goto finished;
 
 	list_for_each_entry_from(va, &vmap_area_list, list) {
-		if (!count)
+		if (count == 0)
 			break;
 
 		vm = va->vm;
@@ -3619,36 +3621,39 @@ long vread(char *buf, char *addr, unsigned long count)
 
 		if (addr >= vaddr + size)
 			continue;
-		while (addr < vaddr) {
+
+		if (addr < vaddr) {
+			size_t to_zero = min_t(size_t, vaddr - addr, count);
+
+			iov_iter_zero(to_zero, iter);
+			addr += to_zero;
+			count -= to_zero;
 			if (count == 0)
 				goto finished;
-			*buf = '\0';
-			buf++;
-			addr++;
-			count--;
 		}
+
 		n = vaddr + size - addr;
 		if (n > count)
 			n = count;
 
 		if (flags & VMAP_RAM)
-			vmap_ram_vread(buf, addr, n, flags);
+			vmap_ram_vread_iter(addr, n, flags, iter);
 		else if (!(vm->flags & VM_IOREMAP))
-			aligned_vread(buf, addr, n);
+			aligned_vread_iter(addr, n, iter);
 		else /* IOREMAP area is treated as memory hole */
-			memset(buf, 0, n);
-		buf += n;
+			iov_iter_zero(n, iter);
+
 		addr += n;
 		count -= n;
 	}
 finished:
 	up_read(&vmap_area_lock);
 
-	if (buf == buf_start)
+	if (count == buflen)
 		return 0;
 
 	/* zero-fill memory holes */
-	if (buf != buf_start + buflen)
-		memset(buf, 0, buflen - (buf - buf_start));
+	if (count > 0)
+		iov_iter_zero(count, iter);
 
 	return buflen;
 }
-- 
2.39.2
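
(For context, a rough sketch of the highmem-safe copy pattern that
aligned_vread_iter() adopts; the helper name below is illustrative only.
kmap_local_page() replaces the deprecated kmap_atomic() and, unlike it, does
not disable pagefaults, so copying straight to a user-backed iterator becomes
possible; any shortfall is then zero-filled with iov_iter_zero() by the
caller.)

static size_t page_copy_sketch(struct page *page, size_t offset, size_t len,
			       struct iov_iter *iter)
{
	void *kaddr = kmap_local_page(page);	/* replaces deprecated kmap_atomic() */
	size_t copied = copy_to_iter(kaddr + offset, len, iter);

	kunmap_local(kaddr);

	/* caller zero-fills the remainder if the copy came up short */
	return copied;
}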