From nobody Sat May 11 08:49:00 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5200BC6FD1D for ; Thu, 23 Mar 2023 10:15:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231320AbjCWKPd (ORCPT ); Thu, 23 Mar 2023 06:15:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57348 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231226AbjCWKP0 (ORCPT ); Thu, 23 Mar 2023 06:15:26 -0400 Received: from mail-wm1-x335.google.com (mail-wm1-x335.google.com [IPv6:2a00:1450:4864:20::335]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7AC771ADDA; Thu, 23 Mar 2023 03:15:25 -0700 (PDT) Received: by mail-wm1-x335.google.com with SMTP id r19-20020a05600c459300b003eb3e2a5e7bso735512wmo.0; Thu, 23 Mar 2023 03:15:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679566524; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=cY5q38znnygXEk7MQGfhMNT6fWhXSJdRMwyHM9vnLSQ=; b=qX0NvMt0CqzoY0Lr7UQkxPq2uVe4Zy3EMu6s5/qeG3l29H6nnTeBqXkDfUGl+sMJDi To1Uc1nPNJQYHElIFq5iO8fS+lk/SakYHnFC0dSfIt0VpQY8sLqg09eimJ/7YEqbP1XR VT8JBPbHX06JTtbWOk93ztG/5KD1q6ob99V4FihMHwv2MByqX1Ai3Z6znLATtnv+ksm2 c5FSk9J5/L8z0Co66rkYe1lq/fW2DO7o+YDkp/zAgdPysPTuJd1SALva7Z+nuzGuib3o K2iz/+LuOqaihmm/nmaxKKkG3rhp33yiI0LpXtvNIi6guzXD+7ckLO8ucQMHb28Rp07t X6fw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679566524; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=cY5q38znnygXEk7MQGfhMNT6fWhXSJdRMwyHM9vnLSQ=; 
        b=tEJTuiJH2WuqJiIx41eTiX7uzloho0o7X0QzxbP3eIXwARTnTgi0Zrqo1qVjE2OGiF
        uKE2TrrypVN9tBTICEZ0lYlKiSRQQBAyatfMY3UKHyRG3GsuIj1AnLlxyvgWWPKFaZlq
        7b4f8HMdd29kgoHlg6LVq2IGV7PVRpADPaGcMgFxv9k58pKBPTu4kHwzDtKNdeqktSGS
        72XKycHVUSuCNPZ4SAsxQboyUFCRoA6012adSwtY+g1BpMpcJMVJ7Eq4PabBBmogo+XL
        K7s42sWCHICF6/QEs46nDJeMuPBwA7IhYZn3ISkczJvAJFgFkEveO7QO0V4zl8L6LSjn
        dC4Q==
X-Gm-Message-State: AO0yUKWgZbgrBI4wUlxRbmafKTvf69xc1vH0ocSusGyos9Yf/jmpxt/V
        KgoMREo07wNMAVfchMRaERY=
X-Google-Smtp-Source: AK7set8l2mw02Hi+yEM7NEqPCCxbfo5h/cpPyKk3/TXqnMetKEQVjkCEfcASensglEf7x+j3LK9Iiw==
X-Received: by 2002:a05:600c:b43:b0:3ed:2eb5:c2e8 with SMTP id
        k3-20020a05600c0b4300b003ed2eb5c2e8mr1890205wmr.10.1679566523681;
        Thu, 23 Mar 2023 03:15:23 -0700 (PDT)
Received: from lucifer.home (host86-156-84-164.range86-156.btcentralplus.com. [86.156.84.164])
        by smtp.googlemail.com with ESMTPSA id f18-20020a05600c155200b003ede2c59a54sm1416952wmg.37.2023.03.23.03.15.22
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Thu, 23 Mar 2023 03:15:22 -0700 (PDT)
From: Lorenzo Stoakes
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
        linux-fsdevel@vger.kernel.org, Andrew Morton
Cc: Baoquan He, Uladzislau Rezki, Matthew Wilcox, David Hildenbrand,
        Liu Shixin, Jiri Olsa, Jens Axboe, Alexander Viro, Lorenzo Stoakes
Subject: [PATCH v8 1/4] fs/proc/kcore: avoid bounce buffer for ktext data
Date: Thu, 23 Mar 2023 10:15:16 +0000
Message-Id:
X-Mailer: git-send-email 2.39.2
In-Reply-To:
References:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Commit df04abfd181a ("fs/proc/kcore.c: Add bounce buffer for ktext data")
introduced the use of a bounce buffer to retrieve kernel text data for
/proc/kcore in order to avoid failures arising from hardened user copies
enabled by CONFIG_HARDENED_USERCOPY in check_kernel_text_object().

We can avoid doing this if instead of copy_to_user() we use
_copy_to_user() which bypasses the hardening check. This is more efficient
than using a bounce buffer and simplifies the code.

We do so as part of an overall effort to eliminate bounce buffer usage in
the function with an eye to converting it to an iterator read.

Signed-off-by: Lorenzo Stoakes
Reviewed-by: David Hildenbrand
Reviewed-by: Baoquan He
---
 fs/proc/kcore.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 71157ee35c1a..556f310d6aa4 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -541,19 +541,12 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		case KCORE_VMEMMAP:
 		case KCORE_TEXT:
 			/*
-			 * Using bounce buffer to bypass the
-			 * hardened user copy kernel text checks.
+			 * We use _copy_to_user() to bypass usermode hardening
+			 * which would otherwise prevent this operation.
 			 */
-			if (copy_from_kernel_nofault(buf, (void *)start, tsz)) {
-				if (clear_user(buffer, tsz)) {
-					ret = -EFAULT;
-					goto out;
-				}
-			} else {
-				if (copy_to_user(buffer, buf, tsz)) {
-					ret = -EFAULT;
-					goto out;
-				}
+			if (_copy_to_user(buffer, (char *)start, tsz)) {
+				ret = -EFAULT;
+				goto out;
 			}
 			break;
 		default:
-- 
2.39.2

From nobody Sat May 11 08:49:00 2024
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
        aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
        by smtp.lore.kernel.org (Postfix) with ESMTP id 0027DC6FD1C
        for ; Thu, 23 Mar 2023 10:15:37 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S231356AbjCWKPg (ORCPT );
        Thu, 23 Mar 2023 06:15:36 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57394 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S231292AbjCWKP1 (ORCPT );
        Thu, 23 Mar 2023 06:15:27 -0400
Received: from mail-wr1-x42c.google.com
(mail-wr1-x42c.google.com [IPv6:2a00:1450:4864:20::42c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AE4CB1B2DF; Thu, 23 Mar 2023 03:15:26 -0700 (PDT) Received: by mail-wr1-x42c.google.com with SMTP id m2so19840785wrh.6; Thu, 23 Mar 2023 03:15:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679566525; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=CnqbASwlTwT+hpjobkOVo9MU5RGxoqqRsLFO5z2uP7A=; b=ay5smV8H77AUMTTLy9JbPLC9H0FvAA6po1YwW88plRseeVZNzrO/yWXbWoeqBe73BV EPgoT4A9/R/wlPu2/DDT/sXT4SvPJX01j/7S2NXxmlxFXfMomXZaZHpKhswlSSHQAlwn YGU9oMUiADoHyd7ZKlL/C+RRgIp5YyBRmuDMkS8V6pWCvANVchSSxW2H3WGP45Ou1g4F 5LowAeuegB7b2wwRVSLyP6b5V+tJKxRo43A9f1YG5Pig2CU0CsXh/kbSLJ1arlLoPMAR UlnBALoZjxX2Emv0VI1hYbblmrbcykLwIR8Hj8xPwSWirkoP5Xg6oe7ivaw2MaP++1RV gW9Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679566525; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=CnqbASwlTwT+hpjobkOVo9MU5RGxoqqRsLFO5z2uP7A=; b=gLyvxfJjOeOx1EzI13rvmtSkvZJ0Ypk6gR8c3g10h9csBKr0WPFjwI7CIm+SGg3n/i 0gaZArp/oGKHGGoB3MGiLvvAhx9PJli+h9+ATLltsKB4sW6mGchNOk2vorBGfuxLgv3F 5MBX54xJQIJTCMuAU/x5s3OOPVRGGqEmkx0ptp2SNnx2Y4THGejfhjVs/GdGA3jQlU4m vL/XKkHhjLYexOl21SOlFkxDwqadcL5gj0/vV+g+UGUtp389U37RrrlBXDKOMEwepVYD tn5Eg1p+ybxpMMAVxPzta/4kPXO0ppAdDDti/v+brqvxgpJnMYIIc62pkJAHKFXnRJ+y MYOg== X-Gm-Message-State: AAQBX9dPJqJJa4ILJxdBT30IaKXrfH52ZagqegYp33/NBrJiNi2Kc0qB zkFjJb12mn747qxDHP83Te0= X-Google-Smtp-Source: AKy350YAiKXXkl0iQoEpKbs8mnEe9sJULjjf91EDkzy28H0cXCyjz2M7MyL7zCDYF1/QYu3rmhjIKA== X-Received: by 2002:adf:fac2:0:b0:2cf:f061:8134 with SMTP id a2-20020adffac2000000b002cff0618134mr2124557wrs.26.1679566525016; Thu, 23 Mar 2023 03:15:25 -0700 (PDT) Received: from lucifer.home 
        (host86-156-84-164.range86-156.btcentralplus.com. [86.156.84.164])
        by smtp.googlemail.com with ESMTPSA id f18-20020a05600c155200b003ede2c59a54sm1416952wmg.37.2023.03.23.03.15.23
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Thu, 23 Mar 2023 03:15:24 -0700 (PDT)
From: Lorenzo Stoakes
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
        linux-fsdevel@vger.kernel.org, Andrew Morton
Cc: Baoquan He, Uladzislau Rezki, Matthew Wilcox, David Hildenbrand,
        Liu Shixin, Jiri Olsa, Jens Axboe, Alexander Viro, Lorenzo Stoakes
Subject: [PATCH v8 2/4] fs/proc/kcore: convert read_kcore() to read_kcore_iter()
Date: Thu, 23 Mar 2023 10:15:17 +0000
Message-Id:
X-Mailer: git-send-email 2.39.2
In-Reply-To:
References:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

For the time being we still use a bounce buffer for vread(); however, in a
later patch in this series we will convert this to interact directly with
the iterator and eliminate the bounce buffer altogether.
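
To make the change in return-value convention concrete, here is a minimal
userspace sketch, not kernel code. copy_to_user() reports the number of
bytes it could NOT copy, whereas copy_to_iter() reports the number of bytes
it did copy and advances the iterator itself, which is why the converted
code compares the result against tsz and no longer bumps a buffer pointer.
The mock_iter structure and both mock_* helpers are hypothetical stand-ins
invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for an iov_iter: one destination buffer. */
struct mock_iter {
	char *base;	/* start of the destination buffer */
	size_t count;	/* bytes of space remaining */
	size_t pos;	/* bytes written so far; advanced by each copy */
};

/* Old convention (copy_to_user-like): returns the number of bytes NOT copied. */
static size_t mock_copy_to_user(char *dst, size_t room, const void *src, size_t n)
{
	size_t done = n < room ? n : room;

	memcpy(dst, src, done);
	return n - done;
}

/* New convention (copy_to_iter-like): returns bytes copied, advances the iterator. */
static size_t mock_copy_to_iter(const void *src, size_t n, struct mock_iter *it)
{
	size_t done = n < it->count ? n : it->count;

	assert(it->base != NULL);
	memcpy(it->base + it->pos, src, done);
	it->pos += done;
	it->count -= done;
	return done;
}
```

With the old convention a caller checks for a non-zero result and then does
"buffer += tsz" by hand; with the new one it checks "!= tsz" and the
position bookkeeping lives inside the iterator.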

Signed-off-by: Lorenzo Stoakes
Reviewed-by: David Hildenbrand
Reviewed-by: Baoquan He
---
 fs/proc/kcore.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 556f310d6aa4..08b795fd80b4 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -24,7 +24,7 @@
 #include
 #include
 #include
-#include
+#include
 #include
 #include
 #include
@@ -308,9 +308,12 @@ static void append_kcore_note(char *notes, size_t *i, const char *name,
 }
 
 static ssize_t
-read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
+read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
 {
+	struct file *file = iocb->ki_filp;
 	char *buf = file->private_data;
+	loff_t *fpos = &iocb->ki_pos;
+
 	size_t phdrs_offset, notes_offset, data_offset;
 	size_t page_offline_frozen = 1;
 	size_t phdrs_len, notes_len;
@@ -318,6 +321,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 	size_t tsz;
 	int nphdr;
 	unsigned long start;
+	size_t buflen = iov_iter_count(iter);
 	size_t orig_buflen = buflen;
 	int ret = 0;
 
@@ -356,12 +360,11 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		};
 
 		tsz = min_t(size_t, buflen, sizeof(struct elfhdr) - *fpos);
-		if (copy_to_user(buffer, (char *)&ehdr + *fpos, tsz)) {
+		if (copy_to_iter((char *)&ehdr + *fpos, tsz, iter) != tsz) {
 			ret = -EFAULT;
 			goto out;
 		}
 
-		buffer += tsz;
 		buflen -= tsz;
 		*fpos += tsz;
 	}
@@ -398,15 +401,14 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		}
 
 		tsz = min_t(size_t, buflen, phdrs_offset + phdrs_len - *fpos);
-		if (copy_to_user(buffer, (char *)phdrs + *fpos - phdrs_offset,
-				 tsz)) {
+		if (copy_to_iter((char *)phdrs + *fpos - phdrs_offset, tsz,
+				 iter) != tsz) {
 			kfree(phdrs);
 			ret = -EFAULT;
 			goto out;
 		}
 		kfree(phdrs);
 
-		buffer += tsz;
 		buflen -= tsz;
 		*fpos += tsz;
 	}
@@ -448,14 +450,13 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		       min(vmcoreinfo_size, notes_len - i));
 
 		tsz = min_t(size_t, buflen, notes_offset + notes_len - *fpos);
-		if (copy_to_user(buffer, notes + *fpos - notes_offset, tsz)) {
+		if (copy_to_iter(notes + *fpos - notes_offset, tsz, iter) != tsz) {
 			kfree(notes);
 			ret = -EFAULT;
 			goto out;
 		}
 		kfree(notes);
 
-		buffer += tsz;
 		buflen -= tsz;
 		*fpos += tsz;
 	}
@@ -497,7 +498,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		}
 
 		if (!m) {
-			if (clear_user(buffer, tsz)) {
+			if (iov_iter_zero(tsz, iter) != tsz) {
 				ret = -EFAULT;
 				goto out;
 			}
@@ -508,14 +509,14 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		case KCORE_VMALLOC:
 			vread(buf, (char *)start, tsz);
 			/* we have to zero-fill user buffer even if no read */
-			if (copy_to_user(buffer, buf, tsz)) {
+			if (copy_to_iter(buf, tsz, iter) != tsz) {
 				ret = -EFAULT;
 				goto out;
 			}
 			break;
 		case KCORE_USER:
 			/* User page is handled prior to normal kernel page: */
-			if (copy_to_user(buffer, (char *)start, tsz)) {
+			if (copy_to_iter((char *)start, tsz, iter) != tsz) {
 				ret = -EFAULT;
 				goto out;
 			}
@@ -531,7 +532,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 			 */
 			if (!page || PageOffline(page) ||
 			    is_page_hwpoison(page) || !pfn_is_ram(pfn)) {
-				if (clear_user(buffer, tsz)) {
+				if (iov_iter_zero(tsz, iter) != tsz) {
 					ret = -EFAULT;
 					goto out;
 				}
@@ -541,17 +542,17 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		case KCORE_VMEMMAP:
 		case KCORE_TEXT:
 			/*
-			 * We use _copy_to_user() to bypass usermode hardening
+			 * We use _copy_to_iter() to bypass usermode hardening
 			 * which would otherwise prevent this operation.
 			 */
-			if (_copy_to_user(buffer, (char *)start, tsz)) {
+			if (_copy_to_iter((char *)start, tsz, iter) != tsz) {
 				ret = -EFAULT;
 				goto out;
 			}
 			break;
 		default:
 			pr_warn_once("Unhandled KCORE type: %d\n", m->type);
-			if (clear_user(buffer, tsz)) {
+			if (iov_iter_zero(tsz, iter) != tsz) {
 				ret = -EFAULT;
 				goto out;
 			}
@@ -559,7 +560,6 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 skip:
 		buflen -= tsz;
 		*fpos += tsz;
-		buffer += tsz;
 		start += tsz;
 		tsz = (buflen > PAGE_SIZE ? PAGE_SIZE : buflen);
 	}
@@ -603,7 +603,7 @@ static int release_kcore(struct inode *inode, struct file *file)
 }
 
 static const struct proc_ops kcore_proc_ops = {
-	.proc_read	= read_kcore,
+	.proc_read_iter	= read_kcore_iter,
 	.proc_open	= open_kcore,
 	.proc_release	= release_kcore,
 	.proc_lseek	= default_llseek,
-- 
2.39.2

From nobody Sat May 11 08:49:00 2024
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
        aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
        by smtp.lore.kernel.org (Postfix) with ESMTP id 39948C6FD1D
        for ; Thu, 23 Mar 2023 10:15:41 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S231299AbjCWKPj (ORCPT );
        Thu, 23 Mar 2023 06:15:39 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57396 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S231149AbjCWKP3 (ORCPT );
        Thu, 23 Mar 2023 06:15:29 -0400
Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1194319F2F;
        Thu, 23 Mar 2023 03:15:28 -0700 (PDT)
Received: by mail-wr1-x436.google.com with SMTP id r29so19807702wra.13;
        Thu, 23 Mar 2023 03:15:27 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20210112; t=1679566526;
        h=content-transfer-encoding:mime-version:references:in-reply-to
:message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ziv2ovcgc/3NOuBG0+YWYfMhKDq2ZBtAkLCMMX3QtWE=; b=q71TRgMaiJhexb4rt9K6K/OJXB+xp2xMvihLfEQSyx92J+vcfBnKnPXLHd1HxY+Jkh fylGKmF703YaOMD430/wzE2YVyB/toaRNiNVbr9xNt1ushLgX7U8X6EWfC7giBMjdd3A QXbJXhtqmS/cNRUAe5OtLLDmT/jhP5iNMIcVmfToX2nFjnziIra8o0oqX0SPd5mdlBvo s7byAuwRvXRdFSvpzjpYzFYWDE8ZDnUc/S4DsfwvwNsHx4H9uXjgvTyptRBwa0AgmqkG 4txxsONgFqFC4qRe5uOA2fEgbppYZRZHznEeqrg73PhMqjSuCGV61AMbGtfc84SQPvMv sobA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679566526; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ziv2ovcgc/3NOuBG0+YWYfMhKDq2ZBtAkLCMMX3QtWE=; b=Nj6uPGH3iLSHQo8w3mpWHoW5V2LFPhOEOsIcd5OGYubt6rr2npv3qexHyKUfyOHK5+ cWS5d+3ccDGkRYfXFPlE5e4MsjLHAmzRmkN+PpALduv0J5WA0rnj2VrKsusbT0UwpFin 53QtaSWjiMz6ICYha1nJXugJn7i/aIg7TNds7bEjSr601zo1FnmypYoed8mQTV2ZA0Mr WZqHWZzeYaUK1okEKYz5iRDOF1iBlgMCdS3NpZp4IcT4VySnR3Xi9WwfiVI/myHho79K 3lLuCF6HuO7OfEPdRhp0ox+EIdJOg/gFrfAXM6Hh4xW1HglPen80YtihEz46QfTvI53+ OMZQ== X-Gm-Message-State: AAQBX9cX7orPE+/ec8DU0SraNRhOFmhAk5rLEwuQ/EbuoStO8OmEjbxr JWsiXervKELJLiK+J0aX+G8= X-Google-Smtp-Source: AKy350ZF32A8g3SpKgBLxf2F/F30nUWdXSvk0BtRiQ25SfSbpeVUfcurgkUToqQZN9wTmKBO9ILEMQ== X-Received: by 2002:adf:f30a:0:b0:2dc:c45:faf6 with SMTP id i10-20020adff30a000000b002dc0c45faf6mr794519wro.51.1679566526457; Thu, 23 Mar 2023 03:15:26 -0700 (PDT) Received: from lucifer.home (host86-156-84-164.range86-156.btcentralplus.com. 
        [86.156.84.164])
        by smtp.googlemail.com with ESMTPSA id f18-20020a05600c155200b003ede2c59a54sm1416952wmg.37.2023.03.23.03.15.25
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Thu, 23 Mar 2023 03:15:25 -0700 (PDT)
From: Lorenzo Stoakes
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
        linux-fsdevel@vger.kernel.org, Andrew Morton
Cc: Baoquan He, Uladzislau Rezki, Matthew Wilcox, David Hildenbrand,
        Liu Shixin, Jiri Olsa, Jens Axboe, Alexander Viro, Lorenzo Stoakes
Subject: [PATCH v8 3/4] iov_iter: add copy_page_to_iter_nofault()
Date: Thu, 23 Mar 2023 10:15:18 +0000
Message-Id: <19734729defb0f498a76bdec1bef3ac48a3af3e8.1679566220.git.lstoakes@gmail.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To:
References:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Provide a means to copy a page to user space via an iterator, aborting if
a page fault would occur. This supports compound pages, but may be passed
a tail page with an offset extending further into the compound page, so we
cannot pass a folio.

This allows for this function to be called from atomic context and _try_
to use pages if they are faulted in, aborting if not.

The function does not use _copy_to_iter() in order to not specify
might_fault(); this is similar to copy_page_from_iter_atomic().

This is being added in order that an iterable form of vread() can be
implemented while holding spinlocks.
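
As a rough illustration of the return convention involved, the userspace
sketch below mimics how the result of a nofault copy can be mapped onto the
"bytes left uncopied" value that a per-segment iterator step expects.
copy_to_user_nofault() returns 0 on success or -EFAULT if a fault would be
required. The mock_user_buf type, its resident flag and both mock_*
functions are assumptions made up for this example and are not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical destination: 'resident' models whether the page is faulted in. */
struct mock_user_buf {
	char data[64];
	int resident;
};

/* Stand-in for copy_to_user_nofault(): 0 on success, -EFAULT if a fault is needed. */
static long mock_copy_to_user_nofault(struct mock_user_buf *to,
				      const void *from, size_t n)
{
	assert(n <= sizeof(to->data));
	if (!to->resident)
		return -EFAULT;
	memcpy(to->data, from, n);
	return 0;
}

/* Same shape as copyout_nofault(): report how many bytes were left uncopied. */
static size_t mock_copyout_nofault(struct mock_user_buf *to,
				   const void *from, size_t n)
{
	long res = mock_copy_to_user_nofault(to, from, n);

	return res < 0 ? n : (size_t)res;
}
```

A caller in atomic context can therefore treat a non-zero result as "stop
here and report a short copy" without ever sleeping on a page fault.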

Signed-off-by: Lorenzo Stoakes
Reviewed-by: Baoquan He
---
 include/linux/uio.h |  2 ++
 lib/iov_iter.c      | 48 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 27e3fd942960..29eb18bb6feb 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -173,6 +173,8 @@ static inline size_t copy_folio_to_iter(struct folio *folio, size_t offset,
 {
 	return copy_page_to_iter(&folio->page, offset, bytes, i);
 }
+size_t copy_page_to_iter_nofault(struct page *page, unsigned offset,
+				 size_t bytes, struct iov_iter *i);
 
 static __always_inline __must_check
 size_t copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 274014e4eafe..34dd6bdf2fba 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -172,6 +172,18 @@ static int copyout(void __user *to, const void *from, size_t n)
 	return n;
 }
 
+static int copyout_nofault(void __user *to, const void *from, size_t n)
+{
+	long res;
+
+	if (should_fail_usercopy())
+		return n;
+
+	res = copy_to_user_nofault(to, from, n);
+
+	return res < 0 ? n : res;
+}
+
 static int copyin(void *to, const void __user *from, size_t n)
 {
 	size_t res = n;
@@ -734,6 +746,42 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
 }
 EXPORT_SYMBOL(copy_page_to_iter);
 
+size_t copy_page_to_iter_nofault(struct page *page, unsigned offset, size_t bytes,
+				 struct iov_iter *i)
+{
+	size_t res = 0;
+
+	if (!page_copy_sane(page, offset, bytes))
+		return 0;
+	if (WARN_ON_ONCE(i->data_source))
+		return 0;
+	if (unlikely(iov_iter_is_pipe(i)))
+		return copy_page_to_iter_pipe(page, offset, bytes, i);
+	page += offset / PAGE_SIZE; // first subpage
+	offset %= PAGE_SIZE;
+	while (1) {
+		void *kaddr = kmap_local_page(page);
+		size_t n = min(bytes, (size_t)PAGE_SIZE - offset);
+
+		iterate_and_advance(i, n, base, len, off,
+			copyout_nofault(base, kaddr + offset + off, len),
+			memcpy(base, kaddr + offset + off, len)
+		)
+		kunmap_local(kaddr);
+		res += n;
+		bytes -= n;
+		if (!bytes || !n)
+			break;
+		offset += n;
+		if (offset == PAGE_SIZE) {
+			page++;
+			offset = 0;
+		}
+	}
+	return res;
+}
+EXPORT_SYMBOL(copy_page_to_iter_nofault);
+
 size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
 			   struct iov_iter *i)
 {
-- 
2.39.2

From nobody Sat May 11 08:49:00 2024
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
        aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
        by smtp.lore.kernel.org (Postfix) with ESMTP id A1EB7C6FD1C
        for ; Thu, 23 Mar 2023 10:15:45 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S230377AbjCWKPo (ORCPT );
        Thu, 23 Mar 2023 06:15:44 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56848 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S231304AbjCWKPb (ORCPT );
        Thu, 23 Mar 2023 06:15:31 -0400
Received: from mail-wm1-x32e.google.com (mail-wm1-x32e.google.com [IPv6:2a00:1450:4864:20::32e])
        by
lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5BE4B1B33C; Thu, 23 Mar 2023 03:15:29 -0700 (PDT) Received: by mail-wm1-x32e.google.com with SMTP id v20-20020a05600c471400b003ed8826253aso1676181wmo.0; Thu, 23 Mar 2023 03:15:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679566528; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=2AsCFAqSuw0gM0mvXGTtPyda1fJMyndkAzm+ZfOwetk=; b=cbMJrHwix1y5yZ0Xw0buwAI/xzyEm8deulV4S0oYi2A/MSWlHE6FhmxwGtUM6BZ65y zs1Zcfr7/Bh5b14jhamR1yRkTF6BdGSKVhO0WAVtBXg+5x5nExAx+0rsf3NKQXcSAYQA gS1049zFR9kJzvhqZcNXSXOu6OUfXQVh2WsoMCEQQfJyptgqrw+sjmDe45N4e088vXhU AyCYpiIXPMmepZ9kSyVxWYh3hHuzRCiKpke4DGlQxW0FL1UvFxJcBCyUNLA8pSe0c9U2 dRfIKLqg5mqj0J9iPxAQhKcb04ky0jnRVgr80wu8peMa5qFXfmXFstuVnppurq0aXHa0 5qkQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679566528; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=2AsCFAqSuw0gM0mvXGTtPyda1fJMyndkAzm+ZfOwetk=; b=pWVUVHdM9Kgqh28aSdKnCue939So/7AuhpUWZA4eKaXlFi8u2iOcKKJIY3jLSdmdZy 9PQP1eK8nQ/nNFmMIFX0lg952YVh84g4DfVFdOrNP8N4Tk6pRp1CWKFnqObw3f9Dy+ZB fipINOQVAtW+1yysmcxx3Q7GCSv0EMySwA7UYujPeqNi+IrrZnlXj7mnmZ6H16UjbaZ/ 5Iic5Uw8CFcI4C9eLDVss76AIAFFVeLua6yzfWqDVINcC8K61RgxBsKKR0TaL88xdv3u Owc0taD+HeAw1yepFylhDWRsxbriLY13mKFNVQIAzW8HEFoidyCvCet26OJNH9YkNM1M /Lmw== X-Gm-Message-State: AO0yUKUkVYNNbzdZAayqs4N4xqechsNboZGdUEALaC5iyCBof1DAaibX uatQXfFGZ65Q+iauT29Aea0= X-Google-Smtp-Source: AK7set/KXLvkspouh0TI8jR9VpZAV5euedYHcvEgky1WQvZjPi8rd3teXOJUoBJuhzHCuA4JJLYh3g== X-Received: by 2002:a05:600c:2102:b0:3ed:245f:97a with SMTP id u2-20020a05600c210200b003ed245f097amr1787883wml.19.1679566527809; Thu, 23 Mar 2023 03:15:27 -0700 (PDT) Received: from lucifer.home (host86-156-84-164.range86-156.btcentralplus.com. 
        [86.156.84.164])
        by smtp.googlemail.com with ESMTPSA id f18-20020a05600c155200b003ede2c59a54sm1416952wmg.37.2023.03.23.03.15.26
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Thu, 23 Mar 2023 03:15:27 -0700 (PDT)
From: Lorenzo Stoakes
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
        linux-fsdevel@vger.kernel.org, Andrew Morton
Cc: Baoquan He, Uladzislau Rezki, Matthew Wilcox, David Hildenbrand,
        Liu Shixin, Jiri Olsa, Jens Axboe, Alexander Viro, Lorenzo Stoakes
Subject: [PATCH v8 4/4] mm: vmalloc: convert vread() to vread_iter()
Date: Thu, 23 Mar 2023 10:15:19 +0000
Message-Id: <8506cbc667c39205e65a323f750ff9c11a463798.1679566220.git.lstoakes@gmail.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To:
References:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Having previously laid the foundation for converting vread() to an
iterator function, pull the trigger and do so.

This patch attempts to provide minimal refactoring and to reflect the
existing logic as best we can; for example, we continue to zero portions
of memory not read, as before.

Overall, there should be no functional difference other than a performance
improvement in /proc/kcore access to vmalloc regions.

Now that we have eliminated the need for a bounce buffer in
read_kcore_iter(), we dispense with it, and try to write to user memory
optimistically but with faults disabled via copy_page_to_iter_nofault().
We already have preemption disabled by holding a spin lock. We continue
faulting in until the operation is complete.

Additionally, we must account for the fact that at any point a copy may
fail (most likely due to a fault not being able to occur); when that
happens we exit indicating fewer bytes retrieved than expected.
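
The optimistic-copy-then-fault-in pattern described above can be sketched
in userspace as follows. This is an illustration only, not the kernel
implementation: mock_dest, mock_read_nofault(), mock_fault_in() and
mock_copy_with_retry() are hypothetical names, with mock_read_nofault()
standing in for vread_iter() and mock_fault_in() standing in for
fault_in_iov_iter_writeable(), which returns the number of bytes it could
not fault in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical destination: only the first 'resident' bytes are writable without a fault. */
struct mock_dest {
	char data[32];
	size_t pos;		/* bytes written so far */
	size_t resident;	/* bytes writable without faulting */
};

/* Stand-in for vread_iter(): copy what fits in resident memory, return bytes copied. */
static size_t mock_read_nofault(struct mock_dest *d, const char *src, size_t n)
{
	size_t room = d->resident > d->pos ? d->resident - d->pos : 0;
	size_t done = n < room ? n : room;

	assert(d->resident <= sizeof(d->data));
	memcpy(d->data + d->pos, src, done);
	d->pos += done;
	return done;
}

/* Stand-in for fault_in_iov_iter_writeable(): returns bytes that could NOT be faulted in. */
static size_t mock_fault_in(struct mock_dest *d, size_t n)
{
	size_t cap = sizeof(d->data);
	size_t want = d->pos + n;

	d->resident = want < cap ? want : cap;
	return want - d->resident;
}

/* Copy n bytes, retrying after each short copy; returns 0 on success, -1 on failure. */
static int mock_copy_with_retry(struct mock_dest *d, const char *src, size_t n)
{
	size_t left = n;

	while (left > 0) {
		/* Optimistic attempt with "faults disabled". */
		size_t copied = mock_read_nofault(d, src, left);

		src += copied;
		left -= copied;
		if (left == 0)
			break;

		/* Short copy: fault the rest in outside the atomic section, then retry. */
		if (mock_fault_in(d, left))
			return -1;
	}
	return 0;
}
```

The sketch advances the source pointer by the amount copied in each call,
so a second short copy resumes exactly where the previous one stopped.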
Signed-off-by: Lorenzo Stoakes Reviewed-by: Baoquan He --- fs/proc/kcore.c | 44 ++++---- include/linux/vmalloc.h | 3 +- mm/nommu.c | 10 +- mm/vmalloc.c | 234 +++++++++++++++++++++++++--------------- 4 files changed, 176 insertions(+), 115 deletions(-) diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c index 08b795fd80b4..25b44b303b35 100644 --- a/fs/proc/kcore.c +++ b/fs/proc/kcore.c @@ -307,13 +307,9 @@ static void append_kcore_note(char *notes, size_t *i, = const char *name, *i =3D ALIGN(*i + descsz, 4); } =20 -static ssize_t -read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter) +static ssize_t read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter) { - struct file *file =3D iocb->ki_filp; - char *buf =3D file->private_data; loff_t *fpos =3D &iocb->ki_pos; - size_t phdrs_offset, notes_offset, data_offset; size_t page_offline_frozen =3D 1; size_t phdrs_len, notes_len; @@ -507,13 +503,30 @@ read_kcore_iter(struct kiocb *iocb, struct iov_iter *= iter) =20 switch (m->type) { case KCORE_VMALLOC: - vread(buf, (char *)start, tsz); - /* we have to zero-fill user buffer even if no read */ - if (copy_to_iter(buf, tsz, iter) !=3D tsz) { - ret =3D -EFAULT; - goto out; + { + const char *src =3D (char *)start; + size_t read =3D 0, left =3D tsz; + + /* + * vmalloc uses spinlocks, so we optimistically try to + * read memory. If this fails, fault pages in and try + * again until we are done. 
+ */ + while (true) { + read +=3D vread_iter(iter, src, left); + if (read =3D=3D tsz) + break; + + src +=3D read; + left -=3D read; + + if (fault_in_iov_iter_writeable(iter, left)) { + ret =3D -EFAULT; + goto out; + } } break; + } case KCORE_USER: /* User page is handled prior to normal kernel page: */ if (copy_to_iter((char *)start, tsz, iter) !=3D tsz) { @@ -582,10 +595,6 @@ static int open_kcore(struct inode *inode, struct file= *filp) if (ret) return ret; =20 - filp->private_data =3D kmalloc(PAGE_SIZE, GFP_KERNEL); - if (!filp->private_data) - return -ENOMEM; - if (kcore_need_update) kcore_update_ram(); if (i_size_read(inode) !=3D proc_root_kcore->size) { @@ -596,16 +605,9 @@ static int open_kcore(struct inode *inode, struct file= *filp) return 0; } =20 -static int release_kcore(struct inode *inode, struct file *file) -{ - kfree(file->private_data); - return 0; -} - static const struct proc_ops kcore_proc_ops =3D { .proc_read_iter =3D read_kcore_iter, .proc_open =3D open_kcore, - .proc_release =3D release_kcore, .proc_lseek =3D default_llseek, }; =20 diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 69250efa03d1..461aa5637f65 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -9,6 +9,7 @@ #include /* pgprot_t */ #include #include +#include =20 #include =20 @@ -251,7 +252,7 @@ static inline void set_vm_flush_reset_perms(void *addr) #endif =20 /* for /proc/kcore */ -extern long vread(char *buf, char *addr, unsigned long count); +extern long vread_iter(struct iov_iter *iter, const char *addr, size_t cou= nt); =20 /* * Internals. Don't use.. 
diff --git a/mm/nommu.c b/mm/nommu.c index 57ba243c6a37..f670d9979a26 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -36,6 +36,7 @@ #include =20 #include +#include #include #include #include @@ -198,14 +199,13 @@ unsigned long vmalloc_to_pfn(const void *addr) } EXPORT_SYMBOL(vmalloc_to_pfn); =20 -long vread(char *buf, char *addr, unsigned long count) +long vread_iter(struct iov_iter *iter, const char *addr, size_t count) { /* Don't allow overflow */ - if ((unsigned long) buf + count < count) - count =3D -(unsigned long) buf; + if ((unsigned long) addr + count < count) + count =3D -(unsigned long) addr; =20 - memcpy(buf, addr, count); - return count; + return copy_to_iter(addr, count, iter); } =20 /* diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 978194dc2bb8..2aaa9382605c 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -37,7 +37,6 @@ #include #include #include -#include #include #include #include @@ -3442,62 +3441,96 @@ void *vmalloc_32_user(unsigned long size) EXPORT_SYMBOL(vmalloc_32_user); =20 /* - * small helper routine , copy contents to buf from addr. - * If the page is not present, fill zero. + * Atomically zero bytes in the iterator. + * + * Returns the number of zeroed bytes. */ +static size_t zero_iter(struct iov_iter *iter, size_t count) +{ + size_t remains =3D count; + + while (remains > 0) { + size_t num, copied; + + num =3D remains < PAGE_SIZE ? remains : PAGE_SIZE; + copied =3D copy_page_to_iter_nofault(ZERO_PAGE(0), 0, num, iter); + remains -=3D copied; + + if (copied < num) + break; + } =20 -static int aligned_vread(char *buf, char *addr, unsigned long count) + return count - remains; +} + +/* + * small helper routine, copy contents to iter from addr. + * If the page is not present, fill zero. + * + * Returns the number of copied bytes. 
+ */ +static size_t aligned_vread_iter(struct iov_iter *iter, + const char *addr, size_t count) { - struct page *p; - int copied =3D 0; + size_t remains =3D count; + struct page *page; =20 - while (count) { + while (remains > 0) { unsigned long offset, length; + size_t copied =3D 0; =20 offset =3D offset_in_page(addr); length =3D PAGE_SIZE - offset; - if (length > count) - length =3D count; - p =3D vmalloc_to_page(addr); + if (length > remains) + length =3D remains; + page =3D vmalloc_to_page(addr); /* - * To do safe access to this _mapped_ area, we need - * lock. But adding lock here means that we need to add - * overhead of vmalloc()/vfree() calls for this _debug_ - * interface, rarely used. Instead of that, we'll use - * kmap() and get small overhead in this access function. + * To do safe access to this _mapped_ area, we need lock. But + * adding lock here means that we need to add overhead of + * vmalloc()/vfree() calls for this _debug_ interface, rarely + * used. Instead of that, we'll use an local mapping via + * copy_page_to_iter_nofault() and accept a small overhead in + * this access function. */ - if (p) { - /* We can expect USER0 is not used -- see vread() */ - void *map =3D kmap_atomic(p); - memcpy(buf, map + offset, length); - kunmap_atomic(map); - } else - memset(buf, 0, length); + if (page) + copied =3D copy_page_to_iter_nofault(page, offset, + length, iter); + else + copied =3D zero_iter(iter, length); =20 - addr +=3D length; - buf +=3D length; - copied +=3D length; - count -=3D length; + addr +=3D copied; + remains -=3D copied; + + if (copied !=3D length) + break; } - return copied; + + return count - remains; } =20 -static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long= flags) +/* + * Read from a vm_map_ram region of memory. + * + * Returns the number of copied bytes. 
+ */ +static size_t vmap_ram_vread_iter(struct iov_iter *iter, const char *addr, + size_t count, unsigned long flags) { char *start; struct vmap_block *vb; unsigned long offset; - unsigned int rs, re, n; + unsigned int rs, re; + size_t remains, n; =20 /* * If it's area created by vm_map_ram() interface directly, but * not further subdividing and delegating management to vmap_block, * handle it here. */ - if (!(flags & VMAP_BLOCK)) { - aligned_vread(buf, addr, count); - return; - } + if (!(flags & VMAP_BLOCK)) + return aligned_vread_iter(iter, addr, count); + + remains =3D count; =20 /* * Area is split into regions and tracked with vmap_block, read out @@ -3505,50 +3538,64 @@ static void vmap_ram_vread(char *buf, char *addr, i= nt count, unsigned long flags */ vb =3D xa_load(&vmap_blocks, addr_to_vb_idx((unsigned long)addr)); if (!vb) - goto finished; + goto finished_zero; =20 spin_lock(&vb->lock); if (bitmap_empty(vb->used_map, VMAP_BBMAP_BITS)) { spin_unlock(&vb->lock); - goto finished; + goto finished_zero; } + for_each_set_bitrange(rs, re, vb->used_map, VMAP_BBMAP_BITS) { - if (!count) - break; + size_t copied; + + if (remains =3D=3D 0) + goto finished; + start =3D vmap_block_vaddr(vb->va->va_start, rs); - while (addr < start) { - if (count =3D=3D 0) - goto unlock; - *buf =3D '\0'; - buf++; - addr++; - count--; + + if (addr < start) { + size_t to_zero =3D min_t(size_t, start - addr, remains); + size_t zeroed =3D zero_iter(iter, to_zero); + + addr +=3D zeroed; + remains -=3D zeroed; + + if (remains =3D=3D 0 || zeroed !=3D to_zero) + goto finished; } + /*it could start reading from the middle of used region*/ offset =3D offset_in_page(addr); n =3D ((re - rs + 1) << PAGE_SHIFT) - offset; - if (n > count) - n =3D count; - aligned_vread(buf, start+offset, n); + if (n > remains) + n =3D remains; + + copied =3D aligned_vread_iter(iter, start + offset, n); =20 - buf +=3D n; - addr +=3D n; - count -=3D n; + addr +=3D copied; + remains -=3D copied; + + if (copied !=3D n) 
+			goto finished;
 	}
-unlock:
+
 	spin_unlock(&vb->lock);

-finished:
+finished_zero:
 	/* zero-fill the left dirty or free regions */
-	if (count)
-		memset(buf, 0, count);
+	return count - remains + zero_iter(iter, remains);
+finished:
+	/* We couldn't copy/zero everything */
+	spin_unlock(&vb->lock);
+	return count - remains;
 }

 /**
- * vread() - read vmalloc area in a safe way.
- * @buf: buffer for reading data
- * @addr: vm address.
- * @count: number of bytes to be read.
+ * vread_iter() - read vmalloc area in a safe way to an iterator.
+ * @iter: the iterator to which data should be written.
+ * @addr: vm address.
+ * @count: number of bytes to be read.
  *
  * This function checks that addr is a valid vmalloc'ed area, and
  * copy data from that area to a given buffer. If the given memory range
@@ -3568,13 +3615,12 @@ static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags
  * (same number as @count) or %0 if [addr...addr+count) doesn't
  * include any intersection with valid vmalloc area
  */
-long vread(char *buf, char *addr, unsigned long count)
+long vread_iter(struct iov_iter *iter, const char *addr, size_t count)
 {
 	struct vmap_area *va;
 	struct vm_struct *vm;
-	char *vaddr, *buf_start = buf;
-	unsigned long buflen = count;
-	unsigned long n, size, flags;
+	char *vaddr;
+	size_t n, size, flags, remains;

 	addr = kasan_reset_tag(addr);

@@ -3582,18 +3628,22 @@ long vread(char *buf, char *addr, unsigned long count)
 	if ((unsigned long) addr + count < count)
 		count = -(unsigned long) addr;

+	remains = count;
+
 	spin_lock(&vmap_area_lock);
 	va = find_vmap_area_exceed_addr((unsigned long)addr);
 	if (!va)
-		goto finished;
+		goto finished_zero;

 	/* no intersects with alive vmap_area */
-	if ((unsigned long)addr + count <= va->va_start)
-		goto finished;
+	if ((unsigned long)addr + remains <= va->va_start)
+		goto finished_zero;

 	list_for_each_entry_from(va, &vmap_area_list, list) {
-		if (!count)
-			break;
+		size_t copied;
+
+		if (remains == 0)
+			goto finished;

 		vm = va->vm;
 		flags = va->flags & VMAP_FLAGS_MASK;
@@ -3608,6 +3658,7 @@ long vread(char *buf, char *addr, unsigned long count)

 		if (vm && (vm->flags & VM_UNINITIALIZED))
 			continue;
+
 		/* Pair with smp_wmb() in clear_vm_uninitialized_flag() */
 		smp_rmb();

@@ -3616,38 +3667,45 @@ long vread(char *buf, char *addr, unsigned long count)

 		if (addr >= vaddr + size)
 			continue;
-		while (addr < vaddr) {
-			if (count == 0)
+
+		if (addr < vaddr) {
+			size_t to_zero = min_t(size_t, vaddr - addr, remains);
+			size_t zeroed = zero_iter(iter, to_zero);
+
+			addr += zeroed;
+			remains -= zeroed;
+
+			if (remains == 0 || zeroed != to_zero)
 				goto finished;
-			*buf = '\0';
-			buf++;
-			addr++;
-			count--;
 		}
+
 		n = vaddr + size - addr;
-		if (n > count)
-			n = count;
+		if (n > remains)
+			n = remains;

 		if (flags & VMAP_RAM)
-			vmap_ram_vread(buf, addr, n, flags);
+			copied = vmap_ram_vread_iter(iter, addr, n, flags);
 		else if (!(vm->flags & VM_IOREMAP))
-			aligned_vread(buf, addr, n);
+			copied = aligned_vread_iter(iter, addr, n);
 		else /* IOREMAP area is treated as memory hole */
-			memset(buf, 0, n);
-		buf += n;
-		addr += n;
-		count -= n;
+			copied = zero_iter(iter, n);
+
+		addr += copied;
+		remains -= copied;
+
+		if (copied != n)
+			goto finished;
 	}
-finished:
-	spin_unlock(&vmap_area_lock);

-	if (buf == buf_start)
-		return 0;
+finished_zero:
+	spin_unlock(&vmap_area_lock);
 	/* zero-fill memory holes */
-	if (buf != buf_start + buflen)
-		memset(buf, 0, buflen - (buf - buf_start));
+	return count - remains + zero_iter(iter, remains);
+finished:
+	/* Nothing remains, or We couldn't copy/zero everything. */
+	spin_unlock(&vmap_area_lock);

-	return buflen;
+	return count - remains;
 }

 /**
-- 
2.39.2