From: David Howells
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
	linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	syzbot+a440341a59e3b7142895@syzkaller.appspotmail.com,
	Christoph Hellwig, John Hubbard
Subject: [PATCH v13 01/12] splice: Fix O_DIRECT file read splice to avoid reversion of ITER_PIPE
Date: Thu, 9 Feb 2023 10:29:43 +0000
Message-Id: <20230209102954.528942-2-dhowells@redhat.com>
In-Reply-To: <20230209102954.528942-1-dhowells@redhat.com>
References: <20230209102954.528942-1-dhowells@redhat.com>
MIME-Version: 1.0
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

With the upcoming iov_iter_extract_pages() function, pages extracted from a
non-user-backed iterator such as ITER_PIPE aren't pinned.
__iomap_dio_rw(), however, calls iov_iter_revert() to shorten the iterator
to just the bufferage it is going to use - which has the side-effect of
freeing the excess pipe buffers, even though they're attached to a bio and
may get written to by DMA (thanks to Hillf Danton for spotting this[1]).

This then causes memory corruption that is particularly noticeable when the
syzbot test[2] is run.  The test boils down to:

	out = creat(argv[1], 0666);
	ftruncate(out, 0x800);
	lseek(out, 0x200, SEEK_SET);
	in = open(argv[1], O_RDONLY | O_DIRECT | O_NOFOLLOW);
	sendfile(out, in, NULL, 0x1dd00);

run repeatedly in parallel.  What I think is happening is that ftruncate()
occasionally shortens the DIO read that's about to be made by sendfile's
splice core by reducing i_size.

Fix this by splitting the handling of a splice from an O_DIRECT file fd off
from that of non-DIO and in this case, replacing the use of an ITER_PIPE
iterator with an ITER_BVEC iterator for which reversion won't free the
buffers.  The DIO-specific code bulk allocates all the buffers it thinks it
is going to use in advance, does the read synchronously and only then trims
the buffer down.  The pages we did use get pushed into the pipe.

This should be more efficient for DIO read by virtue of doing a bulk page
allocation, but slightly less efficient by ignoring any partial page in the
pipe.

Fixes: 920756a3306a ("block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages")
Reported-by: syzbot+a440341a59e3b7142895@syzkaller.appspotmail.com
Signed-off-by: David Howells
cc: Jens Axboe
cc: Christoph Hellwig
cc: Al Viro
cc: David Hildenbrand
cc: John Hubbard
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20230207094731.1390-1-hdanton@sina.com/ [1]
Link: https://lore.kernel.org/r/000000000000b0b3c005f3a09383@google.com/ [2]
---

Notes:
    ver #13)
    - Don't completely replace generic_file_splice_read(), but rather only
      use this if we're doing a splice from an O_DIRECT file fd.

 fs/splice.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 96 insertions(+)

diff --git a/fs/splice.c b/fs/splice.c
index 5969b7a1d353..b4be6fc314a1 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -282,6 +282,99 @@ void splice_shrink_spd(struct splice_pipe_desc *spd)
 	kfree(spd->partial);
 }
 
+/*
+ * Splice data from an O_DIRECT file into pages and then add them to the output
+ * pipe.
+ */
+static ssize_t generic_file_direct_splice_read(struct file *in, loff_t *ppos,
+					       struct pipe_inode_info *pipe,
+					       size_t len, unsigned int flags)
+{
+	LIST_HEAD(pages);
+	struct iov_iter to;
+	struct bio_vec *bv;
+	struct kiocb kiocb;
+	struct page *page;
+	unsigned int head;
+	ssize_t ret;
+	size_t used, npages, chunk, remain, reclaim;
+	int i;
+
+	/* Work out how much data we can actually add into the pipe */
+	used = pipe_occupancy(pipe->head, pipe->tail);
+	npages = max_t(ssize_t, pipe->max_usage - used, 0);
+	len = min_t(size_t, len, npages * PAGE_SIZE);
+	npages = DIV_ROUND_UP(len, PAGE_SIZE);
+
+	bv = kmalloc(array_size(npages, sizeof(bv[0])), GFP_KERNEL);
+	if (!bv)
+		return -ENOMEM;
+
+	npages = alloc_pages_bulk_list(GFP_USER, npages, &pages);
+	if (!npages) {
+		kfree(bv);
+		return -ENOMEM;
+	}
+
+	remain = len = min_t(size_t, len, npages * PAGE_SIZE);
+
+	for (i = 0; i < npages; i++) {
+		chunk = min_t(size_t, PAGE_SIZE, remain);
+		page = list_first_entry(&pages, struct page, lru);
+		list_del_init(&page->lru);
+		bv[i].bv_page = page;
+		bv[i].bv_offset = 0;
+		bv[i].bv_len = chunk;
+		remain -= chunk;
+	}
+
+	/* Do the I/O */
+	iov_iter_bvec(&to, ITER_DEST, bv, npages, len);
+	init_sync_kiocb(&kiocb, in);
+	kiocb.ki_pos = *ppos;
+	ret = call_read_iter(in, &kiocb, &to);
+
+	reclaim = npages * PAGE_SIZE;
+	remain = 0;
+	if (ret > 0) {
+		reclaim -= ret;
+		remain = ret;
+		*ppos = kiocb.ki_pos;
+		file_accessed(in);
+	} else if (ret < 0) {
+		/*
+		 * callers of ->splice_read() expect -EAGAIN on
+		 * "can't put anything in there", rather than -EFAULT.
+		 */
+		if (ret == -EFAULT)
+			ret = -EAGAIN;
+	}
+
+	/* Free any pages that didn't get touched at all. */
+	for (; reclaim >= PAGE_SIZE; reclaim -= PAGE_SIZE)
+		__free_page(bv[--npages].bv_page);
+
+	/* Push the remaining pages into the pipe. */
+	head = pipe->head;
+	for (i = 0; i < npages; i++) {
+		struct pipe_buffer *buf = &pipe->bufs[head & (pipe->ring_size - 1)];
+
+		chunk = min_t(size_t, remain, PAGE_SIZE);
+		*buf = (struct pipe_buffer) {
+			.ops	= &default_pipe_buf_ops,
+			.page	= bv[i].bv_page,
+			.offset	= 0,
+			.len	= chunk,
+		};
+		head++;
+		remain -= chunk;
+	}
+	pipe->head = head;
+
+	kfree(bv);
+	return ret;
+}
+
 /**
  * generic_file_splice_read - splice data from file to a pipe
  * @in:		file to splice from
@@ -303,6 +396,9 @@ ssize_t generic_file_splice_read(struct file *in, loff_t *ppos,
 	struct kiocb kiocb;
 	int ret;
 
+	if (in->f_flags & O_DIRECT)
+		return generic_file_direct_splice_read(in, ppos, pipe, len, flags);
+
 	iov_iter_pipe(&to, ITER_DEST, pipe, len);
 	init_sync_kiocb(&kiocb, in);
 	kiocb.ki_pos = *ppos;
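
For anyone following the trimming step described in the changelog ("does the read synchronously and only then trims the buffer down"), the page accounting can be modelled in user space. This is only a rough sketch and not kernel code: model_trim(), struct trim_result and MODEL_PAGE_SIZE are hypothetical names introduced for illustration, and a 4096-byte page is assumed. It mirrors the reclaim/remain arithmetic of the patch for a read that returned ret after npages pages were allocated.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4096-byte pages; the kernel code uses PAGE_SIZE. */
#define MODEL_PAGE_SIZE ((size_t)4096)

/* Hypothetical result type for the model, not a kernel structure. */
struct trim_result {
	size_t pages_freed;	/* pages that received no data at all */
	size_t pages_pushed;	/* pages that would be added to the pipe */
	size_t last_len;	/* bytes used in the final pushed page */
};

/*
 * Model of the accounting done after the read: npages pages were
 * allocated and the read returned ret (negative for an error, 0 for
 * no data).
 */
static struct trim_result model_trim(size_t npages, long ret)
{
	struct trim_result r = { 0, 0, 0 };
	size_t reclaim = npages * MODEL_PAGE_SIZE;
	size_t remain = 0;

	/* A read cannot return more than the buffer it was given. */
	assert(ret <= 0 || (size_t)ret <= npages * MODEL_PAGE_SIZE);

	if (ret > 0) {
		reclaim -= (size_t)ret;
		remain = (size_t)ret;
	}

	/* Whole pages past the end of the data are released from the tail. */
	for (; reclaim >= MODEL_PAGE_SIZE; reclaim -= MODEL_PAGE_SIZE) {
		npages--;
		r.pages_freed++;
	}

	/* Every page left carries data; only the last one may be partial. */
	r.pages_pushed = npages;
	for (size_t i = 0; i < npages; i++) {
		size_t chunk = remain < MODEL_PAGE_SIZE ? remain : MODEL_PAGE_SIZE;

		r.last_len = chunk;
		remain -= chunk;
	}
	return r;
}
```

With four pages allocated and a read shortened to 0x600 bytes, as might happen in the truncate race above, the model frees three pages and pushes one page holding 0x600 bytes. A failed read frees all four and pushes nothing.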