From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: David Howells, Steve French, Paulo Alcantara, Ronnie Sahlberg,
 Shyam Prasad N, Tom Talpey, Jeff Layton, linux-cifs@vger.kernel.org,
 samba-technical@lists.samba.org, netfs@lists.linux.dev,
 linux-fsdevel@vger.kernel.org, Steve French, Sasha Levin
Subject: [PATCH 6.6 497/638] cifs: Fix writeback data corruption
Date: Sun, 24 Mar 2024 18:58:54 -0400
Message-ID: <20240324230116.1348576-498-sashal@kernel.org>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240324230116.1348576-1-sashal@kernel.org>
References: <20240324230116.1348576-1-sashal@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
X-stable: review
X-Patchwork-Hint: Ignore
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: David Howells

[ Upstream commit f3dc1bdb6b0b0693562c7c54a6c28bafa608ba3c ]

cifs writeback doesn't correctly handle the case where
cifs_extend_writeback() hits a point where it is considering an additional
folio, but this would overrun the wsize - at which point it drops out of
the xarray scanning loop and calls xas_pause().  The problem is that
xas_pause() advances the loop counter - thereby skipping that page.

What needs to happen is for xas_reset() to be called any time we decide we
don't want to process the page we're looking at, but rather send the
request we are building and start a new one.

Fix this by copying and adapting the netfslib writepages code as a
temporary measure, with cifs writeback intending to be offloaded to
netfslib in the near future.

This also fixes the issue with the use of filemap_get_folios_tag() causing
retry of a bunch of pages which the extender already dealt with.

This can be tested by creating, say, a 64K file somewhere not on cifs
(otherwise copy-offload may get underfoot), mounting a cifs share with a
wsize of 64000, copying the file to it and then comparing the original file
and the copy:

        dd if=/dev/urandom of=/tmp/64K bs=64k count=1
        mount //192.168.6.1/test /mnt -o user=...,pass=...,wsize=64000
        cp /tmp/64K /mnt/64K
        cmp /tmp/64K /mnt/64K

Without the fix, the cmp fails at position 64000 (or shortly thereafter).

Fixes: d08089f649a0 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Signed-off-by: David Howells
cc: Steve French
cc: Paulo Alcantara
cc: Ronnie Sahlberg
cc: Shyam Prasad N
cc: Tom Talpey
cc: Jeff Layton
cc: linux-cifs@vger.kernel.org
cc: samba-technical@lists.samba.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French
Signed-off-by: Sasha Levin
---
 fs/smb/client/file.c | 283 ++++++++++++++++++++++++-------------------
 1 file changed, 157 insertions(+), 126 deletions(-)

diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 7320272ef0074..c156460eb5587 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2622,20 +2622,20 @@ static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to)
  * dirty pages if possible, but don't sleep while doing so.
  */
 static void cifs_extend_writeback(struct address_space *mapping,
+				  struct xa_state *xas,
 				  long *_count,
 				  loff_t start,
 				  int max_pages,
-				  size_t max_len,
-				  unsigned int *_len)
+				  loff_t max_len,
+				  size_t *_len)
 {
 	struct folio_batch batch;
 	struct folio *folio;
-	unsigned int psize, nr_pages;
-	size_t len = *_len;
-	pgoff_t index = (start + len) / PAGE_SIZE;
+	unsigned int nr_pages;
+	pgoff_t index = (start + *_len) / PAGE_SIZE;
+	size_t len;
 	bool stop = true;
 	unsigned int i;
-	XA_STATE(xas, &mapping->i_pages, index);
 
 	folio_batch_init(&batch);
 
@@ -2646,54 +2646,64 @@ static void cifs_extend_writeback(struct address_space *mapping,
 		 */
 		rcu_read_lock();
 
-		xas_for_each(&xas, folio, ULONG_MAX) {
+		xas_for_each(xas, folio, ULONG_MAX) {
 			stop = true;
-			if (xas_retry(&xas, folio))
+			if (xas_retry(xas, folio))
 				continue;
 			if (xa_is_value(folio))
 				break;
-			if (folio->index != index)
+			if (folio->index != index) {
+				xas_reset(xas);
 				break;
+			}
+
 			if (!folio_try_get_rcu(folio)) {
-				xas_reset(&xas);
+				xas_reset(xas);
 				continue;
 			}
 			nr_pages = folio_nr_pages(folio);
-			if (nr_pages > max_pages)
+			if (nr_pages > max_pages) {
+				xas_reset(xas);
 				break;
+			}
 
 			/* Has the page moved or been split? */
-			if (unlikely(folio != xas_reload(&xas))) {
+			if (unlikely(folio != xas_reload(xas))) {
 				folio_put(folio);
+				xas_reset(xas);
 				break;
 			}
 
 			if (!folio_trylock(folio)) {
 				folio_put(folio);
+				xas_reset(xas);
 				break;
 			}
-			if (!folio_test_dirty(folio) || folio_test_writeback(folio)) {
+			if (!folio_test_dirty(folio) ||
+			    folio_test_writeback(folio)) {
 				folio_unlock(folio);
 				folio_put(folio);
+				xas_reset(xas);
 				break;
 			}
 
 			max_pages -= nr_pages;
-			psize = folio_size(folio);
-			len += psize;
+			len = folio_size(folio);
 			stop = false;
-			if (max_pages <= 0 || len >= max_len || *_count <= 0)
-				stop = true;
 
 			index += nr_pages;
+			*_count -= nr_pages;
+			*_len += len;
+			if (max_pages <= 0 || *_len >= max_len || *_count <= 0)
+				stop = true;
+
 			if (!folio_batch_add(&batch, folio))
 				break;
 			if (stop)
 				break;
 		}
 
-		if (!stop)
-			xas_pause(&xas);
+		xas_pause(xas);
 		rcu_read_unlock();
 
 		/* Now, if we obtained any pages, we can shift them to being
@@ -2710,16 +2720,12 @@ static void cifs_extend_writeback(struct address_space *mapping,
 			if (!folio_clear_dirty_for_io(folio))
 				WARN_ON(1);
 			folio_start_writeback(folio);
-
-			*_count -= folio_nr_pages(folio);
 			folio_unlock(folio);
 		}
 
 		folio_batch_release(&batch);
 		cond_resched();
 	} while (!stop);
-
-	*_len = len;
 }
 
 /*
@@ -2727,8 +2733,10 @@ static void cifs_extend_writeback(struct address_space *mapping,
  */
 static ssize_t cifs_write_back_from_locked_folio(struct address_space *mapping,
 						 struct writeback_control *wbc,
+						 struct xa_state *xas,
 						 struct folio *folio,
-						 loff_t start, loff_t end)
+						 unsigned long long start,
+						 unsigned long long end)
 {
 	struct inode *inode = mapping->host;
 	struct TCP_Server_Info *server;
@@ -2737,17 +2745,18 @@ static ssize_t cifs_write_back_from_locked_folio(struct address_space *mapping,
 	struct cifs_credits credits_on_stack;
 	struct cifs_credits *credits = &credits_on_stack;
 	struct cifsFileInfo *cfile = NULL;
-	unsigned int xid, wsize, len;
-	loff_t i_size = i_size_read(inode);
-	size_t max_len;
+	unsigned long long i_size = i_size_read(inode), max_len;
+	unsigned int xid, wsize;
+	size_t len = folio_size(folio);
 	long count = wbc->nr_to_write;
 	int rc;
 
 	/* The folio should be locked, dirty and not undergoing writeback. */
+	if (!folio_clear_dirty_for_io(folio))
+		WARN_ON_ONCE(1);
 	folio_start_writeback(folio);
 
 	count -= folio_nr_pages(folio);
-	len = folio_size(folio);
 
 	xid = get_xid();
 	server = cifs_pick_channel(cifs_sb_master_tcon(cifs_sb)->ses);
@@ -2777,9 +2786,10 @@ static ssize_t cifs_write_back_from_locked_folio(struct address_space *mapping,
 	wdata->server = server;
 	cfile = NULL;
 
-	/* Find all consecutive lockable dirty pages, stopping when we find a
-	 * page that is not immediately lockable, is not dirty or is missing,
-	 * or we reach the end of the range.
+	/* Find all consecutive lockable dirty pages that have contiguous
+	 * written regions, stopping when we find a page that is not
+	 * immediately lockable, is not dirty or is missing, or we reach the
+	 * end of the range.
 	 */
 	if (start < i_size) {
 		/* Trim the write to the EOF; the extra data is ignored.  Also
@@ -2799,19 +2809,18 @@ static ssize_t cifs_write_back_from_locked_folio(struct address_space *mapping,
 			max_pages -= folio_nr_pages(folio);
 
 			if (max_pages > 0)
-				cifs_extend_writeback(mapping, &count, start,
+				cifs_extend_writeback(mapping, xas, &count, start,
 						      max_pages, max_len, &len);
 		}
-		len = min_t(loff_t, len, max_len);
 	}
-
-	wdata->bytes = len;
+	len = min_t(unsigned long long, len, i_size - start);
 
 	/* We now have a contiguous set of dirty pages, each with writeback
 	 * set; the first page is still locked at this point, but all the rest
 	 * have been unlocked.
 	 */
 	folio_unlock(folio);
+	wdata->bytes = len;
 
 	if (start < i_size) {
 		iov_iter_xarray(&wdata->iter, ITER_SOURCE, &mapping->i_pages,
@@ -2862,102 +2871,118 @@ static ssize_t cifs_write_back_from_locked_folio(struct address_space *mapping,
 /*
  * write a region of pages back to the server
  */
-static int cifs_writepages_region(struct address_space *mapping,
-				  struct writeback_control *wbc,
-				  loff_t start, loff_t end, loff_t *_next)
+static ssize_t cifs_writepages_begin(struct address_space *mapping,
+				     struct writeback_control *wbc,
+				     struct xa_state *xas,
+				     unsigned long long *_start,
+				     unsigned long long end)
 {
-	struct folio_batch fbatch;
+	struct folio *folio;
+	unsigned long long start = *_start;
+	ssize_t ret;
 	int skips = 0;
 
-	folio_batch_init(&fbatch);
-	do {
-		int nr;
-		pgoff_t index = start / PAGE_SIZE;
+search_again:
+	/* Find the first dirty page. */
+	rcu_read_lock();
 
-		nr = filemap_get_folios_tag(mapping, &index, end / PAGE_SIZE,
-					    PAGECACHE_TAG_DIRTY, &fbatch);
-		if (!nr)
+	for (;;) {
+		folio = xas_find_marked(xas, end / PAGE_SIZE, PAGECACHE_TAG_DIRTY);
+		if (xas_retry(xas, folio) || xa_is_value(folio))
+			continue;
+		if (!folio)
 			break;
 
-		for (int i = 0; i < nr; i++) {
-			ssize_t ret;
-			struct folio *folio = fbatch.folios[i];
+		if (!folio_try_get_rcu(folio)) {
+			xas_reset(xas);
+			continue;
+		}
 
-redo_folio:
-			start = folio_pos(folio); /* May regress with THPs */
+		if (unlikely(folio != xas_reload(xas))) {
+			folio_put(folio);
+			xas_reset(xas);
+			continue;
+		}
 
-			/* At this point we hold neither the i_pages lock nor the
-			 * page lock: the page may be truncated or invalidated
-			 * (changing page->mapping to NULL), or even swizzled
-			 * back from swapper_space to tmpfs file mapping
-			 */
-			if (wbc->sync_mode != WB_SYNC_NONE) {
-				ret = folio_lock_killable(folio);
-				if (ret < 0)
-					goto write_error;
-			} else {
-				if (!folio_trylock(folio))
-					goto skip_write;
-			}
+		xas_pause(xas);
+		break;
+	}
+	rcu_read_unlock();
+	if (!folio)
+		return 0;
 
-			if (folio->mapping != mapping ||
-			    !folio_test_dirty(folio)) {
-				start += folio_size(folio);
-				folio_unlock(folio);
-				continue;
-			}
+	start = folio_pos(folio); /* May regress with THPs */
 
-			if (folio_test_writeback(folio) ||
-			    folio_test_fscache(folio)) {
-				folio_unlock(folio);
-				if (wbc->sync_mode == WB_SYNC_NONE)
-					goto skip_write;
+	/* At this point we hold neither the i_pages lock nor the page lock:
+	 * the page may be truncated or invalidated (changing page->mapping to
+	 * NULL), or even swizzled back from swapper_space to tmpfs file
+	 * mapping
+	 */
+lock_again:
+	if (wbc->sync_mode != WB_SYNC_NONE) {
+		ret = folio_lock_killable(folio);
+		if (ret < 0)
+			return ret;
+	} else {
+		if (!folio_trylock(folio))
+			goto search_again;
+	}
 
-				folio_wait_writeback(folio);
+	if (folio->mapping != mapping ||
+	    !folio_test_dirty(folio)) {
+		start += folio_size(folio);
+		folio_unlock(folio);
+		goto search_again;
+	}
+
+	if (folio_test_writeback(folio) ||
+	    folio_test_fscache(folio)) {
+		folio_unlock(folio);
+		if (wbc->sync_mode != WB_SYNC_NONE) {
+			folio_wait_writeback(folio);
 #ifdef CONFIG_CIFS_FSCACHE
-				folio_wait_fscache(folio);
+			folio_wait_fscache(folio);
 #endif
-				goto redo_folio;
-			}
-
-			if (!folio_clear_dirty_for_io(folio))
-				/* We hold the page lock - it should've been dirty. */
-				WARN_ON(1);
-
-			ret = cifs_write_back_from_locked_folio(mapping, wbc, folio, start, end);
-			if (ret < 0)
-				goto write_error;
-
-			start += ret;
-			continue;
-
-write_error:
-			folio_batch_release(&fbatch);
-			*_next = start;
-			return ret;
+			goto lock_again;
+		}
 
-skip_write:
-			/*
-			 * Too many skipped writes, or need to reschedule?
-			 * Treat it as a write error without an error code.
-			 */
+		start += folio_size(folio);
+		if (wbc->sync_mode == WB_SYNC_NONE) {
 			if (skips >= 5 || need_resched()) {
 				ret = 0;
-				goto write_error;
+				goto out;
 			}
-
-			/* Otherwise, just skip that folio and go on to the next */
 			skips++;
-			start += folio_size(folio);
-			continue;
 		}
+		goto search_again;
+	}
 
-		folio_batch_release(&fbatch);	
-		cond_resched();
-	} while (wbc->nr_to_write > 0);
+	ret = cifs_write_back_from_locked_folio(mapping, wbc, xas, folio, start, end);
+out:
+	if (ret > 0)
+		*_start = start + ret;
+	return ret;
+}
 
-	*_next = start;
-	return 0;
+/*
+ * Write a region of pages back to the server
+ */
+static int cifs_writepages_region(struct address_space *mapping,
+				  struct writeback_control *wbc,
+				  unsigned long long *_start,
+				  unsigned long long end)
+{
+	ssize_t ret;
+
+	XA_STATE(xas, &mapping->i_pages, *_start / PAGE_SIZE);
+
+	do {
+		ret = cifs_writepages_begin(mapping, wbc, &xas, _start, end);
+		if (ret > 0 && wbc->nr_to_write > 0)
+			cond_resched();
+	} while (ret > 0 && wbc->nr_to_write > 0);
+
+	return ret > 0 ? 0 : ret;
 }
 
 /*
@@ -2966,7 +2991,7 @@ static int cifs_writepages_region(struct address_space *mapping,
 static int cifs_writepages(struct address_space *mapping,
 			   struct writeback_control *wbc)
 {
-	loff_t start, next;
+	loff_t start, end;
 	int ret;
 
 	/* We have to be careful as we can end up racing with setattr()
@@ -2974,28 +2999,34 @@ static int cifs_writepages(struct address_space *mapping,
 	 * to prevent it.
 	 */
 
-	if (wbc->range_cyclic) {
+	if (wbc->range_cyclic && mapping->writeback_index) {
 		start = mapping->writeback_index * PAGE_SIZE;
-		ret = cifs_writepages_region(mapping, wbc, start, LLONG_MAX, &next);
-		if (ret == 0) {
-			mapping->writeback_index = next / PAGE_SIZE;
-			if (start > 0 && wbc->nr_to_write > 0) {
-				ret = cifs_writepages_region(mapping, wbc, 0,
-							     start, &next);
-				if (ret == 0)
-					mapping->writeback_index =
-						next / PAGE_SIZE;
-			}
+		ret = cifs_writepages_region(mapping, wbc, &start, LLONG_MAX);
+		if (ret < 0)
+			goto out;
+
+		if (wbc->nr_to_write <= 0) {
+			mapping->writeback_index = start / PAGE_SIZE;
+			goto out;
 		}
+
+		start = 0;
+		end = mapping->writeback_index * PAGE_SIZE;
+		mapping->writeback_index = 0;
+		ret = cifs_writepages_region(mapping, wbc, &start, end);
+		if (ret == 0)
+			mapping->writeback_index = start / PAGE_SIZE;
 	} else if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) {
-		ret = cifs_writepages_region(mapping, wbc, 0, LLONG_MAX, &next);
+		start = 0;
+		ret = cifs_writepages_region(mapping, wbc, &start, LLONG_MAX);
 		if (wbc->nr_to_write > 0 && ret == 0)
-			mapping->writeback_index = next / PAGE_SIZE;
+			mapping->writeback_index = start / PAGE_SIZE;
 	} else {
-		ret = cifs_writepages_region(mapping, wbc,
-					     wbc->range_start, wbc->range_end, &next);
+		start = wbc->range_start;
+		ret = cifs_writepages_region(mapping, wbc, &start, wbc->range_end);
 	}
 
+out:
 	return ret;
 }
 
-- 
2.43.0