From nobody Sat Nov 30 10:49:07 2024
From: Gao Xiang <hsiangkao@linux.alibaba.com>
To: linux-erofs@lists.ozlabs.org
Cc: LKML <linux-kernel@vger.kernel.org>,
	Gao Xiang <hsiangkao@linux.alibaba.com>,
	syzbot+4fc98ed414ae63d1ada2@syzkaller.appspotmail.com,
	syzbot+de04e06b28cfecf2281c@syzkaller.appspotmail.com,
	syzbot+c8c8238b394be4a1087d@syzkaller.appspotmail.com
Subject: [PATCH v2] erofs: handle overlapped pclusters out of crafted images properly
Date: Tue, 10 Sep 2024 15:08:47 +0800
Message-ID: <20240910070847.3356592-1-hsiangkao@linux.alibaba.com>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20240904111334.995920-1-hsiangkao@linux.alibaba.com>
References: <20240904111334.995920-1-hsiangkao@linux.alibaba.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

syzbot reported a task hang issue due to a deadlock case where a task
keeps waiting for the folio lock of a cached folio that will be used
for cache I/Os.

After looking into the crafted fuzzed image, I found it's formed with
several overlapped big pclusters as below:

 Ext:  logical offset  | length :   physical offset     | length
   0:       0..  16384 |  16384 :     151552..   167936 |  16384
   1:   16384..  32768 |  16384 :     155648..   172032 |  16384
   2:   32768..  49152 |  16384 :  537223168..537239552 |  16384
 ...

Here, extents 0 and 1 are physically overlapped, although that's
entirely _impossible_ for normal filesystem images generated by mkfs.

First, managed folios containing compressed data will be marked as
up-to-date and then unlocked immediately (unlike in-place folios) when
compressed I/Os are complete.  If physical blocks are not submitted in
incremental order, separate BIOs should be used to avoid dependency
issues.  However, the current code mis-arranges z_erofs_fill_bio_vec()
and BIO submission, which causes unexpected BIO waits.

Second, managed folios will be connected to their own pclusters for
efficient inter-queries.  However, this is hard to implement cleanly
if overlapped big pclusters exist.  Again, these only appear in fuzzed
images, so let's simply fall back to temporary short-lived pages for
correctness.

Additionally, this patch asserts that referenced managed folios cannot
be truncated for now, and it reverts part of commit 2080ca1ed3e4
("erofs: tidy up `struct z_erofs_bvec`") for simplicity, although that
shouldn't make any difference.

Reported-by: syzbot+4fc98ed414ae63d1ada2@syzkaller.appspotmail.com
Reported-by: syzbot+de04e06b28cfecf2281c@syzkaller.appspotmail.com
Reported-by: syzbot+c8c8238b394be4a1087d@syzkaller.appspotmail.com
Tested-by: syzbot+4fc98ed414ae63d1ada2@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/0000000000002fda01061e334873@google.com
Fixes: 8e6c8fa9f2e9 ("erofs: enable big pcluster feature")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
changes since v1:
 - Avoid re-entering z_erofs_fill_bio_vec() if bio_add_page() fails;

 fs/erofs/zdata.c | 71 ++++++++++++++++++++++++++----------------------
 1 file changed, 38 insertions(+), 33 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 2271cb74ae3a..dca11ab0ab75 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1392,6 +1392,7 @@ static void z_erofs_fill_bio_vec(struct bio_vec *bvec,
 	struct z_erofs_bvec zbv;
 	struct address_space *mapping;
 	struct folio *folio;
+	struct page *page;
 	int bs = i_blocksize(f->inode);
 
 	/* Except for inplace folios, the entire folio can be used for I/Os */
@@ -1414,7 +1415,6 @@ static void z_erofs_fill_bio_vec(struct bio_vec *bvec,
 	 * file-backed folios will be used instead.
 	 */
 	if (folio->private == (void *)Z_EROFS_PREALLOCATED_PAGE) {
-		folio->private = 0;
 		tocache = true;
 		goto out_tocache;
 	}
@@ -1432,7 +1432,7 @@ static void z_erofs_fill_bio_vec(struct bio_vec *bvec,
 	}
 
 	folio_lock(folio);
-	if (folio->mapping == mc) {
+	if (likely(folio->mapping == mc)) {
 		/*
 		 * The cached folio is still in managed cache but without
 		 * a valid `->private` pcluster hint. Let's reconnect them.
@@ -1442,41 +1442,44 @@ static void z_erofs_fill_bio_vec(struct bio_vec *bvec,
 			/* compressed_bvecs[] already takes a ref before */
 			folio_put(folio);
 		}
-
-		/* no need to submit if it is already up-to-date */
-		if (folio_test_uptodate(folio)) {
-			folio_unlock(folio);
-			bvec->bv_page = NULL;
+		if (likely(folio->private == pcl)) {
+			/* don't submit cache I/Os again if already uptodate */
+			if (folio_test_uptodate(folio)) {
+				folio_unlock(folio);
+				bvec->bv_page = NULL;
+			}
+			return;
 		}
-		return;
+		/*
+		 * Already linked with another pcluster, which only appears in
+		 * crafted images by fuzzers for now. But handle this anyway.
+		 */
+		tocache = false;	/* use temporary short-lived pages */
+	} else {
+		DBG_BUGON(1); /* referenced managed folios can't be truncated */
+		tocache = true;
 	}
-
-	/*
-	 * It has been truncated, so it's unsafe to reuse this one. Let's
-	 * allocate a new page for compressed data.
-	 */
-	DBG_BUGON(folio->mapping);
-	tocache = true;
 	folio_unlock(folio);
 	folio_put(folio);
 out_allocfolio:
-	zbv.page = erofs_allocpage(&f->pagepool, gfp | __GFP_NOFAIL);
+	page = erofs_allocpage(&f->pagepool, gfp | __GFP_NOFAIL);
 	spin_lock(&pcl->obj.lockref.lock);
-	if (pcl->compressed_bvecs[nr].page) {
-		erofs_pagepool_add(&f->pagepool, zbv.page);
+	if (unlikely(pcl->compressed_bvecs[nr].page != zbv.page)) {
+		erofs_pagepool_add(&f->pagepool, page);
 		spin_unlock(&pcl->obj.lockref.lock);
 		cond_resched();
 		goto repeat;
 	}
-	bvec->bv_page = pcl->compressed_bvecs[nr].page = zbv.page;
-	folio = page_folio(zbv.page);
-	/* first mark it as a temporary shortlived folio (now 1 ref) */
-	folio->private = (void *)Z_EROFS_SHORTLIVED_PAGE;
+	bvec->bv_page = pcl->compressed_bvecs[nr].page = page;
+	folio = page_folio(page);
 	spin_unlock(&pcl->obj.lockref.lock);
 out_tocache:
 	if (!tocache || bs != PAGE_SIZE ||
-	    filemap_add_folio(mc, folio, pcl->obj.index + nr, gfp))
+	    filemap_add_folio(mc, folio, pcl->obj.index + nr, gfp)) {
+		/* turn into a temporary shortlived folio (1 ref) */
+		folio->private = (void *)Z_EROFS_SHORTLIVED_PAGE;
 		return;
+	}
 	folio_attach_private(folio, pcl);
 	/* drop a refcount added by allocpage (then 2 refs in total here) */
 	folio_put(folio);
@@ -1611,13 +1614,10 @@ static void z_erofs_submit_queue(struct z_erofs_decompress_frontend *f,
 		cur = mdev.m_pa;
 		end = cur + pcl->pclustersize;
 		do {
-			z_erofs_fill_bio_vec(&bvec, f, pcl, i++, mc);
-			if (!bvec.bv_page)
-				continue;
-
+			bvec.bv_page = NULL;
 			if (bio && (cur != last_pa ||
 				    bio->bi_bdev != mdev.m_bdev)) {
-io_retry:
+drain_io:
 				if (erofs_is_fileio_mode(EROFS_SB(sb)))
 					erofs_fileio_submit_bio(bio);
 				else if (erofs_is_fscache_mode(sb))
@@ -1632,6 +1632,15 @@ static void z_erofs_submit_queue(struct z_erofs_decompress_frontend *f,
 				bio = NULL;
 			}
 
+			if (!bvec.bv_page) {
+				z_erofs_fill_bio_vec(&bvec, f, pcl, i++, mc);
+				if (!bvec.bv_page)
+					continue;
+				if (cur + bvec.bv_len > end)
+					bvec.bv_len = end - cur;
+				DBG_BUGON(bvec.bv_len < sb->s_blocksize);
+			}
+
 			if (unlikely(PageWorkingset(bvec.bv_page)) &&
 			    !memstall) {
 				psi_memstall_enter(&pflags);
@@ -1654,13 +1663,9 @@ static void z_erofs_submit_queue(struct z_erofs_decompress_frontend *f,
 				++nr_bios;
 			}
 
-			if (cur + bvec.bv_len > end)
-				bvec.bv_len = end - cur;
-			DBG_BUGON(bvec.bv_len < sb->s_blocksize);
 			if (!bio_add_page(bio, bvec.bv_page, bvec.bv_len,
 					  bvec.bv_offset))
-				goto io_retry;
-
+				goto drain_io;
 			last_pa = cur + bvec.bv_len;
 			bypass = false;
 		} while ((cur += bvec.bv_len) < end);
-- 
2.43.5
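
For readers less familiar with the submission path, the reordering above can
be sketched as a small standalone userspace model (illustrative only, not
kernel code: the names drain_io, bio_add_seg, MAX_SEGS_PER_BIO and the
two-segment BIO limit are invented for this sketch).  It shows the two
properties the v2 patch establishes in z_erofs_submit_queue(): the pending
BIO is drained before the next compressed chunk is picked, and a drain forced
by a full BIO retries the add without re-entering the fill step.

/* toy_submit.c: standalone model of the reordered submission loop */
#include <stdbool.h>
#include <stdio.h>

#define MAX_SEGS_PER_BIO	2	/* pretend a BIO holds at most 2 segments */

struct seg { unsigned long pa; };	/* physical address of one chunk */

static struct seg *bio[MAX_SEGS_PER_BIO];
static int bio_cnt = -1;		/* -1 == no BIO in flight */
static unsigned long last_pa;

static void drain_io(void)		/* models BIO submission */
{
	if (bio_cnt < 0)
		return;
	printf("submit BIO with %d segment(s)\n", bio_cnt);
	bio_cnt = -1;			/* folios in it may now be unlocked */
}

static bool bio_add_seg(struct seg *s)	/* models bio_add_page() */
{
	if (bio_cnt >= MAX_SEGS_PER_BIO)
		return false;		/* full: caller must drain and retry */
	bio[bio_cnt++] = s;
	return true;
}

int main(void)
{
	/* the last chunk is physically discontiguous, like the crafted image */
	struct seg segs[] = { {100}, {101}, {102}, {900} };
	int i;

	for (i = 0; i < 4; i++) {
		struct seg *filled = NULL;	/* models bvec.bv_page */
retry:
		/* 1. drain first if the pending BIO cannot be extended */
		if (bio_cnt >= 0 && segs[i].pa != last_pa + 1)
			drain_io();
		/* 2. only now "fill" (pick) the chunk, exactly once */
		if (!filled)
			filled = &segs[i];
		if (bio_cnt < 0)
			bio_cnt = 0;		/* allocate a fresh BIO */
		/* 3. if the BIO is full, drain and retry without refilling */
		if (!bio_add_seg(filled)) {
			drain_io();
			goto retry;
		}
		last_pa = segs[i].pa;
	}
	drain_io();				/* flush the final BIO */
	return 0;
}

With three contiguous chunks followed by a discontiguous one, this prints one
two-segment BIO and two single-segment BIOs; the point of the model is that
the retry after a full BIO never picks a new chunk, which is what the v2
change note ("Avoid re-entering z_erofs_fill_bio_vec() if bio_add_page()
fails") refers to.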