From: Daeho Jeong
To: linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, kernel-team@android.com
Cc: Daeho Jeong
Subject: [PATCH v2] f2fs: support dynamic include/exclude for device aliasing
Date: Fri, 5 Jun 2026 14:25:04 -0700
Message-ID: <20260605212504.1080138-1-daeho43@gmail.com>

From: Daeho Jeong

This patch adds a dynamic management feature to the existing device aliasing
functionality. It allows users to dynamically exclude or include specific
devices from the filesystem's free pool at runtime through new ioctls.

To support this, three new ioctls are introduced:
- F2FS_IOC_EXCLUDE_DEV_ALIAS: This reclaims the space occupied by a device
  aliasing file. It first performs a capacity check, resets GC victim
  information for the target range, marks the segments as in-use to prevent
  new allocations, and then triggers GC to migrate existing valid data out
  of the range. Finally, it reserves these blocks in the SIT to effectively
  exclude the device from the usable capacity.
- F2FS_IOC_INCLUDE_DEV_ALIAS: This releases the reserved space of a
  previously excluded device aliasing file. It truncates the blocks
  associated with the file, which makes them available for general
  filesystem allocation again.
- F2FS_IOC_GET_DEV_ALIAS_STATUS: This retrieves the current aliasing status
  of a device aliasing file, returning whether the file is included (active
  alias) or excluded (inactive alias, with blocks fully allocated on the
  device).

Signed-off-by: Daeho Jeong
---
v2: prevent operations during checkpoint=disabled.
---
 Documentation/filesystems/f2fs.rst |  35 ++++
 fs/f2fs/f2fs.h                     |   9 +-
 fs/f2fs/file.c                     | 270 ++++++++++++++++++++++++++++-
 fs/f2fs/gc.c                       |  30 ++--
 fs/f2fs/namei.c                    |  11 ++
 fs/f2fs/segment.c                  | 178 +++++++++++++------
 fs/f2fs/segment.h                  |  11 ++
 fs/f2fs/super.c                    |  34 ++++
 include/uapi/linux/f2fs.h          |   7 +
 9 files changed, 517 insertions(+), 68 deletions(-)

diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst
index 7e4031631286..d154c8ac0cd7 100644
--- a/Documentation/filesystems/f2fs.rst
+++ b/Documentation/filesystems/f2fs.rst
@@ -1036,6 +1036,41 @@ So, the key idea is, user can do any file operations on /dev/vdc, and
 reclaim the space after the use, while the space is counted as /data.
 That doesn't require modifying partition size and filesystem format.
 
+Dynamic Device Aliasing Management
+----------------------------------
+
+In addition to static device aliasing by deleting the aliasing file, F2FS
+supports dynamic management of device aliasing. This mechanism allows the system
+to dynamically transition partition ownership between F2FS userdata and external
+entities (e.g., zRAM, raw partition) based on system requirements without
+deleting the master aliasing file or requiring unmount/remount.
+
+The master aliasing file is created during the initial format of the file system
+and remains as a persistent control entity (ioctl gateway) in the root directory.
+
+- Partition Exclusion (In-service to Aliased)
+  When a specific partition needs to be dedicated to external services (e.g., zRAM),
+  a user can exclude the device alias range via ioctl. The kernel resets GC victim
+  information for the target range, marks segments as in-use to prevent new
+  allocations, and triggers forced GC to migrate existing valid data out of the
+  range. Finally, it reserves these blocks in the SIT to effectively exclude the
+  device from the usable capacity.
+
+- Partition Inclusion (Aliased to In-service)
+  When external usage concludes, the space is reclaimed not by deleting the file,
+  but through the inclusion ioctl. The kernel truncates blocks associated with
+  the file, releasing them back to general filesystem allocation.
+
+.. code-block::
+
+   # f2fs_io dev_alias include /mnt/f2fs/vdc.file
+   # df -h
+   /dev/vdb 64G 753M 64G 2% /mnt/f2fs
+
+   # f2fs_io dev_alias exclude /mnt/f2fs/vdc.file
+   # df -h
+   /dev/vdb 64G 33G 32G 52% /mnt/f2fs
+
 Per-file Read-Only Large Folio Support
 --------------------------------------
 
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 29f81a496b72..5e0c5701c088 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1398,6 +1398,8 @@ struct f2fs_dev_info {
 	unsigned int total_segments;
 	block_t start_blk;
 	block_t end_blk;
+	bool has_alias;
+	bool is_excluding;
 #ifdef CONFIG_BLK_DEV_ZONED
 	unsigned int nr_blkz;		/* Total number of zones */
 	unsigned long *blkz_seq;	/* Bitmap indicating sequential zones */
@@ -3970,7 +3972,10 @@ int f2fs_create_flush_cmd_control(struct f2fs_sb_info *sbi);
 int f2fs_flush_device_cache(struct f2fs_sb_info *sbi);
 void f2fs_destroy_flush_cmd_control(struct f2fs_sb_info *sbi, bool free);
 void f2fs_invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr,
-						unsigned int len);
+		unsigned int len);
+void f2fs_reserve_device_alias(struct f2fs_sb_info *sbi, block_t addr,
+		unsigned int len);
+
 bool f2fs_is_checkpointed_data(struct f2fs_sb_info *sbi, block_t blkaddr);
 int f2fs_start_discard_thread(struct f2fs_sb_info *sbi);
 void f2fs_drop_discard_cmd(struct f2fs_sb_info *sbi);
@@ -4189,6 +4194,8 @@ void f2fs_build_gc_manager(struct f2fs_sb_info *sbi);
 int f2fs_gc_range(struct f2fs_sb_info *sbi,
 		unsigned int start_seg, unsigned int end_seg,
 		bool dry_run, unsigned int dry_run_sections);
+void f2fs_reset_gc_victim_resource(struct f2fs_sb_info *sbi,
+		unsigned int start, unsigned int end);
 int f2fs_resize_fs(struct file *filp, __u64 block_count);
 int __init f2fs_create_garbage_collection_cache(void);
 void f2fs_destroy_garbage_collection_cache(void);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index abcf6f486dd7..c26a57fc7a31 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -803,13 +803,25 @@ int f2fs_do_truncate_blocks(struct inode *inode, u64 from, bool lock)
 
 	if (IS_DEVICE_ALIASING(inode)) {
 		struct extent_tree *et = F2FS_I(inode)->extent_tree[EX_READ];
-		struct extent_info ei = et->largest;
+		struct extent_info ei;
+
+		if (!et) {
+			f2fs_folio_put(ifolio, true);
+			err = -ENODATA;
+			goto out;
+		}
+
+		read_lock(&et->lock);
+		ei = et->largest;
+		read_unlock(&et->lock);
 
 		f2fs_invalidate_blocks(sbi, ei.blk, ei.len);
 
 		dec_valid_block_count(sbi, inode, ei.len);
 		f2fs_update_time(sbi, REQ_TIME);
 
+		f2fs_drop_extent_tree(inode);
+
 		f2fs_folio_put(ifolio, true);
 		goto out;
 	}
@@ -1092,8 +1104,9 @@ int f2fs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 		return -EPERM;
 
 	if ((attr->ia_valid & ATTR_SIZE)) {
-		if (!f2fs_is_compress_backend_ready(inode) ||
-				IS_DEVICE_ALIASING(inode))
+		if (IS_DEVICE_ALIASING(inode))
+			return -EPERM;
+		if (!f2fs_is_compress_backend_ready(inode))
 			return -EOPNOTSUPP;
 		if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED) &&
 			!IS_ALIGNED(attr->ia_size,
@@ -2115,6 +2128,9 @@ static int f2fs_setflags_common(struct inode *inode, u32 iflags, u32 mask)
 	if (IS_NOQUOTA(inode))
 		return -EPERM;
 
+	if (IS_DEVICE_ALIASING(inode))
+		return -EPERM;
+
 	if ((iflags ^ masked_flags) & F2FS_CASEFOLD_FL) {
 		if (!f2fs_sb_has_casefold(F2FS_I_SB(inode)))
 			return -EOPNOTSUPP;
@@ -2197,6 +2213,7 @@ static const struct {
 	{ F2FS_DIRSYNC_FL,	FS_DIRSYNC_FL },
 	{ F2FS_PROJINHERIT_FL,	FS_PROJINHERIT_FL },
 	{ F2FS_CASEFOLD_FL,	FS_CASEFOLD_FL },
+	{ F2FS_DEVICE_ALIAS_FL,	F2FS_DEVICE_ALIAS_FL },
 };
 
 #define F2FS_GETTABLE_FS_FL (		\
@@ -2214,7 +2231,8 @@ static const struct {
 		FS_INLINE_DATA_FL |	\
 		FS_NOCOW_FL |		\
 		FS_VERITY_FL |		\
-		FS_CASEFOLD_FL)
+		FS_CASEFOLD_FL |	\
+		F2FS_DEVICE_ALIAS_FL)
 
 #define F2FS_SETTABLE_FS_FL (		\
 		FS_COMPR_FL |		\
@@ -2663,6 +2681,17 @@ static int f2fs_ioc_get_encryption_policy(struct file *filp, unsigned long arg)
 	return fscrypt_ioctl_get_policy(filp, (void __user *)arg);
 }
 
+static int f2fs_ioc_get_dev_alias_status(struct file *filp, unsigned long arg)
+{
+	struct inode *inode = file_inode(filp);
+
+	if (!IS_DEVICE_ALIASING(inode))
+		return -EINVAL;
+
+	return put_user(F2FS_HAS_BLOCKS(inode) ? F2FS_DEV_ALIAS_STATUS_EXCLUDED :
+			F2FS_DEV_ALIAS_STATUS_INCLUDED, (u32 __user *)arg);
+}
+
 static int f2fs_ioc_get_encryption_pwsalt(struct file *filp, unsigned long arg)
 {
 	struct inode *inode = file_inode(filp);
@@ -3599,6 +3628,230 @@ static int f2fs_ioc_get_dev_alias_file(struct file *filp, unsigned long arg)
 			(u32 __user *)arg);
 }
 
+static int f2fs_ioc_exclude_dev_alias(struct file *filp)
+{
+	struct inode *inode = file_inode(filp);
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	struct extent_tree *et = F2FS_I(inode)->extent_tree[EX_READ];
+	struct extent_info ei;
+	struct cp_control cpc = { CP_SYNC, 0, 0, 0 };
+	struct f2fs_lock_context lc;
+	blkcnt_t count;
+	unsigned int start, end, segno;
+	int type, i, err;
+
+	if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED)))
+		return -EINVAL;
+
+	err = mnt_want_write_file(filp);
+	if (err)
+		return err;
+
+	inode_lock(inode);
+
+	if (!IS_DEVICE_ALIASING(inode)) {
+		err = -EINVAL;
+		goto out_inode_unlock;
+	}
+
+	if (F2FS_HAS_BLOCKS(inode)) {
+		err = 0;
+		goto out_inode_unlock;
+	}
+
+	for (i = 1; i < sbi->s_ndevs; i++) {
+		char *name = strrchr(FDEV(i).path, '/');
+
+		name = name ? name + 1 : FDEV(i).path;
+		if (!strcmp(name, filp->f_path.dentry->d_name.name)) {
+			ei.blk = FDEV(i).start_blk;
+			ei.len = FDEV(i).total_segments << sbi->log_blocks_per_seg;
+			ei.fofs = 0;
+			break;
+		}
+	}
+
+	if (i == sbi->s_ndevs) {
+		err = -ENODATA;
+		goto out_inode_unlock;
+	}
+
+	count = ei.len;
+	err = inc_valid_block_count(sbi, inode, &count, false);
+	if (err)
+		goto out_inode_unlock;
+
+	f2fs_down_write(&sbi->gc_lock);
+	f2fs_lock_op(sbi, &lc);
+
+	FDEV(f2fs_target_device_index(sbi, ei.blk)).is_excluding = true;
+
+	start = GET_SEGNO(sbi, ei.blk);
+	end = GET_SEGNO(sbi, ei.blk + ei.len - 1);
+
+	/* Reset the victim information to prevent GC from targeting the range */
+	f2fs_reset_gc_victim_resource(sbi, start, end);
+
+	/* Mark the range as inuse to prevent new allocations in it */
+	for (segno = start; segno <= end; segno++)
+		__set_test_and_inuse(sbi, segno);
+
+	/* Move out cursegs from the target range */
+	for (type = CURSEG_HOT_DATA; type < NR_CURSEG_PERSIST_TYPE; type++) {
+		err = f2fs_allocate_segment_for_resize(sbi, type, start, end);
+		if (err) {
+			f2fs_unlock_op(sbi, &lc);
+			goto out_gc_unlock;
+		}
+	}
+
+	f2fs_unlock_op(sbi, &lc);
+	f2fs_up_write(&sbi->gc_lock);
+
+	/* Write checkpoint synchronously to flush all pending writes and free space */
+	err = f2fs_write_checkpoint(sbi, &cpc);
+	if (err) {
+		f2fs_down_write(&sbi->gc_lock);
+		goto out_gc_unlock;
+	}
+
+	/* Re-acquire gc_lock and cp_rwsem read lock for the entire range GC */
+	f2fs_down_write(&sbi->gc_lock);
+	f2fs_lock_op(sbi, &lc);
+
+	/* do GC to move out valid blocks in the range all at once! */
+	err = f2fs_gc_range(sbi, start, end, false, 0);
+	if (err) {
+		f2fs_unlock_op(sbi, &lc);
+		goto out_gc_unlock;
+	}
+
+	if (et) {
+		write_lock(&et->lock);
+		et->largest = ei;
+		write_unlock(&et->lock);
+	}
+	clear_inode_flag(inode, FI_NO_EXTENT);
+
+	f2fs_reserve_device_alias(sbi, ei.blk, ei.len);
+
+	i_size_write(inode, (loff_t)ei.len << PAGE_SHIFT);
+	f2fs_update_inode_page(inode);
+
+	FDEV(f2fs_target_device_index(sbi, ei.blk)).is_excluding = false;
+
+	f2fs_unlock_op(sbi, &lc);
+	f2fs_up_write(&sbi->gc_lock);
+
+	inode_unlock(inode);
+	mnt_drop_write_file(filp);
+
+	err = f2fs_write_checkpoint(sbi, &cpc);
+	return err;
+
+out_gc_unlock:
+	FDEV(f2fs_target_device_index(sbi, ei.blk)).is_excluding = false;
+	f2fs_up_write(&sbi->gc_lock);
+
+	/*
+	 * Put successfully GC'ed segments back into PRE list so checkpoint
+	 * commits and frees them!
+	 */
+	f2fs_lock_op(sbi, &lc);
+	for (segno = start; segno <= end; segno++) {
+		if (get_valid_blocks(sbi, segno, false) == 0) {
+			mutex_lock(&DIRTY_I(sbi)->seglist_lock);
+			if (!test_and_set_bit(segno, DIRTY_I(sbi)->dirty_segmap[PRE]))
+				DIRTY_I(sbi)->nr_dirty[PRE]++;
+			mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
+		}
+	}
+	f2fs_unlock_op(sbi, &lc);
+
+	count = ei.len;
+	dec_valid_block_count(sbi, inode, count);
+
+	inode_unlock(inode);
+	mnt_drop_write_file(filp);
+
+	f2fs_write_checkpoint(sbi, &cpc);
+	return err;
+
+out_inode_unlock:
+	inode_unlock(inode);
+	mnt_drop_write_file(filp);
+	return err;
+}
+
+static int f2fs_ioc_include_dev_alias(struct file *filp)
+{
+	struct inode *inode = file_inode(filp);
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	struct extent_tree *et = F2FS_I(inode)->extent_tree[EX_READ];
+	struct extent_info ei = {0, };
+	struct cp_control cpc = { CP_SYNC, 0, 0, 0 };
+	struct f2fs_lock_context lc;
+	int err;
+
+	if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED)))
+		return -EINVAL;
+
+	err = mnt_want_write_file(filp);
+	if (err)
+		return err;
+
+	inode_lock(inode);
+
+	if (!IS_DEVICE_ALIASING(inode)) {
+		err = -EINVAL;
+		goto out_inode_unlock;
+	}
+
+	if (!F2FS_HAS_BLOCKS(inode)) {
+		err = 0;
+		goto out_inode_unlock;
+	}
+
+	err = filemap_write_and_wait(inode->i_mapping);
+	if (err)
+		goto out_inode_unlock;
+
+	if (et) {
+		read_lock(&et->lock);
+		ei = et->largest;
+		read_unlock(&et->lock);
+	}
+
+	f2fs_down_write(&sbi->gc_lock);
+	f2fs_lock_op(sbi, &lc);
+
+	truncate_setsize(inode, 0);
+
+	err = f2fs_truncate_blocks(inode, 0, false);
+	if (err) {
+		i_size_write(inode, (loff_t)ei.len << PAGE_SHIFT);
+		f2fs_unlock_op(sbi, &lc);
+		f2fs_up_write(&sbi->gc_lock);
+		goto out_inode_unlock;
+	}
+
+	f2fs_update_inode_page(inode);
+
+	f2fs_unlock_op(sbi, &lc);
+	f2fs_up_write(&sbi->gc_lock);
+
+	inode_unlock(inode);
+	mnt_drop_write_file(filp);
+
+	err = f2fs_write_checkpoint(sbi, &cpc);
+	return err;
+
+out_inode_unlock:
+	inode_unlock(inode);
+	mnt_drop_write_file(filp);
+	return err;
+}
+
 static int f2fs_ioc_io_prio(struct file *filp, unsigned long arg)
 {
 	struct inode *inode = file_inode(filp);
@@ -4721,8 +4974,14 @@ static long __f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		return f2fs_ioc_compress_file(filp);
 	case F2FS_IOC_GET_DEV_ALIAS_FILE:
 		return f2fs_ioc_get_dev_alias_file(filp, arg);
+	case F2FS_IOC_GET_DEV_ALIAS_STATUS:
+		return f2fs_ioc_get_dev_alias_status(filp, arg);
 	case F2FS_IOC_IO_PRIO:
 		return f2fs_ioc_io_prio(filp, arg);
+	case F2FS_IOC_EXCLUDE_DEV_ALIAS:
+		return f2fs_ioc_exclude_dev_alias(filp);
+	case F2FS_IOC_INCLUDE_DEV_ALIAS:
+		return f2fs_ioc_include_dev_alias(filp);
 	default:
 		return -ENOTTY;
 	}
@@ -5447,7 +5706,10 @@ long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	case F2FS_IOC_DECOMPRESS_FILE:
 	case F2FS_IOC_COMPRESS_FILE:
 	case F2FS_IOC_GET_DEV_ALIAS_FILE:
+	case F2FS_IOC_GET_DEV_ALIAS_STATUS:
 	case F2FS_IOC_IO_PRIO:
+	case F2FS_IOC_EXCLUDE_DEV_ALIAS:
+	case F2FS_IOC_INCLUDE_DEV_ALIAS:
 		break;
 	default:
 		return -ENOIOCTLCMD;
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 60378614bc54..755df9b6bbaa 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -2143,29 +2143,37 @@ int f2fs_gc_range(struct f2fs_sb_info *sbi,
 	return 0;
 }
 
+void f2fs_reset_gc_victim_resource(struct f2fs_sb_info *sbi,
+		unsigned int start, unsigned int end)
+{
+	int i;
+
+	mutex_lock(&DIRTY_I(sbi)->seglist_lock);
+	for (i = 0; i < MAX_GC_POLICY; i++)
+		if (SIT_I(sbi)->last_victim[i] >= start &&
+		    SIT_I(sbi)->last_victim[i] <= end)
+			SIT_I(sbi)->last_victim[i] = 0;
+
+	for (i = BG_GC; i <= FG_GC; i++)
+		if (sbi->next_victim_seg[i] >= start &&
+		    sbi->next_victim_seg[i] <= end)
+			sbi->next_victim_seg[i] = NULL_SEGNO;
+	mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
+}
+
 static int free_segment_range(struct f2fs_sb_info *sbi,
 				unsigned int secs, bool dry_run)
 {
 	unsigned int next_inuse, start, end;
 	struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
-	int gc_mode, gc_type;
 	int err = 0;
 	int type;
 
-	/* Force block allocation for GC */
 	MAIN_SECS(sbi) -= secs;
 	start = MAIN_SECS(sbi) * SEGS_PER_SEC(sbi);
 	end = MAIN_SEGS(sbi) - 1;
 
-	mutex_lock(&DIRTY_I(sbi)->seglist_lock);
-	for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
-		if (SIT_I(sbi)->last_victim[gc_mode] >= start)
-			SIT_I(sbi)->last_victim[gc_mode] = 0;
-
-	for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
-		if (sbi->next_victim_seg[gc_type] >= start)
-			sbi->next_victim_seg[gc_type] = NULL_SEGNO;
-	mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
+	f2fs_reset_gc_victim_resource(sbi, start, end);
 
 	/* Move out cursegs from the target range */
 	for (type = CURSEG_HOT_DATA; type < NR_CURSEG_PERSIST_TYPE; type++) {
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index e360f08a9586..b7974242ead1 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -553,6 +553,9 @@ static int f2fs_unlink(struct inode *dir, struct dentry *dentry)
 
 	trace_f2fs_unlink_enter(dir, dentry);
 
+	if (IS_DEVICE_ALIASING(inode))
+		return -EPERM;
+
 	if (unlikely(f2fs_cp_error(sbi))) {
 		err = -EIO;
 		goto out;
@@ -931,6 +934,9 @@ static int f2fs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 	bool old_is_dir = S_ISDIR(old_inode->i_mode);
 	int err;
 
+	if (IS_DEVICE_ALIASING(old_inode))
+		return -EPERM;
+
 	if (unlikely(f2fs_cp_error(sbi)))
 		return -EIO;
 	if (!f2fs_is_checkpoint_ready(sbi))
@@ -1000,6 +1006,8 @@ static int f2fs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 	}
 
 	if (new_inode) {
+		if (IS_DEVICE_ALIASING(new_inode))
+			return -EPERM;
 
 		err = -ENOTEMPTY;
 		if (old_is_dir && !f2fs_empty_dir(new_inode))
@@ -1127,6 +1135,9 @@ static int f2fs_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
 	int old_nlink = 0, new_nlink = 0;
 	int err;
 
+	if (IS_DEVICE_ALIASING(old_inode) || IS_DEVICE_ALIASING(new_inode))
+		return -EPERM;
+
 	if (unlikely(f2fs_cp_error(sbi)))
 		return -EIO;
 	if (!f2fs_is_checkpoint_ready(sbi))
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 6a97fe76712b..c0ddc09adc51 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2498,44 +2498,51 @@ static int update_sit_entry_for_alloc(struct f2fs_sb_info *sbi, struct seg_entry
 #ifdef CONFIG_F2FS_CHECK_FS
 	bool mir_exist;
 #endif
+	int del_count = del;
+	int i;
+
+	f2fs_bug_on(sbi, GET_SEGNO(sbi, blkaddr) != GET_SEGNO(sbi, blkaddr + del_count - 1));
 
-	exist = f2fs_test_and_set_bit(offset, se->cur_valid_map);
+	for (i = 0; i < del_count; i++) {
+		exist = f2fs_test_and_set_bit(offset + i, se->cur_valid_map);
 #ifdef CONFIG_F2FS_CHECK_FS
-	mir_exist = f2fs_test_and_set_bit(offset,
-					se->cur_valid_map_mir);
-	if (unlikely(exist != mir_exist)) {
-		f2fs_err(sbi, "Inconsistent error when setting bitmap, blk:%u, old bit:%d",
-			blkaddr, exist);
-		f2fs_bug_on(sbi, 1);
-	}
+		mir_exist = f2fs_test_and_set_bit(offset + i,
+						se->cur_valid_map_mir);
+		if (unlikely(exist != mir_exist)) {
+			f2fs_err(sbi, "Inconsistent error when setting bitmap, blk:%u, old bit:%d",
+				blkaddr + i, exist);
+			f2fs_bug_on(sbi, 1);
+		}
 #endif
-	if (unlikely(exist)) {
-		f2fs_err(sbi, "Bitmap was wrongly set, blk:%u", blkaddr);
-		f2fs_bug_on(sbi, 1);
-		se->valid_blocks--;
-		del = 0;
-	}
+		if (unlikely(exist)) {
+			f2fs_err(sbi, "Bitmap was wrongly set, blk:%u", blkaddr + i);
+			f2fs_bug_on(sbi, 1);
+			se->valid_blocks--;
+			del -= 1;
+			continue;
+		}
 
-	if (f2fs_block_unit_discard(sbi) &&
-			!f2fs_test_and_set_bit(offset, se->discard_map))
-		sbi->discard_blks--;
+		if (f2fs_block_unit_discard(sbi) &&
+				!f2fs_test_and_set_bit(offset + i, se->discard_map))
+			sbi->discard_blks--;
 
-	/*
-	 * SSR should never reuse block which is checkpointed
-	 * or newly invalidated.
-	 */
-	if (!is_sbi_flag_set(sbi, SBI_CP_DISABLED)) {
-		if (!f2fs_test_and_set_bit(offset, se->ckpt_valid_map)) {
-			se->ckpt_valid_blocks++;
-			if (__is_large_section(sbi))
-				get_sec_entry(sbi, segno)->ckpt_valid_blocks++;
+		/*
+		 * SSR should never reuse block which is checkpointed
+		 * or newly invalidated.
+		 */
+		if (!is_sbi_flag_set(sbi, SBI_CP_DISABLED)) {
+			if (!f2fs_test_and_set_bit(offset + i, se->ckpt_valid_map)) {
+				se->ckpt_valid_blocks++;
+				if (__is_large_section(sbi))
+					get_sec_entry(sbi, segno)->ckpt_valid_blocks++;
+			}
 		}
-	}
 
-	if (!f2fs_test_bit(offset, se->ckpt_valid_map)) {
-		se->ckpt_valid_blocks += del;
-		if (__is_large_section(sbi))
-			get_sec_entry(sbi, segno)->ckpt_valid_blocks += del;
+		if (!f2fs_test_bit(offset + i, se->ckpt_valid_map)) {
+			se->ckpt_valid_blocks += 1;
+			if (__is_large_section(sbi))
+				get_sec_entry(sbi, segno)->ckpt_valid_blocks += 1;
+		}
 	}
 
 	if (__is_large_section(sbi))
@@ -2590,9 +2597,14 @@ void f2fs_invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr,
 	unsigned int segno = GET_SEGNO(sbi, addr);
 	struct sit_info *sit_i = SIT_I(sbi);
 	block_t addr_start = addr, addr_end = addr + len - 1;
-	unsigned int seg_num = GET_SEGNO(sbi, addr_end) - segno + 1;
+	unsigned int seg_num;
 	unsigned int i = 1, max_blocks = sbi->blocks_per_seg, cnt;
 
+	if (len == 0)
+		return;
+
+	seg_num = GET_SEGNO(sbi, addr_end) - segno + 1;
+
 	f2fs_bug_on(sbi, addr == NULL_ADDR);
 	if (addr == NEW_ADDR || addr == COMPRESS_ADDR)
 		return;
@@ -2625,6 +2637,51 @@ void f2fs_invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr,
 	up_write(&sit_i->sentry_lock);
 }
 
+void f2fs_reserve_device_alias(struct f2fs_sb_info *sbi, block_t addr,
+		unsigned int len)
+{
+	unsigned int segno = GET_SEGNO(sbi, addr);
+	struct sit_info *sit_i = SIT_I(sbi);
+	block_t addr_start = addr, addr_end = addr + len - 1;
+	unsigned int seg_num;
+	unsigned int i = 1, max_blocks = sbi->blocks_per_seg, cnt;
+
+	if (len == 0)
+		return;
+
+	seg_num = GET_SEGNO(sbi, addr_end) - segno + 1;
+
+	down_write(&sit_i->sentry_lock);
+
+	if (seg_num == 1)
+		cnt = len;
+	else
+		cnt = max_blocks - GET_BLKOFF_FROM_SEG0(sbi, addr);
+
+	do {
+		update_segment_mtime(sbi, addr_start, 0);
+		update_sit_entry(sbi, addr_start, cnt);
+
+		/* Remove the segment from PRE (prefree) to prevent checkpoint from freeing it! */
+		mutex_lock(&DIRTY_I(sbi)->seglist_lock);
+		if (test_and_clear_bit(segno, DIRTY_I(sbi)->dirty_segmap[PRE]))
+			DIRTY_I(sbi)->nr_dirty[PRE]--;
+		mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
+
+		/* add it into dirty seglist */
+		locate_dirty_segment(sbi, segno);
+
+		/* update @addr_start and @cnt and @segno */
+		addr_start = START_BLOCK(sbi, ++segno);
+		if (++i == seg_num)
+			cnt = GET_BLKOFF_FROM_SEG0(sbi, addr_end) + 1;
+		else
+			cnt = max_blocks;
+	} while (i <= seg_num);
+
+	up_write(&sit_i->sentry_lock);
+}
+
 bool f2fs_is_checkpointed_data(struct f2fs_sb_info *sbi, block_t blkaddr)
 {
 	struct sit_info *sit_i = SIT_I(sbi);
@@ -2783,6 +2840,7 @@ static int get_new_segment(struct f2fs_sb_info *sbi,
 	unsigned int alloc_policy = sbi->allocate_section_policy;
 	unsigned int alloc_hint = sbi->allocate_section_hint;
 	bool init = true;
+	bool looped = false;
 	int i;
 	int ret = 0;
 
@@ -2833,33 +2891,49 @@ static int get_new_segment(struct f2fs_sb_info *sbi,
 find_other_zone:
 	secno = find_next_zero_bit(free_i->free_secmap, MAIN_SECS(sbi), hint);
 
-#ifdef CONFIG_BLK_DEV_ZONED
-	if (secno >= MAIN_SECS(sbi) && f2fs_sb_has_blkzoned(sbi)) {
-		/* Write only to sequential zones */
-		if (sbi->blkzone_alloc_policy == BLKZONE_ALLOC_ONLY_SEQ) {
-			hint = GET_SEC_FROM_SEG(sbi, sbi->first_seq_zone_segno);
-			secno = find_next_zero_bit(free_i->free_secmap, MAIN_SECS(sbi), hint);
-		} else
-			secno = find_first_zero_bit(free_i->free_secmap,
-							MAIN_SECS(sbi));
-		if (secno >= MAIN_SECS(sbi)) {
-			ret = -ENOSPC;
-			f2fs_bug_on(sbi, 1);
-			goto out_unlock;
-		}
-	}
-#endif
-
 	if (secno >= MAIN_SECS(sbi)) {
-		secno = find_first_zero_bit(free_i->free_secmap,
-							MAIN_SECS(sbi));
-		if (secno >= MAIN_SECS(sbi)) {
+		if (looped) {
 			ret = -ENOSPC;
 			f2fs_bug_on(sbi, !pinning);
 			goto out_unlock;
 		}
+#ifdef CONFIG_BLK_DEV_ZONED
+		/* Write only to sequential zones */
+		if (f2fs_sb_has_blkzoned(sbi) &&
+		    sbi->blkzone_alloc_policy == BLKZONE_ALLOC_ONLY_SEQ)
+			hint = GET_SEC_FROM_SEG(sbi, sbi->first_seq_zone_segno);
+		else
+#endif
+			hint = 0;
+		looped = true;
+		goto find_other_zone;
 	}
+
 	segno = GET_SEG_FROM_SEC(sbi, secno);
+
+	if (f2fs_sb_has_device_alias(sbi) && pinning && f2fs_is_multi_device(sbi)) {
+		int devi = f2fs_target_device_index(sbi, START_BLOCK(sbi, segno));
+
+		if (FDEV(devi).has_alias) {
+			unsigned int end_segno;
+
+			while (devi < sbi->s_ndevs && FDEV(devi).has_alias) {
+				block_t next_blk;
+
+				end_segno = GET_SEGNO(sbi, FDEV(devi).end_blk);
+				hint = GET_SEC_FROM_SEG(sbi, end_segno) + 1;
+
+				if (hint >= MAIN_SECS(sbi) || ++devi >= sbi->s_ndevs)
+					break;
+
+				next_blk = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, hint));
+				if (next_blk < FDEV(devi).start_blk ||
+				    next_blk > FDEV(devi).end_blk)
+					break;
+			}
+			goto find_other_zone;
+		}
+	}
 	zoneno = GET_ZONE_FROM_SEC(sbi, secno);
 
 	/* give up on finding another zone */
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index 068845660b0f..914523f5d3ea 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -980,6 +980,17 @@ static inline bool sec_usage_check(struct f2fs_sb_info *sbi, unsigned int secno)
 {
 	if (is_cursec(sbi, secno) || (sbi->cur_victim_sec == secno))
 		return true;
+	if (f2fs_sb_has_device_alias(sbi) && f2fs_is_multi_device(sbi)) {
+		int i;
+		block_t start_blk = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
+
+		for (i = 0; i < sbi->s_ndevs; i++) {
+			if (FDEV(i).is_excluding &&
+			    start_blk >= FDEV(i).start_blk &&
+			    start_blk <= FDEV(i).end_blk)
+				return true;
+		}
+	}
 	return false;
 }
 
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 9d421a07d2d5..ee599d202fc9 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -4916,6 +4916,38 @@ static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi)
 		sbi->readdir_ra = true;
 }
 
+static void f2fs_restore_device_alias(struct f2fs_sb_info *sbi)
+{
+	struct inode *root = d_inode(sbi->sb->s_root);
+	struct f2fs_dir_entry *de;
+	struct folio *folio;
+	int i;
+
+	if (!f2fs_sb_has_device_alias(sbi))
+		return;
+
+	for (i = 1; i < sbi->s_ndevs; i++) {
+		char *name = strrchr(FDEV(i).path, '/');
+		struct qstr qstr;
+
+		name = name ? name + 1 : FDEV(i).path;
+		qstr.name = name;
+		qstr.len = strlen(name);
+
+		de = f2fs_find_entry(root, &qstr, &folio);
+		if (de) {
+			struct inode *inode = f2fs_iget(sbi->sb, le32_to_cpu(de->ino));
+
+			if (!IS_ERR(inode)) {
+				if (IS_DEVICE_ALIASING(inode))
+					FDEV(i).has_alias = true;
+				iput(inode);
+			}
+			f2fs_folio_put(folio, 0);
+		}
+	}
+}
+
 static int f2fs_fill_super(struct super_block *sb, struct fs_context *fc)
 {
 	struct f2fs_fs_context *ctx = fc->fs_private;
@@ -5341,6 +5373,8 @@ static int f2fs_fill_super(struct super_block *sb, struct fs_context *fc)
 	f2fs_update_time(sbi, REQ_TIME);
 	clear_sbi_flag(sbi, SBI_CP_DISABLED_QUICK);
 
+	f2fs_restore_device_alias(sbi);
+
 	sbi->umount_lock_holder = NULL;
 	return 0;
 
diff --git a/include/uapi/linux/f2fs.h b/include/uapi/linux/f2fs.h
index 795e26258355..6ca6ae06918e 100644
--- a/include/uapi/linux/f2fs.h
+++ b/include/uapi/linux/f2fs.h
@@ -45,6 +45,9 @@
 #define F2FS_IOC_START_ATOMIC_REPLACE	_IO(F2FS_IOCTL_MAGIC, 25)
 #define F2FS_IOC_GET_DEV_ALIAS_FILE	_IOR(F2FS_IOCTL_MAGIC, 26, __u32)
 #define F2FS_IOC_IO_PRIO		_IOW(F2FS_IOCTL_MAGIC, 27, __u32)
+#define F2FS_IOC_EXCLUDE_DEV_ALIAS	_IO(F2FS_IOCTL_MAGIC, 28)
+#define F2FS_IOC_INCLUDE_DEV_ALIAS	_IO(F2FS_IOCTL_MAGIC, 29)
+#define F2FS_IOC_GET_DEV_ALIAS_STATUS	_IOR(F2FS_IOCTL_MAGIC, 30, __u32)
 
 /*
  * should be same as XFS_IOC_GOINGDOWN.
@@ -70,6 +73,10 @@ enum {
 	F2FS_IOPRIO_MAX,
 };
 
+/* for F2FS_IOC_GET_DEV_ALIAS_STATUS */
+#define F2FS_DEV_ALIAS_STATUS_INCLUDED	0
+#define F2FS_DEV_ALIAS_STATUS_EXCLUDED	1
+
 struct f2fs_gc_range {
 	__u32 sync;
 	__u64 start;
-- 
2.54.0.1032.g2f8565e1d1-goog
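
For anyone who wants to exercise the new interface from userspace, the following is a minimal sketch, assuming the ioctl numbers and status values land exactly as in the include/uapi/linux/f2fs.h hunk of this patch. The helper names (dev_alias_status_name, dev_alias_get_status, dev_alias_set_excluded, dev_alias_print_status) are hypothetical and are not part of this patch or of f2fs-tools. F2FS_IOCTL_MAGIC (0xf5) is taken from the existing uapi header, and the payload type is spelled uint32_t here in place of __u32. Opening the aliasing file with O_RDONLY is also an assumption; the handlers take write access on the mount themselves through mnt_want_write_file().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Values copied from the uapi hunk of this patch. The guards keep the
 * sketch compiling if a newer <linux/f2fs.h> already defines them.
 */
#ifndef F2FS_IOCTL_MAGIC
#define F2FS_IOCTL_MAGIC 0xf5
#endif
#ifndef F2FS_IOC_EXCLUDE_DEV_ALIAS
#define F2FS_IOC_EXCLUDE_DEV_ALIAS	_IO(F2FS_IOCTL_MAGIC, 28)
#define F2FS_IOC_INCLUDE_DEV_ALIAS	_IO(F2FS_IOCTL_MAGIC, 29)
#define F2FS_IOC_GET_DEV_ALIAS_STATUS	_IOR(F2FS_IOCTL_MAGIC, 30, uint32_t)
#define F2FS_DEV_ALIAS_STATUS_INCLUDED	0
#define F2FS_DEV_ALIAS_STATUS_EXCLUDED	1
#endif

/* Hypothetical helper: human-readable label for a status value. */
const char *dev_alias_status_name(uint32_t status)
{
	switch (status) {
	case F2FS_DEV_ALIAS_STATUS_INCLUDED:
		return "included";
	case F2FS_DEV_ALIAS_STATUS_EXCLUDED:
		return "excluded";
	default:
		return "unknown";
	}
}

/* Hypothetical helper: fetch the status; returns 0 or a negative errno. */
int dev_alias_get_status(const char *path, uint32_t *status)
{
	int fd, ret = 0;

	assert(path != NULL && status != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;
	if (ioctl(fd, F2FS_IOC_GET_DEV_ALIAS_STATUS, status) < 0)
		ret = -errno;	/* saved before close() can clobber errno */
	close(fd);
	return ret;
}

/* Hypothetical helper: exclude (nonzero) or include (zero) the alias range. */
int dev_alias_set_excluded(const char *path, int exclude)
{
	int fd, ret = 0;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;
	if (ioctl(fd, exclude ? F2FS_IOC_EXCLUDE_DEV_ALIAS :
				F2FS_IOC_INCLUDE_DEV_ALIAS) < 0)
		ret = -errno;
	close(fd);
	return ret;
}

/* Hypothetical helper: print "<path>: <status>", or the error on failure. */
int dev_alias_print_status(const char *path)
{
	uint32_t status;
	int ret = dev_alias_get_status(path, &status);

	if (ret)
		fprintf(stderr, "%s: %s\n", path, strerror(-ret));
	else
		printf("%s: %s\n", path, dev_alias_status_name(status));
	return ret;
}
```

These helpers are meant to be called from a small tool's main(), for example dev_alias_set_excluded("/mnt/f2fs/vdc.file", 1) followed by dev_alias_print_status("/mnt/f2fs/vdc.file"). Going by the handlers in this patch, both toggles return 0 without doing any work when the file is already in the requested state, and fail with EINVAL when checkpointing is disabled or the file is not a device aliasing file.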