[PATCH] btrfs: Fix data race in log_conflicting_inodes

Hao-ran Zheng posted 1 patch 3 weeks, 2 days ago
[PATCH] btrfs: Fix data race in log_conflicting_inodes
Posted by Hao-ran Zheng 3 weeks, 2 days ago
A data race can occur when `log_conflicting_inodes()` is executed in
different threads at the same time. If one thread assigns a value to
`ctx->logging_conflict_inodes` while another thread evaluates
`if (ctx->logging_conflict_inodes)` or modifies the flag, the accesses
race with each other.

An atomicity violation may also occur. Suppose thread A passes the
`if (ctx->logging_conflict_inodes)` check and execution switches to
thread B before `ctx->logging_conflict_inodes` has been set to true.
Thread B then passes the same check, so multiple threads execute the
body of `log_conflicting_inodes()`.

To address this, add a lock to the `btrfs_log_ctx` structure to protect
`logging_conflict_inodes`, and hold it while the flag is checked and
assigned. This ensures that the value of `ctx->logging_conflict_inodes`
does not change between the check and the assignment.

Signed-off-by: Hao-ran Zheng <zhenghaoran@buaa.edu.cn>
---
 fs/btrfs/tree-log.c | 7 +++++++
 fs/btrfs/tree-log.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 9637c7cdc0cf..9cdbf280ca9a 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2854,6 +2854,7 @@ void btrfs_init_log_ctx(struct btrfs_log_ctx *ctx, struct btrfs_inode *inode)
 	INIT_LIST_HEAD(&ctx->conflict_inodes);
 	ctx->num_conflict_inodes = 0;
 	ctx->logging_conflict_inodes = false;
+	spin_lock_init(&ctx->logging_conflict_inodes_lock);
 	ctx->scratch_eb = NULL;
 }
 
@@ -5779,16 +5780,20 @@ static int log_conflicting_inodes(struct btrfs_trans_handle *trans,
 				  struct btrfs_log_ctx *ctx)
 {
 	int ret = 0;
+	unsigned long logging_conflict_inodes_flags;
 
 	/*
 	 * Conflicting inodes are logged by the first call to btrfs_log_inode(),
 	 * otherwise we could have unbounded recursion of btrfs_log_inode()
 	 * calls. This check guarantees we can have only 1 level of recursion.
 	 */
+	spin_lock_irqsave(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
 	if (ctx->logging_conflict_inodes)
+		spin_unlock_irqrestore(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
 		return 0;
 
 	ctx->logging_conflict_inodes = true;
+	spin_unlock_irqrestore(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
 
 	/*
 	 * New conflicting inodes may be found and added to the list while we
@@ -5869,7 +5874,9 @@ static int log_conflicting_inodes(struct btrfs_trans_handle *trans,
 			break;
 	}
 
+	spin_lock_irqsave(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
 	ctx->logging_conflict_inodes = false;
+	spin_unlock_irqrestore(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
 	if (ret)
 		free_conflicting_inodes(ctx);
 
diff --git a/fs/btrfs/tree-log.h b/fs/btrfs/tree-log.h
index dc313e6bb2fa..0f862d0c80f2 100644
--- a/fs/btrfs/tree-log.h
+++ b/fs/btrfs/tree-log.h
@@ -44,6 +44,7 @@ struct btrfs_log_ctx {
 	struct list_head conflict_inodes;
 	int num_conflict_inodes;
 	bool logging_conflict_inodes;
+	spinlock_t logging_conflict_inodes_lock;
 	/*
 	 * Used for fsyncs that need to copy items from the subvolume tree to
 	 * the log tree (full sync flag set or copy everything flag set) to
-- 
2.34.1
Re: [PATCH] btrfs: Fix data race in log_conflicting_inodes
Posted by kernel test robot 3 weeks, 1 day ago
Hi Hao-ran,

kernel test robot noticed the following build errors:

[auto build test ERROR on kdave/for-next]
[also build test ERROR on linus/master v6.12-rc5 next-20241101]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Hao-ran-Zheng/btrfs-Fix-data-race-in-log_conflicting_inodes/20241101-115429
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
patch link:    https://lore.kernel.org/r/20241101035133.925251-1-zhenghaoran%40buaa.edu.cn
patch subject: [PATCH] btrfs: Fix data race in log_conflicting_inodes
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20241102/202411021448.6pjzV4h1-lkp@intel.com/config)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241102/202411021448.6pjzV4h1-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411021448.6pjzV4h1-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   In file included from fs/btrfs/tree-log.c:8:
   In file included from include/linux/blkdev.h:9:
   In file included from include/linux/blk_types.h:10:
   In file included from include/linux/bvec.h:10:
   In file included from include/linux/highmem.h:8:
   In file included from include/linux/cacheflush.h:5:
   In file included from arch/x86/include/asm/cacheflush.h:5:
   In file included from include/linux/mm.h:2213:
   include/linux/vmstat.h:504:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     504 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     505 |                            item];
         |                            ~~~~
   include/linux/vmstat.h:511:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     511 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     512 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     518 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   include/linux/vmstat.h:524:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     524 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     525 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
>> fs/btrfs/tree-log.c:5790:26: error: no member named 'conflict_inodes_lock' in 'struct btrfs_log_ctx'; did you mean 'conflict_inodes'?
    5790 |         spin_lock_irqsave(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
         |                                 ^~~~~~~~~~~~~~~~~~~~
         |                                 conflict_inodes
   include/linux/spinlock.h:381:39: note: expanded from macro 'spin_lock_irqsave'
     381 |         raw_spin_lock_irqsave(spinlock_check(lock), flags);     \
         |                                              ^
   include/linux/spinlock.h:244:34: note: expanded from macro 'raw_spin_lock_irqsave'
     244 |                 flags = _raw_spin_lock_irqsave(lock);   \
         |                                                ^
   fs/btrfs/tree-log.h:44:19: note: 'conflict_inodes' declared here
      44 |         struct list_head conflict_inodes;
         |                          ^
   fs/btrfs/tree-log.c:5792:32: error: no member named 'conflict_inodes_lock' in 'struct btrfs_log_ctx'; did you mean 'conflict_inodes'?
    5792 |                 spin_unlock_irqrestore(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
         |                                              ^~~~~~~~~~~~~~~~~~~~
         |                                              conflict_inodes
   fs/btrfs/tree-log.h:44:19: note: 'conflict_inodes' declared here
      44 |         struct list_head conflict_inodes;
         |                          ^
>> fs/btrfs/tree-log.c:5793:3: warning: misleading indentation; statement is not part of the previous 'if' [-Wmisleading-indentation]
    5793 |                 return 0;
         |                 ^
   fs/btrfs/tree-log.c:5791:2: note: previous statement is here
    5791 |         if (ctx->logging_conflict_inodes)
         |         ^
   fs/btrfs/tree-log.c:5796:31: error: no member named 'conflict_inodes_lock' in 'struct btrfs_log_ctx'; did you mean 'conflict_inodes'?
    5796 |         spin_unlock_irqrestore(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
         |                                      ^~~~~~~~~~~~~~~~~~~~
         |                                      conflict_inodes
   fs/btrfs/tree-log.h:44:19: note: 'conflict_inodes' declared here
      44 |         struct list_head conflict_inodes;
         |                          ^
   fs/btrfs/tree-log.c:5877:26: error: no member named 'conflict_inodes_lock' in 'struct btrfs_log_ctx'; did you mean 'conflict_inodes'?
    5877 |         spin_lock_irqsave(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
         |                                 ^~~~~~~~~~~~~~~~~~~~
         |                                 conflict_inodes
   include/linux/spinlock.h:381:39: note: expanded from macro 'spin_lock_irqsave'
     381 |         raw_spin_lock_irqsave(spinlock_check(lock), flags);     \
         |                                              ^
   include/linux/spinlock.h:244:34: note: expanded from macro 'raw_spin_lock_irqsave'
     244 |                 flags = _raw_spin_lock_irqsave(lock);   \
         |                                                ^
   fs/btrfs/tree-log.h:44:19: note: 'conflict_inodes' declared here
      44 |         struct list_head conflict_inodes;
         |                          ^
   fs/btrfs/tree-log.c:5879:31: error: no member named 'conflict_inodes_lock' in 'struct btrfs_log_ctx'; did you mean 'conflict_inodes'?
    5879 |         spin_unlock_irqrestore(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
         |                                      ^~~~~~~~~~~~~~~~~~~~
         |                                      conflict_inodes
   fs/btrfs/tree-log.h:44:19: note: 'conflict_inodes' declared here
      44 |         struct list_head conflict_inodes;
         |                          ^
   5 warnings and 5 errors generated.
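
The -Wmisleading-indentation warning above points at a real control-flow
change: without braces, only the unlock is guarded by the `if`, so the
`return 0` runs on every call, and when the flag is clear the function
returns with the lock still held. The sketch below is a minimal userspace
illustration of that parsing rule, not the btrfs code; `as_parsed()`,
`as_intended()`, `demo_braces()` and the `unlocks` counter are hypothetical
names introduced only for this example.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * How the compiler reads the patched check. It is written here with honest
 * indentation: only the statement directly after the 'if' is conditional.
 * '(*unlocks)++' stands in for the spin_unlock_irqrestore() call.
 */
static int as_parsed(bool already_logging, int *unlocks)
{
	if (already_logging)
		(*unlocks)++;	/* the only guarded statement */
	return 0;		/* runs on every call */
}

/* What the indentation in the patch suggests was meant. */
static int as_intended(bool already_logging, int *unlocks)
{
	if (already_logging) {
		(*unlocks)++;
		return 0;	/* early exit only when the flag is set */
	}
	return 1;		/* flag clear: the caller carries on */
}

/* Small usage check for the two variants. */
static void demo_braces(void)
{
	int unlocks = 0;

	/* Flag clear: the parsed version still returns early, without unlocking. */
	assert(as_parsed(false, &unlocks) == 0 && unlocks == 0);
	/* Flag clear: the braced version carries on. */
	assert(as_intended(false, &unlocks) == 1 && unlocks == 0);
	/* Flag set: the braced version unlocks once and returns early. */
	assert(as_intended(true, &unlocks) == 0 && unlocks == 1);
}
```

Calling `demo_braces()` from a test program exercises both variants.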


vim +5790 fs/btrfs/tree-log.c

  5777	
  5778	static int log_conflicting_inodes(struct btrfs_trans_handle *trans,
  5779					  struct btrfs_root *root,
  5780					  struct btrfs_log_ctx *ctx)
  5781	{
  5782		int ret = 0;
  5783		unsigned long logging_conflict_inodes_flags;
  5784	
  5785		/*
  5786		 * Conflicting inodes are logged by the first call to btrfs_log_inode(),
  5787		 * otherwise we could have unbounded recursion of btrfs_log_inode()
  5788		 * calls. This check guarantees we can have only 1 level of recursion.
  5789		 */
> 5790		spin_lock_irqsave(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
  5791		if (ctx->logging_conflict_inodes)
  5792			spin_unlock_irqrestore(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
> 5793			return 0;
  5794	
  5795		ctx->logging_conflict_inodes = true;
  5796		spin_unlock_irqrestore(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
  5797	
  5798		/*
  5799		 * New conflicting inodes may be found and added to the list while we
  5800		 * are logging a conflicting inode, so keep iterating while the list is
  5801		 * not empty.
  5802		 */
  5803		while (!list_empty(&ctx->conflict_inodes)) {
  5804			struct btrfs_ino_list *curr;
  5805			struct inode *inode;
  5806			u64 ino;
  5807			u64 parent;
  5808	
  5809			curr = list_first_entry(&ctx->conflict_inodes,
  5810						struct btrfs_ino_list, list);
  5811			ino = curr->ino;
  5812			parent = curr->parent;
  5813			list_del(&curr->list);
  5814			kfree(curr);
  5815	
  5816			inode = btrfs_iget_logging(ino, root);
  5817			/*
  5818			 * If the other inode that had a conflicting dir entry was
  5819			 * deleted in the current transaction, we need to log its parent
  5820			 * directory. See the comment at add_conflicting_inode().
  5821			 */
  5822			if (IS_ERR(inode)) {
  5823				ret = PTR_ERR(inode);
  5824				if (ret != -ENOENT)
  5825					break;
  5826	
  5827				inode = btrfs_iget_logging(parent, root);
  5828				if (IS_ERR(inode)) {
  5829					ret = PTR_ERR(inode);
  5830					break;
  5831				}
  5832	
  5833				/*
  5834				 * Always log the directory, we cannot make this
  5835				 * conditional on need_log_inode() because the directory
  5836				 * might have been logged in LOG_INODE_EXISTS mode or
  5837				 * the dir index of the conflicting inode is not in a
  5838				 * dir index key range logged for the directory. So we
  5839				 * must make sure the deletion is recorded.
  5840				 */
  5841				ret = btrfs_log_inode(trans, BTRFS_I(inode),
  5842						      LOG_INODE_ALL, ctx);
  5843				btrfs_add_delayed_iput(BTRFS_I(inode));
  5844				if (ret)
  5845					break;
  5846				continue;
  5847			}
  5848	
  5849			/*
  5850			 * Here we can use need_log_inode() because we only need to log
  5851			 * the inode in LOG_INODE_EXISTS mode and rename operations
  5852			 * update the log, so that the log ends up with the new name and
  5853			 * without the old name.
  5854			 *
  5855			 * We did this check at add_conflicting_inode(), but here we do
  5856			 * it again because if some other task logged the inode after
  5857			 * that, we can avoid doing it again.
  5858			 */
  5859			if (!need_log_inode(trans, BTRFS_I(inode))) {
  5860				btrfs_add_delayed_iput(BTRFS_I(inode));
  5861				continue;
  5862			}
  5863	
  5864			/*
  5865			 * We are safe logging the other inode without acquiring its
  5866			 * lock as long as we log with the LOG_INODE_EXISTS mode. We
  5867			 * are safe against concurrent renames of the other inode as
  5868			 * well because during a rename we pin the log and update the
  5869			 * log with the new name before we unpin it.
  5870			 */
  5871			ret = btrfs_log_inode(trans, BTRFS_I(inode), LOG_INODE_EXISTS, ctx);
  5872			btrfs_add_delayed_iput(BTRFS_I(inode));
  5873			if (ret)
  5874				break;
  5875		}
  5876	
  5877		spin_lock_irqsave(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
  5878		ctx->logging_conflict_inodes = false;
  5879		spin_unlock_irqrestore(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
  5880		if (ret)
  5881			free_conflicting_inodes(ctx);
  5882	
  5883		return ret;
  5884	}
  5885	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH] btrfs: Fix data race in log_conflicting_inodes
Posted by kernel test robot 3 weeks, 1 day ago
Hi Hao-ran,

kernel test robot noticed the following build errors:

[auto build test ERROR on kdave/for-next]
[also build test ERROR on linus/master v6.12-rc5 next-20241101]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Hao-ran-Zheng/btrfs-Fix-data-race-in-log_conflicting_inodes/20241101-115429
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
patch link:    https://lore.kernel.org/r/20241101035133.925251-1-zhenghaoran%40buaa.edu.cn
patch subject: [PATCH] btrfs: Fix data race in log_conflicting_inodes
config: x86_64-rhel-8.3 (https://download.01.org/0day-ci/archive/20241102/202411021443.lsHICRJl-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241102/202411021443.lsHICRJl-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411021443.lsHICRJl-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   In file included from include/linux/sched.h:2145,
                    from fs/btrfs/tree-log.c:6:
   fs/btrfs/tree-log.c: In function 'log_conflicting_inodes':
>> fs/btrfs/tree-log.c:5790:33: error: 'struct btrfs_log_ctx' has no member named 'conflict_inodes_lock'; did you mean 'conflict_inodes'?
    5790 |         spin_lock_irqsave(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
         |                                 ^~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock.h:244:48: note: in definition of macro 'raw_spin_lock_irqsave'
     244 |                 flags = _raw_spin_lock_irqsave(lock);   \
         |                                                ^~~~
   fs/btrfs/tree-log.c:5790:9: note: in expansion of macro 'spin_lock_irqsave'
    5790 |         spin_lock_irqsave(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
         |         ^~~~~~~~~~~~~~~~~
   fs/btrfs/tree-log.c:5792:46: error: 'struct btrfs_log_ctx' has no member named 'conflict_inodes_lock'; did you mean 'conflict_inodes'?
    5792 |                 spin_unlock_irqrestore(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
         |                                              ^~~~~~~~~~~~~~~~~~~~
         |                                              conflict_inodes
>> fs/btrfs/tree-log.c:5791:9: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
    5791 |         if (ctx->logging_conflict_inodes)
         |         ^~
   fs/btrfs/tree-log.c:5793:17: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'if'
    5793 |                 return 0;
         |                 ^~~~~~
   fs/btrfs/tree-log.c:5796:38: error: 'struct btrfs_log_ctx' has no member named 'conflict_inodes_lock'; did you mean 'conflict_inodes'?
    5796 |         spin_unlock_irqrestore(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
         |                                      ^~~~~~~~~~~~~~~~~~~~
         |                                      conflict_inodes
   fs/btrfs/tree-log.c:5877:33: error: 'struct btrfs_log_ctx' has no member named 'conflict_inodes_lock'; did you mean 'conflict_inodes'?
    5877 |         spin_lock_irqsave(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
         |                                 ^~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock.h:244:48: note: in definition of macro 'raw_spin_lock_irqsave'
     244 |                 flags = _raw_spin_lock_irqsave(lock);   \
         |                                                ^~~~
   fs/btrfs/tree-log.c:5877:9: note: in expansion of macro 'spin_lock_irqsave'
    5877 |         spin_lock_irqsave(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
         |         ^~~~~~~~~~~~~~~~~
   fs/btrfs/tree-log.c:5879:38: error: 'struct btrfs_log_ctx' has no member named 'conflict_inodes_lock'; did you mean 'conflict_inodes'?
    5879 |         spin_unlock_irqrestore(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
         |                                      ^~~~~~~~~~~~~~~~~~~~
         |                                      conflict_inodes


vim +5790 fs/btrfs/tree-log.c

  5777	
  5778	static int log_conflicting_inodes(struct btrfs_trans_handle *trans,
  5779					  struct btrfs_root *root,
  5780					  struct btrfs_log_ctx *ctx)
  5781	{
  5782		int ret = 0;
  5783		unsigned long logging_conflict_inodes_flags;
  5784	
  5785		/*
  5786		 * Conflicting inodes are logged by the first call to btrfs_log_inode(),
  5787		 * otherwise we could have unbounded recursion of btrfs_log_inode()
  5788		 * calls. This check guarantees we can have only 1 level of recursion.
  5789		 */
> 5790		spin_lock_irqsave(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
> 5791		if (ctx->logging_conflict_inodes)
  5792			spin_unlock_irqrestore(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
  5793			return 0;
  5794	
  5795		ctx->logging_conflict_inodes = true;
  5796		spin_unlock_irqrestore(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
  5797	
  5798		/*
  5799		 * New conflicting inodes may be found and added to the list while we
  5800		 * are logging a conflicting inode, so keep iterating while the list is
  5801		 * not empty.
  5802		 */
  5803		while (!list_empty(&ctx->conflict_inodes)) {
  5804			struct btrfs_ino_list *curr;
  5805			struct inode *inode;
  5806			u64 ino;
  5807			u64 parent;
  5808	
  5809			curr = list_first_entry(&ctx->conflict_inodes,
  5810						struct btrfs_ino_list, list);
  5811			ino = curr->ino;
  5812			parent = curr->parent;
  5813			list_del(&curr->list);
  5814			kfree(curr);
  5815	
  5816			inode = btrfs_iget_logging(ino, root);
  5817			/*
  5818			 * If the other inode that had a conflicting dir entry was
  5819			 * deleted in the current transaction, we need to log its parent
  5820			 * directory. See the comment at add_conflicting_inode().
  5821			 */
  5822			if (IS_ERR(inode)) {
  5823				ret = PTR_ERR(inode);
  5824				if (ret != -ENOENT)
  5825					break;
  5826	
  5827				inode = btrfs_iget_logging(parent, root);
  5828				if (IS_ERR(inode)) {
  5829					ret = PTR_ERR(inode);
  5830					break;
  5831				}
  5832	
  5833				/*
  5834				 * Always log the directory, we cannot make this
  5835				 * conditional on need_log_inode() because the directory
  5836				 * might have been logged in LOG_INODE_EXISTS mode or
  5837				 * the dir index of the conflicting inode is not in a
  5838				 * dir index key range logged for the directory. So we
  5839				 * must make sure the deletion is recorded.
  5840				 */
  5841				ret = btrfs_log_inode(trans, BTRFS_I(inode),
  5842						      LOG_INODE_ALL, ctx);
  5843				btrfs_add_delayed_iput(BTRFS_I(inode));
  5844				if (ret)
  5845					break;
  5846				continue;
  5847			}
  5848	
  5849			/*
  5850			 * Here we can use need_log_inode() because we only need to log
  5851			 * the inode in LOG_INODE_EXISTS mode and rename operations
  5852			 * update the log, so that the log ends up with the new name and
  5853			 * without the old name.
  5854			 *
  5855			 * We did this check at add_conflicting_inode(), but here we do
  5856			 * it again because if some other task logged the inode after
  5857			 * that, we can avoid doing it again.
  5858			 */
  5859			if (!need_log_inode(trans, BTRFS_I(inode))) {
  5860				btrfs_add_delayed_iput(BTRFS_I(inode));
  5861				continue;
  5862			}
  5863	
  5864			/*
  5865			 * We are safe logging the other inode without acquiring its
  5866			 * lock as long as we log with the LOG_INODE_EXISTS mode. We
  5867			 * are safe against concurrent renames of the other inode as
  5868			 * well because during a rename we pin the log and update the
  5869			 * log with the new name before we unpin it.
  5870			 */
  5871			ret = btrfs_log_inode(trans, BTRFS_I(inode), LOG_INODE_EXISTS, ctx);
  5872			btrfs_add_delayed_iput(BTRFS_I(inode));
  5873			if (ret)
  5874				break;
  5875		}
  5876	
  5877		spin_lock_irqsave(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
  5878		ctx->logging_conflict_inodes = false;
  5879		spin_unlock_irqrestore(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
  5880		if (ret)
  5881			free_conflicting_inodes(ctx);
  5882	
  5883		return ret;
  5884	}
  5885	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH] btrfs: Fix data race in log_conflicting_inodes
Posted by Filipe Manana 3 weeks, 2 days ago
On Fri, Nov 1, 2024 at 3:52 AM Hao-ran Zheng <zhenghaoran@buaa.edu.cn> wrote:
>
> The Data Race occurs when the `log_conflicting_inodes()` function is
> executed in different threads at the same time. When one thread assigns
> a value to `ctx->logging_conflict_inodes` while another thread performs
> an `if(ctx->logging_conflict_inodes)` judgment or modifies it at the
> same time, a data contention problem may arise.

No, there's no problem at all.
A log context is thread-local; it's never shared between threads.

>
> Further, an atomicity violation may also occur here. Consider the
> following case, when a thread A `if(ctx->logging_conflict_inodes)`
> passes the judgment, the execution switches to another thread B, at
> which time the value of `ctx->logging_conflict_inodes` has not yet
> been assigned true, which would result in multiple threads executing
> `log_conflicting_inodes()`.

No. When you make such claims, please provide a sequence diagram that
shows how the tasks interact, what their call stacks are, so that we
can see where the race happens.

But again, this is completely wrong because a log context (struct
btrfs_log_ctx) is never shared between threads.
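
The point about thread-local contexts can be illustrated with a small
userspace sketch. It assumes each top-level call creates its own context
(here on the caller's stack), so the flag only needs to stop the same task
from re-entering. `struct log_ctx`, `log_inode()`, `log_conflicting()` and
`fsync_like()` are simplified, hypothetical stand-ins, not the btrfs
functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature of a per-call log context. */
struct log_ctx {
	bool logging_conflict_inodes;	/* re-entrancy guard */
	int max_depth;			/* deepest recursion level seen */
};

static int log_inode(struct log_ctx *ctx, int depth);

/* Mirrors the shape of the check in the patch, without any lock. */
static int log_conflicting(struct log_ctx *ctx, int depth)
{
	int ret;

	if (ctx->logging_conflict_inodes)
		return 0;	/* already inside: stop the recursion */

	ctx->logging_conflict_inodes = true;
	ret = log_inode(ctx, depth + 1);
	ctx->logging_conflict_inodes = false;
	return ret;
}

static int log_inode(struct log_ctx *ctx, int depth)
{
	if (depth > ctx->max_depth)
		ctx->max_depth = depth;
	return log_conflicting(ctx, depth);
}

/* One fsync-like call: the context is private to this call. */
static int fsync_like(void)
{
	struct log_ctx ctx = { .logging_conflict_inodes = false, .max_depth = 0 };

	log_inode(&ctx, 0);
	return ctx.max_depth;
}

static void demo_ctx(void)
{
	/* The guard bounds the recursion at one level. */
	assert(fsync_like() == 1);
}
```

Because no other task can reach `ctx` in this sketch, the check and the
assignment cannot interleave with another thread's accesses.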

Thanks.

>
> To address this issue, it is recommended to add locks to protect
> `logging_conflict_inodes` in the `btrfs_log_ctx` structure, and lock
> protection during assignment and judgment. This modification ensures
> that the value of `ctx->logging_conflict_inodes` does not change during
> the validation process, thereby maintaining its integrity.
>
> Signed-off-by: Hao-ran Zheng <zhenghaoran@buaa.edu.cn>
> ---
>  fs/btrfs/tree-log.c | 7 +++++++
>  fs/btrfs/tree-log.h | 1 +
>  2 files changed, 8 insertions(+)
>
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 9637c7cdc0cf..9cdbf280ca9a 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -2854,6 +2854,7 @@ void btrfs_init_log_ctx(struct btrfs_log_ctx *ctx, struct btrfs_inode *inode)
>         INIT_LIST_HEAD(&ctx->conflict_inodes);
>         ctx->num_conflict_inodes = 0;
>         ctx->logging_conflict_inodes = false;
> +       spin_lock_init(&ctx->logging_conflict_inodes_lock);
>         ctx->scratch_eb = NULL;
>  }
>
> @@ -5779,16 +5780,20 @@ static int log_conflicting_inodes(struct btrfs_trans_handle *trans,
>                                   struct btrfs_log_ctx *ctx)
>  {
>         int ret = 0;
> +       unsigned long logging_conflict_inodes_flags;
>
>         /*
>          * Conflicting inodes are logged by the first call to btrfs_log_inode(),
>          * otherwise we could have unbounded recursion of btrfs_log_inode()
>          * calls. This check guarantees we can have only 1 level of recursion.
>          */
> +       spin_lock_irqsave(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);

Even if this was remotely correct, why the irqsave? The fsync code is
never called under irq context.

>         if (ctx->logging_conflict_inodes)
> +               spin_unlock_irqrestore(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
>                 return 0;
>
>         ctx->logging_conflict_inodes = true;
> +       spin_unlock_irqrestore(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
>
>         /*
>          * New conflicting inodes may be found and added to the list while we
> @@ -5869,7 +5874,9 @@ static int log_conflicting_inodes(struct btrfs_trans_handle *trans,
>                         break;
>         }
>
> +       spin_lock_irqsave(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
>         ctx->logging_conflict_inodes = false;
> +       spin_unlock_irqrestore(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
>         if (ret)
>                 free_conflicting_inodes(ctx);
>
> diff --git a/fs/btrfs/tree-log.h b/fs/btrfs/tree-log.h
> index dc313e6bb2fa..0f862d0c80f2 100644
> --- a/fs/btrfs/tree-log.h
> +++ b/fs/btrfs/tree-log.h
> @@ -44,6 +44,7 @@ struct btrfs_log_ctx {
>         struct list_head conflict_inodes;
>         int num_conflict_inodes;
>         bool logging_conflict_inodes;
> +       spinlock_t logging_conflict_inodes_lock;
>         /*
>          * Used for fsyncs that need to copy items from the subvolume tree to
>          * the log tree (full sync flag set or copy everything flag set) to
> --
> 2.34.1
>
>
Re: [PATCH] btrfs: Fix data race in log_conflicting_inodes
Posted by Qu Wenruo 3 weeks, 2 days ago

On 2024/11/1 14:21, Hao-ran Zheng wrote:
> The Data Race occurs when the `log_conflicting_inodes()` function is
> executed in different threads at the same time. When one thread assigns
> a value to `ctx->logging_conflict_inodes` while another thread performs
> an `if(ctx->logging_conflict_inodes)` judgment or modifies it at the
> same time, a data contention problem may arise.
>
> Further, an atomicity violation may also occur here. Consider the
> following case, when a thread A `if(ctx->logging_conflict_inodes)`
> passes the judgment, the execution switches to another thread B, at
> which time the value of `ctx->logging_conflict_inodes` has not yet
> been assigned true, which would result in multiple threads executing
> `log_conflicting_inodes()`.
>
> To address this issue, it is recommended to add locks to protect
> `logging_conflict_inodes` in the `btrfs_log_ctx` structure, and lock
> protection during assignment and judgment. This modification ensures
> that the value of `ctx->logging_conflict_inodes` does not change during
> the validation process, thereby maintaining its integrity.
>
> Signed-off-by: Hao-ran Zheng <zhenghaoran@buaa.edu.cn>
> ---
>   fs/btrfs/tree-log.c | 7 +++++++
>   fs/btrfs/tree-log.h | 1 +
>   2 files changed, 8 insertions(+)
>
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 9637c7cdc0cf..9cdbf280ca9a 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -2854,6 +2854,7 @@ void btrfs_init_log_ctx(struct btrfs_log_ctx *ctx, struct btrfs_inode *inode)
>   	INIT_LIST_HEAD(&ctx->conflict_inodes);
>   	ctx->num_conflict_inodes = 0;
>   	ctx->logging_conflict_inodes = false;
> +	spin_lock_init(&ctx->logging_conflict_inodes_lock);
>   	ctx->scratch_eb = NULL;
>   }
>
> @@ -5779,16 +5780,20 @@ static int log_conflicting_inodes(struct btrfs_trans_handle *trans,
>   				  struct btrfs_log_ctx *ctx)
>   {
>   	int ret = 0;
> +	unsigned long logging_conflict_inodes_flags;
>
>   	/*
>   	 * Conflicting inodes are logged by the first call to btrfs_log_inode(),
>   	 * otherwise we could have unbounded recursion of btrfs_log_inode()
>   	 * calls. This check guarantees we can have only 1 level of recursion.
>   	 */
> +	spin_lock_irqsave(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
>   	if (ctx->logging_conflict_inodes)
> +		spin_unlock_irqrestore(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);

I'm not an expert on the log tree, but in the above case the only thing
the spinlock protects is a bool.

This looks like overkill to me.

Yes, several booleans can be stored in a single byte, which can cause
problems.

But in that case, why not change those booleans into an unsigned long
and use test_bit()/set_bit()/clear_bit(), so that the bit operations are
atomic and there is no need for the extra spinlock?

I haven't checked the other boolean usages, but at least for this
@logging_conflict_inodes variable, an atomic bit operation looks
safe.
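
As a rough illustration of the idea above, here is a userspace analogue
that uses C11 atomics rather than the kernel bit helpers. A single atomic
fetch-or does the check and the set in one step, so no separate lock is
needed. `struct log_ctx_flags`, `LOGGING_CONFLICT_INODES_BIT` and the
`flag_*()` helpers are hypothetical names for this sketch only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit index for the flag. */
enum { LOGGING_CONFLICT_INODES_BIT = 0 };

struct log_ctx_flags {
	atomic_ulong flags;	/* one word holding all boolean flags */
};

static void flags_init(struct log_ctx_flags *c)
{
	atomic_init(&c->flags, 0UL);
}

/* Atomically set the bit and report whether it was already set. */
static bool flag_test_and_set(struct log_ctx_flags *c, unsigned int bit)
{
	unsigned long mask = 1UL << bit;

	return (atomic_fetch_or(&c->flags, mask) & mask) != 0;
}

static void flag_clear(struct log_ctx_flags *c, unsigned int bit)
{
	atomic_fetch_and(&c->flags, ~(1UL << bit));
}

static bool flag_test(struct log_ctx_flags *c, unsigned int bit)
{
	return ((atomic_load(&c->flags) >> bit) & 1UL) != 0;
}

static void demo_flags(void)
{
	struct log_ctx_flags c;

	flags_init(&c);
	/* First caller finds the bit clear and sets it. */
	assert(!flag_test_and_set(&c, LOGGING_CONFLICT_INODES_BIT));
	/* A second attempt sees it already set. */
	assert(flag_test_and_set(&c, LOGGING_CONFLICT_INODES_BIT));
	flag_clear(&c, LOGGING_CONFLICT_INODES_BIT);
	assert(!flag_test(&c, LOGGING_CONFLICT_INODES_BIT));
}
```

This only shows the shape of the suggestion; in the kernel the
corresponding helpers would be the bit operations named above.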

Thanks,
Qu


>   		return 0;
>
>   	ctx->logging_conflict_inodes = true;
> +	spin_unlock_irqrestore(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
>
>   	/*
>   	 * New conflicting inodes may be found and added to the list while we
> @@ -5869,7 +5874,9 @@ static int log_conflicting_inodes(struct btrfs_trans_handle *trans,
>   			break;
>   	}
>
> +	spin_lock_irqsave(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
>   	ctx->logging_conflict_inodes = false;
> +	spin_unlock_irqrestore(&ctx->conflict_inodes_lock, logging_conflict_inodes_flags);
>   	if (ret)
>   		free_conflicting_inodes(ctx);
>
> diff --git a/fs/btrfs/tree-log.h b/fs/btrfs/tree-log.h
> index dc313e6bb2fa..0f862d0c80f2 100644
> --- a/fs/btrfs/tree-log.h
> +++ b/fs/btrfs/tree-log.h
> @@ -44,6 +44,7 @@ struct btrfs_log_ctx {
>   	struct list_head conflict_inodes;
>   	int num_conflict_inodes;
>   	bool logging_conflict_inodes;
> +	spinlock_t logging_conflict_inodes_lock;
>   	/*
>   	 * Used for fsyncs that need to copy items from the subvolume tree to
>   	 * the log tree (full sync flag set or copy everything flag set) to