[PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent.

Jijie Shao posted 7 patches 1 year, 3 months ago
There is a newer version of this series
[PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent.
Posted by Jijie Shao 1 year, 3 months ago
From: Hao Lan <lanhao@huawei.com>

This patch modifies the implementation of debugfs:
When the user process stops unexpectedly, not all data of the file system
is read. In this case, the save_buf pointer is not released. When the user
process is called next time, save_buf is used to copy the cached data
to the user space. As a result, the queried data is inconsistent. To solve
this problem, determine whether the function is invoked for the first time
based on the value of *ppos. If *ppos is 0, obtain the actual data.

Fixes: 5e69ea7ee2a6 ("net: hns3: refactor the debugfs process")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Guangwei Zhang <zhangwangwei6@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c b/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
index 807eb3bbb11c..841e5af7b2be 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
@@ -1293,8 +1293,10 @@ static ssize_t hns3_dbg_read(struct file *filp, char __user *buffer,
 
 		/* save the buffer addr until the last read operation */
 		*save_buf = read_buf;
+	}
 
-		/* get data ready for the first time to read */
+	/* get data ready for the first time to read */
+	if (!*ppos) {
 		ret = hns3_dbg_read_cmd(dbg_data, hns3_dbg_cmd[index].cmd,
 					read_buf, hns3_dbg_cmd[index].buf_len);
 		if (ret)
-- 
2.33.0
Re: [PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent.
Posted by Jakub Kicinski 1 year, 3 months ago
On Thu, 7 Nov 2024 21:30:19 +0800 Jijie Shao wrote:
> Subject: [PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent.
> Date: Thu, 7 Nov 2024 21:30:19 +0800
> X-Mailer: git-send-email 2.30.0
> 
> From: Hao Lan <lanhao@huawei.com>
> 
> This patch modifies the implementation of debugfs:
> When the user process stops unexpectedly, not all data of the file system
> is read. In this case, the save_buf pointer is not released. When the user
> process is called next time, save_buf is used to copy the cached data
> to the user space. As a result, the queried data is inconsistent. To solve
> this problem, determine whether the function is invoked for the first time
> based on the value of *ppos. If *ppos is 0, obtain the actual data.

What do you mean by "inconsistent" ?
What if two processes read the file at once?
Re: [PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent.
Posted by Jijie Shao 1 year, 2 months ago
on 2024/11/12 9:25, Jakub Kicinski wrote:
> On Thu, 7 Nov 2024 21:30:19 +0800 Jijie Shao wrote:
>> Subject: [PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent.
>> Date: Thu, 7 Nov 2024 21:30:19 +0800
>> X-Mailer: git-send-email 2.30.0
>>
>> From: Hao Lan <lanhao@huawei.com>
>>
>> This patch modifies the implementation of debugfs:
>> When the user process stops unexpectedly, not all data of the file system
>> is read. In this case, the save_buf pointer is not released. When the user
>> process is called next time, save_buf is used to copy the cached data
>> to the user space. As a result, the queried data is inconsistent. To solve
>> this problem, determine whether the function is invoked for the first time
>> based on the value of *ppos. If *ppos is 0, obtain the actual data.
> What do you mean by "inconsistent" ?
> What if two processes read the file at once?

inconsistent:
Before this modification,
if the previous read operation is stopped before complete, the buffer is not released.
In the next read operation (perhaps after a long time), the driver does not read again.
Instead, the driver returns the bufffer content, which causes outdated data to be obtained.
As a result, the obtained data is inconsistent with the actual data.

In this patch, ppos is used to determine whether a new read operation is performed.
If yes, the driver updates the data in the buffer to ensure that the queried data is fresh.
But, if two processes read the same file at once, The read operation that ends first releases the buffer.
As a result, the other read operation re-alloc buffer memory. However, because the value of ppos is not 0,
the data is not updated again. As a result, the queried data is truncated.

This is a bug and I will fix it in the next version.

Thanks
Jijie Shao



Re: [PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent.
Posted by Jakub Kicinski 1 year, 2 months ago
On Wed, 13 Nov 2024 13:59:32 +0800 Jijie Shao wrote:
> inconsistent:
> Before this modification,
> if the previous read operation is stopped before complete, the buffer is not released.
> In the next read operation (perhaps after a long time), the driver does not read again.
> Instead, the driver returns the bufffer content, which causes outdated data to be obtained.
> As a result, the obtained data is inconsistent with the actual data.

I think the word "stale" would fit the situation better.

> In this patch, ppos is used to determine whether a new read operation is performed.
> If yes, the driver updates the data in the buffer to ensure that the queried data is fresh.
> But, if two processes read the same file at once, The read operation that ends first releases the buffer.
> As a result, the other read operation re-alloc buffer memory. However, because the value of ppos is not 0,
> the data is not updated again. As a result, the queried data is truncated.
> 
> This is a bug and I will fix it in the next version.

Let's say two reads are necessary to read the data:

 reader A                  reader B
  read()
   - alloc
   - hns3_dbg_read_cmd()
                           read()
                           read()
                           read(EOF) 
                            - free
  read()
   - alloc
   - hns3_dbg_read_cmd()
  read(EOF) 
   - free

The data for read A is half from one hns3_dbg_read_cmd() and half from
another. Does it not cause any actual inconsistency?

Also, just to be sure, it's not possible to lseek on these files, right?
Re: [PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent.
Posted by Jijie Shao 1 year, 2 months ago
on 2024/11/14 8:31, Jakub Kicinski wrote:
> On Wed, 13 Nov 2024 13:59:32 +0800 Jijie Shao wrote:
>> inconsistent:
>> Before this modification,
>> if the previous read operation is stopped before complete, the buffer is not released.
>> In the next read operation (perhaps after a long time), the driver does not read again.
>> Instead, the driver returns the bufffer content, which causes outdated data to be obtained.
>> As a result, the obtained data is inconsistent with the actual data.
> I think the word "stale" would fit the situation better.
>
>> In this patch, ppos is used to determine whether a new read operation is performed.
>> If yes, the driver updates the data in the buffer to ensure that the queried data is fresh.
>> But, if two processes read the same file at once, The read operation that ends first releases the buffer.
>> As a result, the other read operation re-alloc buffer memory. However, because the value of ppos is not 0,
>> the data is not updated again. As a result, the queried data is truncated.
>>
>> This is a bug and I will fix it in the next version.
> Let's say two reads are necessary to read the data:
>
>   reader A                  reader B
>    read()
>     - alloc
>     - hns3_dbg_read_cmd()
>                             read()
>                             read()
>                             read(EOF)
>                              - free
>    read()
>     - alloc
>     - hns3_dbg_read_cmd()
>    read(EOF)
>     - free
>
> The data for read A is half from one hns3_dbg_read_cmd() and half from
> another. Does it not cause any actual inconsistency?
>
> Also, just to be sure, it's not possible to lseek on these files, right?

The patch of this version has the following problems:
  reader A                       reader B
   read()
    - alloc
    - *ppos == 0
     - hns3_dbg_read_cmd()
                                 read()
                                 read()
                                 read(EOF)
                                  - free
   read()
    - alloc
    - *ppos != 0
   read(EOF)
    - free
    
if reader B free the buffer, reader A will alloc a new buffer again,
but *ppos != 0, so hns3_dbg_read_cmd() will not be called.
So reader A cannot get the left data.

I plan to introduce the "need_update" variable in the next version,
with the default value is false. Run the alloc command to change the value to true:
  reader A                           reader B
   read()
    - need_update = false
    - alloc
       - need_update = true
    - *ppos == 0 || need_update
       - hns3_dbg_read_cmd()
                                     read()
                                     read()
                                     read(EOF)
                                     - free
   read()
    - alloc
      - need_update = true
    - *ppos == 0 || need_update
      - hns3_dbg_read_cmd()
   read(EOF)
    - free
So,  after reader A alloc a new buffer again, need_update will set to true.
hns3_dbg_read_cmd() will be called again to update data.

But there's still a problem:
The data for reader A is half from one hns3_dbg_read_cmd() and half from another.
However, due to the short interval between calls to hns3_dbg_read_cmd(),
and the fact that users do little to do like so, this problem is relatively acceptable.

We're also trying to fix the problem completely.
For example, an independent buffer is allocated for each reader:
  reader A                   reader B
   read()                    read()
   - alloc                   - alloc
     - hns3_dbg_read_cmd()    -hns3_dbg_read_cmd()
   read()                    read()
   read()                    read()
   read(EOF)                 read(EOF)
   - free                    - free
  
But, driver needs to maintain the mapping between the buffer and file.
And if the reader exits before read EOF, a large amount of memory is not free:
  reader
   read()
   - alloc
     - hns3_dbg_read_cmd()
   read()
   read()
   == reader exit ==
Maybe it's not a good way

As for lseek, driver needs to call lseek at the right time to reread the data.
  reader A                           reader B
    read()
    alloc
    hns3_dbg_read_cmd()
    *ppos == 0
    read()
                                     read()
                                     read()
                                     read(EOF)
                                     - free
    alloc
    hns3_dbg_read_cmd()
    - *ppos != 0
    - lseek()
    - *ppos == 0
    reread()
    read()
    read(EOF)
    free

I can't find any examples of similar implementations.
I'm not sure if there's a suitable lseek interface that the driver can call directly.


Another way is seq_file, which may be a solution,
as far as I know, each seq_file has a separate buffer and can be expanded automatically.
So it might be possible to solve the problem
But even if the solution is feasible, this will require a major refactoring of hns3 debugfs

Thanks
Jijie Shao


Re: [PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent.
Posted by Jakub Kicinski 1 year, 2 months ago
On Mon, 9 Dec 2024 22:14:37 +0800 Jijie Shao wrote:
> Another way is seq_file, which may be a solution,
> as far as I know, each seq_file has a separate buffer and can be expanded automatically.
> So it might be possible to solve the problem
> But even if the solution is feasible, this will require a major refactoring of hns3 debugfs

seq_file is generally used for text output

can you not hook in the allocation and execution of the cmd into the
.open handler and freeing in to the .close handler? You already use
explicit file_ops for this file.
Re: [PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent.
Posted by Jijie Shao 1 year, 1 month ago
on 2024/12/10 5:13, Jakub Kicinski wrote:
> On Mon, 9 Dec 2024 22:14:37 +0800 Jijie Shao wrote:
>> Another way is seq_file, which may be a solution,
>> as far as I know, each seq_file has a separate buffer and can be expanded automatically.
>> So it might be possible to solve the problem
>> But even if the solution is feasible, this will require a major refactoring of hns3 debugfs
> seq_file is generally used for text output
>
> can you not hook in the allocation and execution of the cmd into the
> .open handler and freeing in to the .close handler? You already use
> explicit file_ops for this file.


Thank you very much for your advice.

When I modified the code according to your comments,
I found that the problem mentioned in this path can be solved.
I implement .open() and .release() handler for debugfs file_operations,
Move allocation buffer and execution of the cmd to the .open() handler
and freeing in to the .release() handler.
Also allocate separate buffer for each reader and associate the buffer with the file pointer.
In this case, there is no shared buffer, which causes data inconsistency.

However, a new problem is introduced:
If the framework does not call .release() for some reason, the buffer cannot be freed, causing memory leakage.
Maybe it's acceptable?


  static ssize_t hns3_dbg_read(struct file *filp, char __user *buffer,
  			     size_t count, loff_t *ppos)
  {
-	struct hns3_dbg_data *dbg_data = filp->private_data;
+	char *buf = filp->private_data;
+
+	return simple_read_from_buffer(buffer, count, ppos, buf, strlen(buf));
+}
+
+static int hns3_dbg_open(struct inode *inode, struct file *filp)
+{
+	struct hns3_dbg_data *dbg_data = inode->i_private;
  	struct hnae3_handle *handle = dbg_data->handle;
  	struct hns3_nic_priv *priv = handle->priv;
-	ssize_t size = 0;
-	char **save_buf;
-	char *read_buf;
  	u32 index;
+	char *buf;
  	int ret;
  
+	if (!test_bit(HNS3_NIC_STATE_INITED, &priv->state) ||
+	    test_bit(HNS3_NIC_STATE_RESETTING, &priv->state))
+		return -EBUSY;
+
  	ret = hns3_dbg_get_cmd_index(dbg_data, &index);
  	if (ret)
  		return ret;
  
-	mutex_lock(&handle->dbgfs_lock);
-	save_buf = &handle->dbgfs_buf[index];
-
-	if (!test_bit(HNS3_NIC_STATE_INITED, &priv->state) ||
-	    test_bit(HNS3_NIC_STATE_RESETTING, &priv->state)) {
-		ret = -EBUSY;
-		goto out;
-	}
-
-	if (*save_buf) {
-		read_buf = *save_buf;
-	} else {
-		read_buf = kvzalloc(hns3_dbg_cmd[index].buf_len, GFP_KERNEL);
-		if (!read_buf) {
-			ret = -ENOMEM;
-			goto out;
-		}
-
-		/* save the buffer addr until the last read operation */
-		*save_buf = read_buf;
-
-		/* get data ready for the first time to read */
-		ret = hns3_dbg_read_cmd(dbg_data, hns3_dbg_cmd[index].cmd,
-					read_buf, hns3_dbg_cmd[index].buf_len);
-		if (ret)
-			goto out;
-	}
+	buf = kvzalloc(hns3_dbg_cmd[index].buf_len, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
  
-	size = simple_read_from_buffer(buffer, count, ppos, read_buf,
-				       strlen(read_buf));
-	if (size > 0) {
-		mutex_unlock(&handle->dbgfs_lock);
-		return size;
+	ret = hns3_dbg_read_cmd(dbg_data, hns3_dbg_cmd[index].cmd,
+				buf, hns3_dbg_cmd[index].buf_len);
+	if (ret) {
+		kvfree(buf);
+		return ret;
  	}
  
-out:
-	/* free the buffer for the last read operation */
-	if (*save_buf) {
-		kvfree(*save_buf);
-		*save_buf = NULL;
-	}
+	filp->private_data = buf;
+	return 0;
+}
  
-	mutex_unlock(&handle->dbgfs_lock);
-	return ret;
+static int hns3_dbg_release(struct inode *inode, struct file *filp)
+{
+	kvfree(filp->private_data);
+	filp->private_data = NULL;
+	return 0;
  }
  
  static const struct file_operations hns3_dbg_fops = {
  	.owner = THIS_MODULE,
-	.open  = simple_open,
+	.open  = hns3_dbg_open,
  	.read  = hns3_dbg_read,
+	.release = hns3_dbg_release,
  };


Re: [PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent.
Posted by Jakub Kicinski 1 year, 1 month ago
On Fri, 13 Dec 2024 21:11:49 +0800 Jijie Shao wrote:
> If the framework does not call .release() for some reason, the buffer
> cannot be freed, causing memory leakage. Maybe it's acceptable?

Are you sure? How did you test?
Looking at the code debugfs itself uses release in a similar way
(u32_array_fops, for example), so I think it should work.