[PATCH v2] 9p/trans_fd: mark concurrent read and writes to p9_conn->err

Ignacio Encinas posted 1 patch 9 months, 1 week ago
There is a newer version of this series
net/9p/trans_fd.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
[PATCH v2] 9p/trans_fd: mark concurrent read and writes to p9_conn->err
Posted by Ignacio Encinas 9 months, 1 week ago
Writes for the error value of a connection are spinlock-protected inside
p9_conn_cancel, but lockless reads are present elsewhere to avoid
performing unnecessary work after an error has been met.

Mark the write and lockless reads to make KCSAN happy. Mark the write as
exclusive following the recommendation in "Lock-Protected Writes with
Lockless Reads" in tools/memory-model/Documentation/access-marking.txt
while we are at it.

Reported-by: syzbot+d69a7cc8c683c2cb7506@syzkaller.appspotmail.com
Reported-by: syzbot+483d6c9b9231ea7e1851@syzkaller.appspotmail.com
Signed-off-by: Ignacio Encinas <ignacio@iencinas.com>
---
Changes in v2:

Drop unnecessary READ_ONCE in p9_fd_request (that I added in v1)

-@@ net/9p/trans_fd.c: static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
- 
-   spin_lock(&m->req_lock);
- 
--  if (m->err < 0) {
-+  if (READ_ONCE(m->err) < 0) {
-           spin_unlock(&m->req_lock);
-           return m->err;
-   }

- Link to v1: https://lore.kernel.org/r/20250308-p9_conn_err_benign_data_race-v1-1-729e57d5832b@iencinas.com
---
 net/9p/trans_fd.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 196060dc6138af10e99ad04a76ee36a11f770c65..758eaeddf32b480e0a88914da66f923f7021e608 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -196,7 +196,8 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
 		return;
 	}
 
-	m->err = err;
+	WRITE_ONCE(m->err, err);
+	ASSERT_EXCLUSIVE_WRITER(m->err);
 
 	list_for_each_entry_safe(req, rtmp, &m->req_list, req_list) {
 		list_move(&req->req_list, &cancel_list);
@@ -283,7 +284,7 @@ static void p9_read_work(struct work_struct *work)
 
 	m = container_of(work, struct p9_conn, rq);
 
-	if (m->err < 0)
+	if (READ_ONCE(m->err) < 0)
 		return;
 
 	p9_debug(P9_DEBUG_TRANS, "start mux %p pos %zd\n", m, m->rc.offset);
@@ -450,7 +451,7 @@ static void p9_write_work(struct work_struct *work)
 
 	m = container_of(work, struct p9_conn, wq);
 
-	if (m->err < 0) {
+	if (READ_ONCE(m->err) < 0) {
 		clear_bit(Wworksched, &m->wsched);
 		return;
 	}
@@ -622,7 +623,7 @@ static void p9_poll_mux(struct p9_conn *m)
 	__poll_t n;
 	int err = -ECONNRESET;
 
-	if (m->err < 0)
+	if (READ_ONCE(m->err) < 0)
 		return;
 
 	n = p9_fd_poll(m->client, NULL, &err);

---
base-commit: 2a520073e74fbb956b5564818fc5529dcc7e9f0e
change-id: 20250308-p9_conn_err_benign_data_race-2758fe8bbed0

Best regards,
-- 
Ignacio Encinas <ignacio@iencinas.com>
Re: [PATCH v2] 9p/trans_fd: mark concurrent read and writes to p9_conn->err
Posted by Dominique Martinet 9 months ago
Ignacio Encinas wrote on Thu, Mar 13, 2025 at 07:08:19PM +0100:
> Writes for the error value of a connection are spinlock-protected inside
> p9_conn_cancel, but lockless reads are present elsewhere to avoid
> performing unnecessary work after an error has been met.
> 
> Mark the write and lockless reads to make KCSAN happy. Mark the write as
> exclusive following the recommendation in "Lock-Protected Writes with
> Lockless Reads" in tools/memory-model/Documentation/access-marking.txt
> while we are at it.
> 
> Reported-by: syzbot+d69a7cc8c683c2cb7506@syzkaller.appspotmail.com
> Reported-by: syzbot+483d6c9b9231ea7e1851@syzkaller.appspotmail.com
> Signed-off-by: Ignacio Encinas <ignacio@iencinas.com>
> ---
> Changes in v2:
> 
> Drop unnecessary READ_ONCE in p9_fd_request (that I added in v1)

Ah, sorry; I think you misread my comment for v1 (or perhaps you
disagreed in the response and I misread that!)

I was thinking that style-wise it's better to access the err field
through READ/WRITE_ONCE everywhere, even if it's locked; so suggested
this diff from v1:
----
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index f163f6fc7354..65270c028f52 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -192,7 +192,7 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
 
        spin_lock(&m->req_lock);
 
-       if (m->err) {
+       if (READ_ONCE(m->err)) {
                spin_unlock(&m->req_lock);
                return;
        }
----

OTOH, looking at this again:
> --  if (m->err < 0) {
> -+  if (READ_ONCE(m->err) < 0) {
> -           spin_unlock(&m->req_lock);
> -           return m->err;

There's this access out of the lock so perhaps this should look like
this instead (with or without the READ_ONCE)

+       err = READ_ONCE(m->err);
+       if (err < 0) {
                spin_unlock(&m->req_lock);
-               return m->err;
+               return err;


Anyway, m->err is only written exactly once so it doesn't matter the
least in practice, and it looks like gcc generates exactly the same
thing (... even if I make that `return READ_ONCE(m->err)` which
surprises me a bit..), so this is just yak shaving.

I don't care all that much so I'll just pick this v2 as it's more
consistent, but feel free to send a v3 if you have an opinion, or if
someone else chips in.

Thanks!
-- 
Dominique
Re: [PATCH v2] 9p/trans_fd: mark concurrent read and writes to p9_conn->err
Posted by Ignacio Encinas Rubio 9 months ago
On 16/3/25 22:24, Dominique Martinet wrote:
> Ignacio Encinas wrote on Thu, Mar 13, 2025 at 07:08:19PM +0100:
>> Changes in v2:
>>
>> Drop unnecessary READ_ONCE in p9_fd_request (that I added in v1)
> 
> Ah, sorry; I think you misread my comment for v1 (or perhaps you
> disagreed in the response and I misread that!)

Yeah, I disagreed. Sorry about the misunderstanding. As these are not
strictly necessary I thought it would be best to not add them.

> I was thinking that style-wise it's better to access the err field
> through READ/WRITE_ONCE everywhere, even if it's locked; so suggested
> this diff from v1:
> ----
> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index f163f6fc7354..65270c028f52 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -192,7 +192,7 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
>  
>         spin_lock(&m->req_lock);
>  
> -       if (m->err) {
> +       if (READ_ONCE(m->err)) {
>                 spin_unlock(&m->req_lock);
>                 return;
>         }
> ----

Got it. I'll follow your recommendation for the v3. I'll reflect it in
the commit message just in case someone does a git blame and wonders
about these couple of READ_ONCEs.

> 
> OTOH, looking at this again:
>> --  if (m->err < 0) {
>> -+  if (READ_ONCE(m->err) < 0) {
>> -           spin_unlock(&m->req_lock);
>> -           return m->err;
> 
> There's this access out of the lock so perhaps this should look like
> this instead (with or without the READ_ONCE)
> 
> +       err = READ_ONCE(m->err);
> +       if (err < 0) {
>                 spin_unlock(&m->req_lock);
> -               return m->err;
> +               return err;

Oops, this is embarrassing... Thanks for catching it.

> Anyway, m->err is only written exactly once so it doesn't matter the
> least in practice, 

I think this one deserves a fix, I disagree :)

> and it looks like gcc generates exactly the same
> thing (... even if I make that `return READ_ONCE(m->err)` which
> surprises me a bit..), so this is just yak shaving.

This is weird... I'll double check because it shouldn't generate the
same code as far as I know.

> I don't care all that much so I'll just pick this v2 as it's more
> consistent, but feel free to send a v3 if you have an opinion, or if
> someone else chips in.

To summarize, my plan is sending a v3 with the couple of READ_ONCE you
suggested and fixing the unlocked plain access.

Thanks!
Re: [PATCH v2] 9p/trans_fd: mark concurrent read and writes to p9_conn->err
Posted by Ignacio Encinas Rubio 9 months ago
Trimming CC to avoid spamming people (I hope that's ok)

Hello Dominique!

On 17/3/25 18:01, Ignacio Encinas Rubio wrote:
> On 16/3/25 22:24, Dominique Martinet wrote:
>> There's this access out of the lock so perhaps this should look like
>> this instead (with or without the READ_ONCE)
>>
>> +       err = READ_ONCE(m->err);
>> +       if (err < 0) {
>>                 spin_unlock(&m->req_lock);
>> -               return m->err;
>> +               return err;
> 
> Oops, this is embarrassing... Thanks for catching it.
> 
>> Anyway, m->err is only written exactly once so it doesn't matter the
>> least in practice, 
> 
> I think this one deserves a fix, I disagree :)
> 
>> and it looks like gcc generates exactly the same
>> thing (... even if I make that `return READ_ONCE(m->err)` which
>> surprises me a bit..), so this is just yak shaving.
> 
> This is weird... I'll double check because it shouldn't generate the
> same code as far as I know.

I had a bit of time to check this. I understood you said that (A)

	err = READ_ONCE(m->err);
	if (err < 0) {
		spin_unlock(&m->req_lock);
		return READ_ONCE(m->err);
	}

compiles to the same thing as (B)

	err = READ_ONCE(m->err);
	if (err < 0) {
		spin_unlock(&m->req_lock);
		return err;
	}

if you didn't say this, just ignore this email :). With gcc (GCC) 
14.2.1 20250110 (Red Hat 14.2.1-7) I'm seeing a difference:

``` (A)
movl	40(%rbx), %eax	# MEM[(const volatile int *)ts_13 + 40B], _14
# net/9p/trans_fd.c:679: 	if (err < 0) {
testl	%eax, %eax	# _14
js	.L323	#,

[...]

.L323:
# ./include/linux/spinlock.h:391: 	raw_spin_unlock(&lock->rlock);
	movq	%r12, %rdi	# _21,
	call	_raw_spin_unlock	#
# net/9p/trans_fd.c:681: 		return READ_ONCE(m->err);
	movl	40(%rbx), %eax	# MEM[(const volatile int *)ts_13 + 40B], <retval>
# net/9p/trans_fd.c:697: }
	popq	%rbx	#
	popq	%rbp	#
	popq	%r12	#
	jmp	__x86_return_thunk
```

``` (B)
movl	40(%rbx), %r12d	# MEM[(const volatile int *)ts_13 + 40B], <retval>
# net/9p/trans_fd.c:679: 	if (err < 0) {
testl	%r12d, %r12d	# <retval>
js	.L323	#,

[...]

.L323:
# ./include/linux/spinlock.h:391: 	raw_spin_unlock(&lock->rlock);
	movq	%r13, %rdi	# _20,
	call	_raw_spin_unlock	#
# net/9p/trans_fd.c:697: }
	movl	%r12d, %eax	# <retval>,
	popq	%rbx	#
	popq	%rbp	#
	popq	%r12	#
	popq	%r13	#
	jmp	__x86_return_thunk
```

(A) performs another memory read after the spinlock has been unlocked
while (B) reuses the value from the register. If you're using an old GCC
it might have bugs. I can't recall where exactly but I have seen links
to GCC bugs regarding this issues somewhere (LWN posts or kernel docs?)

To get the assembly I just got the command from .trans_fd.o.cmd and 
added "-S -fverbose-asm" (I can't really read x86 assembly)
Re: [PATCH v2] 9p/trans_fd: mark concurrent read and writes to p9_conn->err
Posted by Dominique Martinet 9 months ago
Ignacio Encinas Rubio wrote on Tue, Mar 18, 2025 at 10:21:52PM +0100:
> Trimming CC to avoid spamming people (I hope that's ok)

Probably fine :)

> > This is weird... I'll double check because it shouldn't generate the
> > same code as far as I know.
> 
> I had a bit of time to check this. I understood you said that (A)
> 
> 	err = READ_ONCE(m->err);
> 	if (err < 0) {
> 		spin_unlock(&m->req_lock);
> 		return READ_ONCE(m->err);
> 	}
> 
> compiles to the same thing as (B)
> 
> 	err = READ_ONCE(m->err);
> 	if (err < 0) {
> 		spin_unlock(&m->req_lock);
> 		return err;
> 	}
> 
> if you didn't say this, just ignore this email :).

Right, I checked by looking at `objdump -D` output after rebuilding the
.o.
I've just double-checked and these two are identical:
```
err = READ_ONCE(m->err);
if (err < 0) {
    spin_unlock(&m->req_lock);
    return m->err;
}
```
```
err = READ_ONCE(m->err);
if (err < 0) {
    spin_unlock(&m->req_lock);
    return READ_ONCE(m->err);
}
```
but now I'm double-checking returning the local variable
```
```
err = READ_ONCE(m->err);
if (err < 0) {
    spin_unlock(&m->req_lock);
    return err;
}
```
is properly different (and is the right thing to do), so I must have
checked the wrong .o, sorry for the confusion.
(that or the minor gcc upgrade I did this morning change this, but I'd
rather inclined to think I screwed up...)

> With gcc (GCC) 14.2.1 20250110 (Red Hat 14.2.1-7) I'm seeing a difference:
> (A) performs another memory read after the spinlock has been unlocked
> while (B) reuses the value from the register. If you're using an old GCC
> it might have bugs. I can't recall where exactly but I have seen links
> to GCC bugs regarding this issues somewhere (LWN posts or kernel docs?)

Right, I'm seeing one less mov as well

So, yeah, v3 with this would be great :)
I don't have a strong opinion on the READ_ONCE within the lock, so if
you think it's clearer without it then I'm fine with that.

Thanks for taking the time to check!
-- 
Dominique Martinet | Asmadeus