[v3] selftests/mm: Some cleanups from trying to run them

[PATCH v3 08/10] selftests/mm: Skip gup_longerm tests on weird filesystems

Posted by Brendan Jackman 11 months, 2 weeks ago

Some filesystems don't support ftruncate()ing unlinked files. They return
ENOENT. In that case, skip the test.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 tools/testing/selftests/mm/gup_longterm.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mm/gup_longterm.c b/tools/testing/selftests/mm/gup_longterm.c
index 879e9e4e8cce8127656fabe098abf7db5f6c5e23..494ec4102111b9c96fb4947b29c184735ceb8e1c 100644
--- a/tools/testing/selftests/mm/gup_longterm.c
+++ b/tools/testing/selftests/mm/gup_longterm.c
@@ -96,7 +96,15 @@ static void do_test(int fd, size_t size, enum test_type type, bool shared)
 	int ret;
 
 	if (ftruncate(fd, size)) {
-		ksft_test_result_fail("ftruncate() failed (%s)\n", strerror(errno));
+		if (errno == ENOENT) {
+			/*
+			 * This can happen if the file has been unlinked and the
+			 * filesystem doesn't support truncating unlinked files.
+			 */
+			ksft_test_result_skip("ftruncate() failed with ENOENT\n");
+		} else {
+			ksft_test_result_fail("ftruncate() failed (%s)\n", strerror(errno));
+		}
 		return;
 	}
 

-- 
2.48.1.711.g2feabab25a-goog

Re: [PATCH v3 08/10] selftests/mm: Skip gup_longerm tests on weird filesystems

Posted by David Hildenbrand 11 months, 1 week ago

On 28.02.25 17:54, Brendan Jackman wrote:
> Some filesystems don't support ftruncate()ing unlinked files. They return
> ENOENT. In that case, skip the test.
> 

That's not documented in the man page, so is this a bug of these 
filesystems?

What are examples for these weird filesystems?

As we have the fstype available, we could instead simply reject more 
filesystems earlier. See fs_is_unknown().

> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
>   tools/testing/selftests/mm/gup_longterm.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/mm/gup_longterm.c b/tools/testing/selftests/mm/gup_longterm.c
> index 879e9e4e8cce8127656fabe098abf7db5f6c5e23..494ec4102111b9c96fb4947b29c184735ceb8e1c 100644
> --- a/tools/testing/selftests/mm/gup_longterm.c
> +++ b/tools/testing/selftests/mm/gup_longterm.c
> @@ -96,7 +96,15 @@ static void do_test(int fd, size_t size, enum test_type type, bool shared)
>   	int ret;
>   
>   	if (ftruncate(fd, size)) {
> -		ksft_test_result_fail("ftruncate() failed (%s)\n", strerror(errno));
> +		if (errno == ENOENT) {
> +			/*
> +			 * This can happen if the file has been unlinked and the
> +			 * filesystem doesn't support truncating unlinked files.
> +			 */
> +			ksft_test_result_skip("ftruncate() failed with ENOENT\n");
> +		} else {
> +			ksft_test_result_fail("ftruncate() failed (%s)\n", strerror(errno));
> +		}
>   		return;
>   	}
>   
> 


-- 
Cheers,

David / dhildenb

Re: [PATCH v3 08/10] selftests/mm: Skip gup_longerm tests on weird filesystems

Posted by Brendan Jackman 11 months, 1 week ago

On Thu, Mar 06, 2025 at 10:28:09AM +0100, David Hildenbrand wrote:
> On 28.02.25 17:54, Brendan Jackman wrote:
> > Some filesystems don't support ftruncate()ing unlinked files. They return
> > ENOENT. In that case, skip the test.
> > 
> 
> That's not documented in the man page, so is this a bug of these
> filesystems?

Um...

unlink(2) does say:

  If the name was the last link to a file but any processes still have
  the file open, the file will remain in existence until the last file
  descriptor referring to it is closed.

And POSIX says

  If one or more processes have the file open when the last link is
  removed, the link shall be removed before unlink() returns, but the
  removal of the file contents shall be postponed until all references
  to the file are closed

I didn't call it a bug in the commit message because my impression was
always that filesystem semantics are broadly determined by vibes. But
looking at the above I do feel more confident that the "unlink isn't
delete" thing is actually a pretty solid expectation.

> What are examples for these weird filesystems?

My experience of the issue is with 9pfs. broonie reported on #mm that
NFS can display similar issues but I haven't hit it myself.

> As we have the fstype available, we could instead simply reject more
> filesystems earlier. See fs_is_unknown().

Oh. I didn't know this was so easy, I thought that checking the
filesystem type would require some awful walk to find the mountpoint
and join it against the mount list. (Now I think about it, I should
have recorded this rationale in the commit message, so you could
easily see my bogus reasoning).

If there's a syscall to just say "what FS is this file on please?"
we should just do that and explicitly denylist the systems that are
known to have issues. I will just do 9pfs for now. Maybe we can log a
warning if the error shows up on systems that aren't listed, then if
someone does run into it on NFS they should get a strong clue about
what the problem is.

Thanks!
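
For reference, fstatfs(2) answers the "what FS is this file on" question
for an open file descriptor by returning a magic number in f_type. The
sketch below shows an explicit denylist built on that, under the
assumption that f_type is trustworthy. fs_type_of_fd() and
fs_type_is_denylisted() are hypothetical names, and the fallback value
0x01021997 is the one given for V9FS_MAGIC in <linux/magic.h>. Later
messages in this thread report that this check is not reliable on 9pfs.

```c
#include <stdbool.h>
#include <sys/vfs.h>

/* Fallback in case <linux/magic.h> has not been included. */
#ifndef V9FS_MAGIC
#define V9FS_MAGIC 0x01021997
#endif

/* Hypothetical helper: f_type of the filesystem backing fd, 0 on error. */
static long fs_type_of_fd(int fd)
{
	struct statfs fs;

	if (fstatfs(fd, &fs))
		return 0;
	return (long)fs.f_type;
}

/*
 * Hypothetical denylist: true for filesystem types known to misbehave
 * with unlinked-but-open files. Only 9pfs is listed, as proposed above.
 */
static bool fs_type_is_denylisted(long fs_type)
{
	switch (fs_type) {
	case V9FS_MAGIC:
		return true;
	default:
		return false;
	}
}
```

A test would call fs_type_is_denylisted(fs_type_of_fd(fd)) before
touching the file and skip if it returns true.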

Re: [PATCH v3 08/10] selftests/mm: Skip gup_longerm tests on weird filesystems

Posted by David Hildenbrand 11 months, 1 week ago

On 06.03.25 13:42, Brendan Jackman wrote:
> On Thu, Mar 06, 2025 at 10:28:09AM +0100, David Hildenbrand wrote:
>> On 28.02.25 17:54, Brendan Jackman wrote:
>>> Some filesystems don't support ftruncate()ing unlinked files. They return
>>> ENOENT. In that case, skip the test.
>>>
>>
>> That's not documented in the man page, so is this a bug of these
>> filesystems?
> 
Note that I meant that ftruncate doesn't mention this in the man page.
The only occurrence is

"ENOENT The named file does not exist.", and that only applies to
truncate, not ftruncate.

> Um...
> 
> unlink(2) does say:
> 
>    If the name was the last link to a file but any processes still have
>    the file open, the file will remain in existence until the last file
>    descriptor referring to it is closed.
> 
> And POSIX says
> 
>    If one or more processes have the file open when the last link is
>    removed, the link shall be removed before unlink() returns, but the
>    removal of the file contents shall be postponed until all references
>    to the file are closed

Right, it's supposed to just stay around, it simply cannot be looked up anymore.

> I didn't call it a bug in the commit message because my impression was
> always that filesystem semantics are broadly determined by vibes. But
> looking at the above I do feel more confident that the "unlink isn't
> delete" thing is actually a pretty solid expectation.

I have a faint recollection that 9pfs is problematic with unlink ...
and indeed:

https://gitlab.com/qemu-project/qemu/-/issues/103

I'm not sure at this point what's expected to work with 9pfs and what
isn't.

> 
>> What are examples for these weird filesystems?
> 
> My experience of the issue is with 9pfs. broonie reported on #mm that
> NFS can display similar issues but I haven't hit it myself.
> 
>> As we have the fstype available, we could instead simply reject more
>> filesystems earlier. See fs_is_unknown().
> 
> Oh. I didn't know this was so easy, I thought that checking the
> filesystem type would require some awful walk to find the mountpoint
> and join it against the mount list. (Now I think about it, I should
> have recorded this rationale in the commit message, so you could
> easily see my bogus reasoning).
> 
> If there's a syscall to just say "what FS is this file on please?"
> we should just do that and explicitly denylist the systems that are
> known to have issues. I will just do 9pfs for now. Maybe we can log a
> warning if the error shows up on systems that aren't listed, then if
> someone does run into it on NFS they should get a strong clue about
> what the problem is.

Yes, just skip 9pfs early, and mention in the commit message that 9pfs
has a history of being problematic with "use-after-unlink", maybe
mentioning the discussion I linked above.

Maybe something like this would work?

diff --git a/tools/testing/selftests/mm/gup_longterm.c b/tools/testing/selftests/mm/gup_longterm.c
index 9423ad439a614..349e40d3979f2 100644
--- a/tools/testing/selftests/mm/gup_longterm.c
+++ b/tools/testing/selftests/mm/gup_longterm.c
@@ -47,6 +47,16 @@ static __fsword_t get_fs_type(int fd)
         return ret ? 0 : fs.f_type;
  }
  
+static bool fs_is_problematic(__fsword_t fs_type)
+{
+       switch (fs_type) {
+       case V9FS_MAGIC:
+               return true;
+       default:
+               return false;
+       }
+}
+
  static bool fs_is_unknown(__fsword_t fs_type)
  {
         /*
@@ -95,6 +105,11 @@ static void do_test(int fd, size_t size, enum test_type type, bool shared)
         char *mem;
         int ret;
  
+       if (fs_is_problematic(fs_type)) {
+               ksft_test_result_skip("problematic fs\n");
+               return;
+       }
+
         if (ftruncate(fd, size)) {
                 ksft_test_result_fail("ftruncate() failed\n");
                 return;


I am not 100% sure if V9FS_MAGIC is what we should be using? "man fstatfs" lists
most magic values.

-- 
Cheers,

David / dhildenb

Re: [PATCH v3 08/10] selftests/mm: Skip gup_longerm tests on weird filesystems

Posted by Brendan Jackman 11 months ago

On Thu, 6 Mar 2025 at 15:40, David Hildenbrand <david@redhat.com> wrote:
> Yes, just skip 9pfs early, and mention in the commit message that 9pfs
> has a history of being problematic with "use-after-unlink", maybe
> mentioning the discussion I linked above.
>
> Maybe something like this would work?
>
> diff --git a/tools/testing/selftests/mm/gup_longterm.c b/tools/testing/selftests/mm/gup_longterm.c
> index 9423ad439a614..349e40d3979f2 100644
> --- a/tools/testing/selftests/mm/gup_longterm.c
> +++ b/tools/testing/selftests/mm/gup_longterm.c
> @@ -47,6 +47,16 @@ static __fsword_t get_fs_type(int fd)
>          return ret ? 0 : fs.f_type;
>   }
>
> +static bool fs_is_problematic(__fsword_t fs_type)
> +{
> +       switch (fs_type) {
> +       case V9FS_MAGIC:
> +               return true;
> +       default:
> +               return false;
> +       }
> +}

Ugh, some fun discoveries.

1. fstatfs() seems to have the same bug as ftruncate() i.e. it doesn't
work on unlinked files on 9pfs. This can be worked around by calling
it on the parent directory, but...

2. 9pfs seems to pass the f_type through from the host. So you can't
detect it this way anyway.

[3. I guess overlayfs & friends would also be an issue here although
that doesn't affect my usecase.]

Anyway, I think we would have to scrape /proc/mounts to do this :(

I think the proper way to deal with this is something like what I've
described here[0]. I.e. have a central facility as part of kselftest
to detect relevant characteristics of the platform. This logic could
be written in a proper programming language or in Bash, then the
relevant info could be passed in via the environment or whatever (e.g.
export KSFT_SYSENV_cwd_ftruncate_unlinked_works=1).

[0] https://lore.kernel.org/all/Z8WJEsEAwUPeMkqy@google.com/

But, to find an immediate way to get these tests working, I think we
are stuck with just peeking at errno and guessing for the time being.
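
For illustration, scraping /proc/mounts could look roughly like the
sketch below. It uses getmntent(3) to walk a mounts file and returns the
filesystem type of the longest mount point that contains a given path.
mount_fstype_for_path() and path_is_under() are hypothetical names, the
path is assumed to be absolute and already canonical (for example via
realpath()), and overlay or bind mount subtleties are not handled.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Nonzero if path equals dir or lies below it (both absolute). */
static int path_is_under(const char *path, const char *dir)
{
	size_t n = strlen(dir);

	if (strncmp(path, dir, n) != 0)
		return 0;
	if (n > 0 && dir[n - 1] == '/')	/* dir is "/" */
		return 1;
	return path[n] == '\0' || path[n] == '/';
}

/*
 * Hypothetical helper: copy the fs type (e.g. "9p", "ext4") of the mount
 * containing path into out. mounts_file is normally "/proc/mounts".
 * Returns 0 on success, -1 if the file cannot be read or nothing matches.
 */
static int mount_fstype_for_path(const char *mounts_file, const char *path,
				 char *out, size_t out_len)
{
	FILE *f = setmntent(mounts_file, "r");
	struct mntent *m;
	size_t best = 0;
	int found = 0;

	if (!f)
		return -1;

	while ((m = getmntent(f)) != NULL) {
		size_t n = strlen(m->mnt_dir);

		if (!path_is_under(path, m->mnt_dir))
			continue;
		/* Prefer the longest mount point; on ties the later entry wins. */
		if (found && n < best)
			continue;
		best = n;
		found = 1;
		snprintf(out, out_len, "%s", m->mnt_type);
	}

	endmntent(f);
	return found ? 0 : -1;
}
```

A test could then compare the result against "9p" for its working
directory and skip early, without relying on f_type.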

Re: [PATCH v3 08/10] selftests/mm: Skip gup_longerm tests on weird filesystems

Posted by David Hildenbrand 11 months ago

On 11.03.25 14:00, Brendan Jackman wrote:
> On Thu, 6 Mar 2025 at 15:40, David Hildenbrand <david@redhat.com> wrote:
>> Yes, just skip 9pfs early, and mention in the commit message that 9pfs
>> has a history of being problematic with "use-after-unlink", maybe
>> mentioning the discussion I linked above.
>>
>> Maybe something like this would work?
>>
>> diff --git a/tools/testing/selftests/mm/gup_longterm.c b/tools/testing/selftests/mm/gup_longterm.c
>> index 9423ad439a614..349e40d3979f2 100644
>> --- a/tools/testing/selftests/mm/gup_longterm.c
>> +++ b/tools/testing/selftests/mm/gup_longterm.c
>> @@ -47,6 +47,16 @@ static __fsword_t get_fs_type(int fd)
>>           return ret ? 0 : fs.f_type;
>>    }
>>
>> +static bool fs_is_problematic(__fsword_t fs_type)
>> +{
>> +       switch (fs_type) {
>> +       case V9FS_MAGIC:
>> +               return true;
>> +       default:
>> +               return false;
>> +       }
>> +}
> 
> Ugh, some fun discoveries.
> 
> 1. fstatfs() seems to have the same bug as ftruncate() i.e. it doesn't
> work on unlinked files on 9pfs. This can be worked around by calling
> it on the parent directory, but...

oO what a piece of bad software :(

> 
> 2. 9pfs seems to pass the f_type through from the host. So you can't
> detect it this way anyway.
> 
> [3. I guess overlayfs & friends would also be an issue here although
> that doesn't affect my usecase.]
> 
> Anyway, I think we would have to scrape /proc/mounts to do this :(
> 

The question I am asking myself: is this a 9pfs design bug or is it a 
9pfs hypervisor bug. Because we shouldn't try too hard to work around 
hypervisor bugs.

Which 9pfs implementation are you using in the hypervisor?

> I think the proper way to deal with this is something like what I've
> described here[0]. I.e. have a central facility as part of kselftest
> to detect relevant characteristics of the platform. This logic could
> be written in a proper programming language or in Bash, then the
> relevant info could be passed in via the environment or whatever (e.g.
> export KSFT_SYSENV_cwd_ftruncate_unlinked_works=1).
> 
> [0] https://lore.kernel.org/all/Z8WJEsEAwUPeMkqy@google.com/
> 
> But, to find an immediate way to get these tests working, I think we
> are stuck with just peeking at errno and guessing for the time being.



-- 
Cheers,

David / dhildenb

Re: [PATCH v3 08/10] selftests/mm: Skip gup_longerm tests on weird filesystems

Posted by Brendan Jackman 11 months ago

On Tue, Mar 11, 2025 at 08:53:02PM +0100, David Hildenbrand wrote:
> > 2. 9pfs seems to pass the f_type through from the host. So you can't
> > detect it this way anyway.
> > 
> > [3. I guess overlayfs & friends would also be an issue here although
> > that doesn't affect my usecase.]
> > 
> > Anyway, I think we would have to scrape /proc/mounts to do this :(
> > 
> 
> The question I am asking myself: is this a 9pfs design bug or is it a 9pfs
> hypervisor bug. Because we shouldn't try too hard to work around hypervisor
> bugs.
> 
> Which 9pfs implementation are you using in the hypervisor?

I'm using QEMU via virtme-ng. IIUC virtme-ng knows how to use virtiofs
for the rootfs, but for individually-mounted directories with
--rwdir/--rodir it uses 9pfs unconditionally.

Even if it's a bug in QEMU, I think it is worth working around this
one way or another. QEMU is by far the most practical way to run these
tests, and virtme-ng is probably the most popular/practical way to do
that. I think even if we are confident it's just a bunch of broken
code that isn't even in Linux, it's pragmatic to spend a certain
amount of energy on having green tests there.

(Also, this f_type thing might be totally intentional, specified
filesystem behaviour, I don't know).

Re: [PATCH v3 08/10] selftests/mm: Skip gup_longerm tests on weird filesystems

Posted by David Hildenbrand 11 months ago

On 12.03.25 09:34, Brendan Jackman wrote:
> On Tue, Mar 11, 2025 at 08:53:02PM +0100, David Hildenbrand wrote:
>>> 2. 9pfs seems to pass the f_type through from the host. So you can't
>>> detect it this way anyway.
>>>
>>> [3. I guess overlayfs & friends would also be an issue here although
>>> that doesn't affect my usecase.]
>>>
>>> Anyway, I think we would have to scrape /proc/mounts to do this :(
>>>
>>
>> The question I am asking myself: is this a 9pfs design bug or is it a 9pfs
>> hypervisor bug. Because we shouldn't try too hard to work around hypervisor
>> bugs.
>>
>> Which 9pfs implementation are you using in the hypervisor?
> 
> I'm using QEMU via virtme-ng. IIUC virtme-ng knows how to use virtiofs
> for the rootfs, but for individually-mounted directories with
> --rwdir/--rodir it uses 9pfs unconditionally.

Ah okay, that makes sense.

> 
> Even if it's a bug in QEMU, I think it is worth working around this
> one way or another. QEMU is by far the most practical way to run these
> tests, and virtme-ng is probably the most popular/practical way to do
> that.

I'm afraid yes. Although allocating temp files from 9pfs is rather ...
weird. :) One would assume that /tmp is usually backed by tmpfs. But
well, a distro can do what it wants.

> I think even if we are confident it's just a bunch of broken
> code that isn't even in Linux, it's pragmatic to spend a certain
> amount of energy on having green tests there.
> 

Yeah, we're trying ...

> (Also, this f_type thing might be totally intentional, specified
> filesystem behaviour, I don't know).

I assume it's broken in various ways to mimic that you are a file system 
which you are not.

Your approach is likely the easiest approach to deal with this 9pfs crap.

Can you document in the code+description better what we learned, and why 
we cannot even trust f_type with crappy 9pfs?

-- 
Cheers,

David / dhildenb

Re: [PATCH v3 08/10] selftests/mm: Skip gup_longerm tests on weird filesystems

Posted by Brendan Jackman 11 months ago

> > Even if it's a bug in QEMU, I think it is worth working around this
> > one way or another. QEMU is by far the most practical way to run these
> > tests, and virtme-ng is probably the most popular/practical way to do
> > that.
> 
> I'm afraid yes. Although allocating temp files from 9pfs is rather ...
> weird. :) One would assume that /tmp is usually backed by tmpfs. But well, a
> distro can do what it wants.

Ah yeah but these tests also use mkstemp() in the CWD i.e. the
location of run_vmtests.sh, it isn't /tmp that is causing this at the
moment. (At some point I thought I was hitting the issue there too,
but I think I was mistaken, like just not reading the test logs
properly or something).

> > I think even if we are confident it's just a bunch of broken
> > code that isn't even in Linux, it's pragmatic to spend a certain
> > amount of energy on having green tests there.
> > 
> 
> Yeah, we're trying ...
> 
> > (Also, this f_type thing might be totally intentional, specified
> > filesystem behaviour, I don't know).
> 
> I assume it's broken in various ways to mimic that you are a file system
> which you are not.
> 
> Your approach is likely the easiest approach to deal with this 9pfs crap.
> 
> Can you document in the code+description better what we learned, and why we
> cannot even trust f_type with crappy 9pfs?

Sure, I will be more verbose about it.

I've already sent v4 here:

https://lore.kernel.org/all/20250311-mm-selftests-v4-7-dec210a658f5@google.com/

So I will wait and see if there are any comments on the v4, if there
are I'll spin the extra commentary into v5 otherwise send it as a
followup, does that sound OK?

Re: [PATCH v3 08/10] selftests/mm: Skip gup_longerm tests on weird filesystems

Posted by David Hildenbrand 11 months ago

On 14.03.25 16:56, Brendan Jackman wrote:
>>> Even if it's a bug in QEMU, I think it is worth working around this
>>> one way or another. QEMU is by far the most practical way to run these
>>> tests, and virtme-ng is probably the most popular/practical way to do
>>> that.
>>
>> I'm afraid yes. Although allocating temp files from 9pfs is rather ...
>> weird. :) One would assume that /tmp is usually backed by tmpfs. But well, a
>> distro can do what it wants.
> 
> Ah yeah but these tests also use mkstemp() in the CWD i.e. the
> location of run_vmtests.sh, it isn't /tmp that is causing this at the
> moment. (At some point I thought I was hitting the issue there too,
> but I think I was mistaken, like just not reading the test logs
> properly or something).

Ah, yes run_with_local_tmpfile() ... jep, I wrote that test, now my 
memory comes back; we wanted to test with actual filesystems (e.g., 
ext4, xfs) easily.

> 
>>> I think even if we are confident it's just a bunch of broken
>>> code that isn't even in Linux, it's pragmatic to spend a certain
>>> amount of energy on having green tests there.
>>>
>>
>> Yeah, we're trying ...
>>
>>> (Also, this f_type thing might be totally intentional, specified
>>> filesystem behaviour, I don't know).
>>
>> I assume it's broken in various ways to mimic that you are a file system
>> which you are not.
>>
>> Your approach is likely the easiest approach to deal with this 9pfs crap.
>>
>> Can you document in the code+description better what we learned, and why we
>> cannot even trust f_type with crappy 9pfs?
> 
> Sure, I will be more verbose about it.
> 
> I've already sent v4 here:
> 
> https://lore.kernel.org/all/20250311-mm-selftests-v4-7-dec210a658f5@google.com/
> 
> So I will wait and see if there are any comments on the v4, if there
> are I'll spin the extra commentary into v5 otherwise send it as a
> followup, does that sound OK?


You can just ask Andrew to fixup the comment or description in a reply 
to the v4 patch. Andrew will let you know if he prefers a resend.

Thanks!


-- 
Cheers,

David / dhildenb

Re: [PATCH v3 08/10] selftests/mm: Skip gup_longerm tests on weird filesystems

Posted by David Hildenbrand 11 months ago

On 14.03.25 13:10, David Hildenbrand wrote:
> On 12.03.25 09:34, Brendan Jackman wrote:
>> On Tue, Mar 11, 2025 at 08:53:02PM +0100, David Hildenbrand wrote:
>>>> 2. 9pfs seems to pass the f_type through from the host. So you can't
>>>> detect it this way anyway.
>>>>
>>>> [3. I guess overlayfs & friends would also be an issue here although
>>>> that doesn't affect my usecase.]
>>>>
>>>> Anyway, I think we would have to scrape /proc/mounts to do this :(
>>>>
>>>
>>> The question I am asking myself: is this a 9pfs design bug or is it a 9pfs
>>> hypervisor bug. Because we shouldn't try too hard to work around hypervisor
>>> bugs.
>>>
>>> Which 9pfs implementation are you using in the hypervisor?
>>
>> I'm using QEMU via virtme-ng. IIUC virtme-ng knows how to use virtiofs
>> for the rootfs, but for individually-mounted directories with
>> --rwdir/--rodir it uses 9pfs unconditionally.
> 
> Ah okay, that makes sense.
> 
>>
>> Even if it's a bug in QEMU, I think it is worth working around this
>> one way or another. QEMU is by far the most practical way to run these
>> tests, and virtme-ng is probably the most popular/practical way to do
>> that.
> 
> I'm afraid yes. Although allocating temp files from 9pfs is rather ...
> weird. :) One would assume that /tmp is usually backed by tmpfs. But
> well, a distro can do what it wants.
> 
>> I think even if we are confident it's just a bunch of broken
>> code that isn't even in Linux, it's pragmatic to spend a certain
>> amount of energy on having green tests there.
>>
> 
> Yeah, we're trying ...
> 
>> (Also, this f_type thing might be totally intentional, specified
>> filesystem behaviour, I don't know).
> 
> I assume it's broken in various ways to mimic that you are a file system
> which you are not.
> 
> Your approach is likely the easiest approach to deal with this 9pfs crap.
> 
> Can you document in the code+description better what we learned, and why
> we cannot even trust f_type with crappy 9pfs?

Staring a bit at that code, it's mostly 9p specific I think.

t14s: ~/git/linux s390x-file-thp2 $ git grep "= NFS_SUPER_MAGIC"
fs/nfs/super.c: buf->f_type = NFS_SUPER_MAGIC;
fs/nfs/super.c: sb->s_magic = NFS_SUPER_MAGIC;

t14s: ~/git/linux s390x-file-thp2 $ git grep "= V9FS_MAGIC"
fs/9p/vfs_super.c:      sb->s_magic = V9FS_MAGIC;
  $ git grep "f_type" | grep 9p
fs/9p/vfs_super.c:                      buf->f_type = rs.type;


-- 
Cheers,

David / dhildenb