[PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check

Oleg Nesterov posted 6 patches 2 years, 3 months ago
[PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check
Posted by Oleg Nesterov 2 years, 3 months ago
Unless I am totally confused, the current check is wrong. We are going to
return or skip next_task, so we need to check next_task->files, not
task->files.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/bpf/task_iter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 1589ec3faded..2264870ae3fc 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -82,7 +82,7 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
 
 	common->pid_visiting = *tid;
 
-	if (skip_if_dup_files && task->files == task->group_leader->files) {
+	if (skip_if_dup_files && next_task->files == next_task->group_leader->files) {
 		task = next_task;
 		goto retry;
 	}
-- 
2.25.1.362.g51ebf55
Re: [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check
Posted by Yonghong Song 2 years, 3 months ago

On 8/25/23 9:19 AM, Oleg Nesterov wrote:
> Unless I am totally confused, the current check is wrong. We are going to
> return or skip next_task, so we need to check next_task->files, not
> task->files.

Thanks for catching this. This is indeed an oversight.

Acked-by: Yonghong Song <yonghong.song@linux.dev>

> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>   kernel/bpf/task_iter.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 1589ec3faded..2264870ae3fc 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -82,7 +82,7 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
>   
>   	common->pid_visiting = *tid;
>   
> -	if (skip_if_dup_files && task->files == task->group_leader->files) {
> +	if (skip_if_dup_files && next_task->files == next_task->group_leader->files) {
>   		task = next_task;
>   		goto retry;
>   	}
Re: [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check
Posted by Oleg Nesterov 2 years, 3 months ago
Forgot to mention in the changelog...

In any case this doesn't look right. ->group_leader can exit before other
threads, call exit_files(), and in this case task_group_seq_get_next() will
check task->files == NULL.
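
A minimal userspace sketch of both points, assuming simplified stand-ins
for task_struct and files_struct. The struct layout and the names
skip_old(), skip_new() and demo_skip_check() are hypothetical and only
model the comparison; this is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's files_struct. */
struct files { int unused; };

/* Hypothetical stand-in for the two task_struct fields that matter here. */
struct task {
	struct files *files;        /* NULL once the task has called exit_files() */
	struct task *group_leader;
};

/* Check before the patch: looks at the task we are moving away from. */
static bool skip_old(const struct task *task)
{
	return task->files == task->group_leader->files;
}

/* Check after the patch: looks at the task we are about to return or skip. */
static bool skip_new(const struct task *next_task)
{
	return next_task->files == next_task->group_leader->files;
}

void demo_skip_check(void)
{
	struct files shared, private_files;
	struct task leader = { &shared, &leader };
	struct task a = { &shared, &leader };        /* shares the leader's files */
	struct task b = { &private_files, &leader }; /* has its own files */

	/* Stepping from a to b: the old check tests a and would skip b. */
	assert(skip_old(&a));
	/* The new check tests b itself and keeps it. */
	assert(!skip_new(&b));

	/* Leader exits first: its ->files becomes NULL. */
	leader.files = NULL;
	/* a still has a non-NULL ->files, so it is no longer skipped. */
	assert(!skip_new(&a));
}
```

In this model the exited leader only makes the comparison fail; nothing is
dereferenced through the NULL pointer.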

On 08/25, Oleg Nesterov wrote:
>
> Unless I am totally confused, the current check is wrong. We are going to
> return or skip next_task, so we need to check next_task->files, not
> task->files.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  kernel/bpf/task_iter.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 1589ec3faded..2264870ae3fc 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -82,7 +82,7 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
>
>  	common->pid_visiting = *tid;
>
> -	if (skip_if_dup_files && task->files == task->group_leader->files) {
> +	if (skip_if_dup_files && next_task->files == next_task->group_leader->files) {
>  		task = next_task;
>  		goto retry;
>  	}
> --
> 2.25.1.362.g51ebf55
Re: [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check
Posted by Yonghong Song 2 years, 3 months ago

On 8/25/23 10:04 AM, Oleg Nesterov wrote:
> Forgot to mention in the changelog...
> 
> In any case this doesn't look right. ->group_leader can exit before other
> threads, call exit_files(), and in this case task_group_seq_get_next() will
> check task->files == NULL.

This is okay; it won't affect correctness. We will end up
calling the bpf program for 'next_task'.

> 
> On 08/25, Oleg Nesterov wrote:
>>
>> Unless I am totally confused, the current check is wrong. We are going to
>> return or skip next_task, so we need to check next_task->files, not
>> task->files.
>>
>> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>> ---
>>   kernel/bpf/task_iter.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
>> index 1589ec3faded..2264870ae3fc 100644
>> --- a/kernel/bpf/task_iter.c
>> +++ b/kernel/bpf/task_iter.c
>> @@ -82,7 +82,7 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
>>
>>   	common->pid_visiting = *tid;
>>
>> -	if (skip_if_dup_files && task->files == task->group_leader->files) {
>> +	if (skip_if_dup_files && next_task->files == next_task->group_leader->files) {
>>   		task = next_task;
>>   		goto retry;
>>   	}
>> --
>> 2.25.1.362.g51ebf55
> 
>
Re: [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check
Posted by Oleg Nesterov 2 years, 3 months ago
On 08/25, Yonghong Song wrote:
>
> On 8/25/23 10:04 AM, Oleg Nesterov wrote:
> >Forgot to mention in the changelog...
> >
> >In any case this doesn't look right. ->group_leader can exit before other
> >threads, call exit_files(), and in this case task_group_seq_get_next() will
> >check task->files == NULL.
>
> This is okay; it won't affect correctness. We will end up
> calling the bpf program for 'next_task'.

Well, I didn't mean it is necessarily wrong, I simply do not know.

But let's suppose that we have a thread group with the main thread M + 1000
sub-threads. In the likely case they all share the same ->files, because
CLONE_THREAD without CLONE_FILES is not that common.

Let's assume the BPF_TASK_ITER_TGID case for simplicity.

Now let's look at task_file_seq_get_next() which passes skip_if_dup_files == 1
to task_seq_get_next() and thus to task_group_seq_get_next().

Now, in this case task_seq_get_next() will return non-NULL only once (OK, unless
task_file_seq_ops.stop() was called): it will return the group leader M first;
then, after task_file_seq_get_next() "reports" all the fds of M and increments
info->tid, the next task_seq_get_next(&info->tid, true) should return NULL because
of the skip_if_dup_files check in task_group_seq_get_next().

Right?

But if the group leader M exits, then M->files == NULL. And in this case
task_seq_get_next() will need to "inspect" all the sub-threads even if they all
have the same ->files pointer.

No?
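
The scenario can be modelled with a short userspace sketch, assuming
simplified stand-ins for task_struct and files_struct. The names
threads_not_skipped() and demo_not_skipped() are hypothetical, and the
loop only counts how many sub-threads fail the dup-files comparison; it
does not reproduce the real iterator.

```c
#include <stddef.h>

/* Hypothetical stand-ins for files_struct and task_struct. */
struct files { int unused; };

struct task {
	struct files *files;        /* NULL once the task has called exit_files() */
	struct task *group_leader;
};

enum { NR_THREADS = 1000 };

/* Count the sub-threads that the dup-files check would NOT skip. */
static size_t threads_not_skipped(const struct task *threads, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (threads[i].files != threads[i].group_leader->files)
			count++;
	return count;
}

/* Build M + 1000 sub-threads sharing one ->files and run the count. */
size_t demo_not_skipped(int leader_exited)
{
	static struct files shared;
	static struct task leader;
	static struct task threads[NR_THREADS];

	leader.files = leader_exited ? NULL : &shared;
	leader.group_leader = &leader;
	for (size_t i = 0; i < NR_THREADS; i++) {
		threads[i].files = &shared;
		threads[i].group_leader = &leader;
	}
	return threads_not_skipped(threads, NR_THREADS);
}
```

With a live leader every sub-thread compares equal and is skipped; once
the leader's ->files is NULL, none of them compare equal, so each one has
to be inspected even though they all share the same table.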

Again, I am not saying this is a bug and quite possibly I misread this code, but
in any case the skip_if_dup_files logic looks sub-optimal and confusing to me.

Never mind, please forget it. This is minor even if I am right.

Thanks for the review!

Oleg.
Re: [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check
Posted by Yonghong Song 2 years, 3 months ago

On 8/27/23 1:19 PM, Oleg Nesterov wrote:
> On 08/25, Yonghong Song wrote:
>>
>> On 8/25/23 10:04 AM, Oleg Nesterov wrote:
>>> Forgot to mention in the changelog...
>>>
>>> In any case this doesn't look right. ->group_leader can exit before other
>>> threads, call exit_files(), and in this case task_group_seq_get_next() will
>>> check task->files == NULL.
>>
>> This is okay; it won't affect correctness. We will end up
>> calling the bpf program for 'next_task'.
> 
> Well, I didn't mean it is necessarily wrong, I simply do not know.
> 
> But let's suppose that we have a thread group with the main thread M + 1000
> sub-threads. In the likely case they all share the same ->files, because
> CLONE_THREAD without CLONE_FILES is not that common.
> 
> Let's assume the BPF_TASK_ITER_TGID case for simplicity.
> 
> Now let's look at task_file_seq_get_next() which passes skip_if_dup_files == 1
> to task_seq_get_next() and thus to task_group_seq_get_next().
> 
> Now, in this case task_seq_get_next() will return non-NULL only once (OK, unless
> task_file_seq_ops.stop() was called): it will return the group leader M first;
> then, after task_file_seq_get_next() "reports" all the fds of M and increments
> info->tid, the next task_seq_get_next(&info->tid, true) should return NULL because
> of the skip_if_dup_files check in task_group_seq_get_next().
> 
> Right?
> 
> But if the group leader M exits, then M->files == NULL. And in this case
> task_seq_get_next() will need to "inspect" all the sub-threads even if they all
> have the same ->files pointer.

That is correct. I do not have practical experience with how likely
this scenario is. I assume the likelihood is very low.
If that is not the case, we might need to revisit this.

> 
> No?
> 
> Again, I am not saying this is a bug and quite possibly I misread this code, but
> in any case the skip_if_dup_files logic looks sub-optimal and confusing to me.
> 
> Never mind, please forget it. This is minor even if I am right.
> 
> Thanks for the review!
> 
> Oleg.
>
Re: [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check
Posted by Oleg Nesterov 2 years, 3 months ago
On 08/27, Yonghong Song wrote:
>
> On 8/27/23 1:19 PM, Oleg Nesterov wrote:
> >
> >But if the group leader M exits, then M->files == NULL. And in this case
> >task_seq_get_next() will need to "inspect" all the sub-threads even if they all
> >have the same ->files pointer.
>
> That is correct. I do not have practical experience with how likely
> this scenario is. I assume the likelihood is very low.

Yes. I just tried to explain why the ->files check looks confusing to me.
Never mind.

Could you review 6/6 as well?

Should I fold 1-5 into a single patch? I tried to document every change
and simplify the review, but I do not want to bloat the git history.

Oleg.
Re: [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check
Posted by Yonghong Song 2 years, 3 months ago

On 8/28/23 3:54 AM, Oleg Nesterov wrote:
> On 08/27, Yonghong Song wrote:
>>
>> On 8/27/23 1:19 PM, Oleg Nesterov wrote:
>>>
>>> But if the group leader M exits, then M->files == NULL. And in this case
>>> task_seq_get_next() will need to "inspect" all the sub-threads even if they all
>>> have the same ->files pointer.
>>
>> That is correct. I do not have practical experience with how likely
>> this scenario is. I assume the likelihood is very low.
> 
> Yes. I just tried to explain why the ->files check looks confusing to me.
> Never mind.
> 
> Could you review 6/6 as well?

I think patch 6/6 can wait until
    https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/
is merged.

> 
> Should I fold 1-5 into a single patch? I tried to document every change
> and simplify the review, but I do not want to bloat the git history.

Currently, because of patch 6, the whole patch set cannot be tested by
bpf CI since it has a build failure:
   https://github.com/kernel-patches/bpf/pull/5580
I suggest you take patches 1-5 and resubmit them with a tag like
   "bpf-next v2"
   [Patch bpf-next v2 x/5] ...
so CI can build with different architectures and compilers to
ensure everything builds and runs fine.

> 
> Oleg.
>
Re: [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check
Posted by Oleg Nesterov 2 years, 3 months ago
On 08/28, Yonghong Song wrote:
>
> On 8/28/23 3:54 AM, Oleg Nesterov wrote:
> >
> >Could you review 6/6 as well?
>
> I think patch 6/6 can wait until
>    https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/
> is merged.

OK.

> >Should I fold 1-5 into a single patch? I tried to document every change
> >and simplify the review, but I do not want to bloat the git history.
>
> Currently, because of patch 6, the whole patch set cannot be tested by
> bpf CI since it has a build failure:
>   https://github.com/kernel-patches/bpf/pull/5580

Heh. I thought this was obvious. I thought you could test 1-5 without 6/6
and _review_ 6/6.

I simply can't understand how this pull/5580 came about when I specifically
mentioned

	> 6/6 obviously depends on
	>
	>	[PATCH 1/2] introduce __next_thread(), fix next_tid() vs exec() race
	>	https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/
	>
	> which was not merged yet.

in 0/6.

> I suggest you take patches 1-5 and resubmit them with a tag like
>   "bpf-next v2"
>   [Patch bpf-next v2 x/5] ...
> so CI can build with different architectures and compilers to
> ensure everything builds and runs fine.

I think we can wait for

	https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/

as you suggest above, then I'll send the s/next_thread/__next_thread/
one-liner without 1-5. I no longer think it makes sense to try to clean up
the poor task_group_seq_get_next() when IMHO the whole task_iter logic
needs a complete rewrite. Yes, yes, I know, it is very easy to blame
someone else's code, sorry, can't resist ;)

The only "fix" in this series is 3/6, but this code has more serious
bugs, so I guess we can forget it.

Oleg.
Re: [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check
Posted by Yonghong Song 2 years, 3 months ago

On 8/30/23 7:54 PM, Oleg Nesterov wrote:
> On 08/28, Yonghong Song wrote:
>>
>> On 8/28/23 3:54 AM, Oleg Nesterov wrote:
>>>
>>> Could you review 6/6 as well?
>>
>> I think patch 6/6 can wait until
>>     https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/
>> is merged.
> 
> OK.
> 
>>> Should I fold 1-5 into a single patch? I tried to document every change
>>> and simplify the review, but I do not want to bloat the git history.
>>
>> Currently, because of patch 6, the whole patch set cannot be tested by
>> bpf CI since it has a build failure:
>>    https://github.com/kernel-patches/bpf/pull/5580
> 
> Heh. I thought this was obvious. I thought you could test 1-5 without 6/6
> and _review_ 6/6.
> 
> I simply can't understand how this pull/5580 came about when I specifically
> mentioned
> 
> 	> 6/6 obviously depends on
> 	>
> 	>	[PATCH 1/2] introduce __next_thread(), fix next_tid() vs exec() race
> 	>	https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/
> 	>
> 	> which was not merged yet.
> 
> in 0/6.

The testing process in CI is fully automated, and it does
not look at commit messages. That is why it takes the whole
series. This is true for all other patch sets.

> 
>> I suggest you take patches 1-5 and resubmit them with a tag like
>>    "bpf-next v2"
>>    [Patch bpf-next v2 x/5] ...
>> so CI can build with different architectures and compilers to
>> ensure everything builds and runs fine.
> 
> I think we can wait for
> 
> 	https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/
> 
> as you suggest above, then I'll send the s/next_thread/__next_thread/
> one-liner without 1-5. I no longer think it makes sense to try to clean up
> the poor task_group_seq_get_next() when IMHO the whole task_iter logic
> needs a complete rewrite. Yes, yes, I know, it is very easy to blame
> someone else's code, sorry, can't resist ;)
> 
> The only "fix" in this series is 3/6, but this code has more serious
> bugs, so I guess we can forget it.
> 
> Oleg.
>
Re: [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check
Posted by Oleg Nesterov 2 years, 3 months ago
On 08/31, Yonghong Song wrote:
>
> On 8/30/23 7:54 PM, Oleg Nesterov wrote:
> >
> >I simply can't understand how this pull/5580 came about when I specifically
> >mentioned
> >
> >	> 6/6 obviously depends on
> >	>
> >	>	[PATCH 1/2] introduce __next_thread(), fix next_tid() vs exec() race
> >	>	https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/
> >	>
> >	> which was not merged yet.
> >
> >in 0/6.
>
> The process in CI for testing is fully automated,

Ah, OK, sorry then.

> >>I suggest you take patches 1-5 and resubmit them with a tag like
> >>   "bpf-next v2"
> >>   [Patch bpf-next v2 x/5] ...
> >>so CI can build with different architectures and compilers to
> >>ensure everything builds and runs fine.

OK, will do when I have time.

Thanks,

Oleg.