[RFC PATCH 05/16] mm/madvise: define and use madvise_behavior struct for madvise_do_behavior()

SeongJae Park posted 16 patches 9 months, 2 weeks ago
There is a newer version of this series
[RFC PATCH 05/16] mm/madvise: define and use madvise_behavior struct for madvise_do_behavior()
Posted by SeongJae Park 9 months, 2 weeks ago
To implement batched tlb flushes for MADV_DONTNEED[_LOCKED] and
MADV_FREE, an mmu_gather object in addition to the behavior integer need
to be passed to the internal logics.  Define a struct for passing such
information together with the behavior value without increasing number
of parameters of all code paths towards the internal logic.

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 mm/madvise.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index c5e1a4d1df72..3346e593e07d 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1665,9 +1665,15 @@ static bool is_memory_populate(int behavior)
 	}
 }
 
+struct madvise_behavior {
+	int behavior;
+};
+
 static int madvise_do_behavior(struct mm_struct *mm,
-		unsigned long start, size_t len_in, int behavior)
+		unsigned long start, size_t len_in,
+		struct madvise_behavior *madv_behavior)
 {
+	int behavior = madv_behavior->behavior;
 	struct blk_plug plug;
 	unsigned long end;
 	int error;
@@ -1762,13 +1768,14 @@ static int madvise_do_behavior(struct mm_struct *mm,
 int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior)
 {
 	int error;
+	struct madvise_behavior madv_behavior = {.behavior = behavior};
 
 	if (madvise_should_skip(start, len_in, behavior, &error))
 		return error;
 	error = madvise_lock(mm, behavior);
 	if (error)
 		return error;
-	error = madvise_do_behavior(mm, start, len_in, behavior);
+	error = madvise_do_behavior(mm, start, len_in, &madv_behavior);
 	madvise_unlock(mm, behavior);
 
 	return error;
@@ -1785,6 +1792,7 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter,
 {
 	ssize_t ret = 0;
 	size_t total_len;
+	struct madvise_behavior madv_behavior = {.behavior = behavior};
 
 	total_len = iov_iter_count(iter);
 
@@ -1800,7 +1808,8 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter,
 		if (madvise_should_skip(start, len_in, behavior, &error))
 			ret = error;
 		else
-			ret = madvise_do_behavior(mm, start, len_in, behavior);
+			ret = madvise_do_behavior(mm, start, len_in,
+					&madv_behavior);
 		/*
 		 * An madvise operation is attempting to restart the syscall,
 		 * but we cannot proceed as it would not be correct to repeat
-- 
2.39.5
Re: [RFC PATCH 05/16] mm/madvise: define and use madvise_behavior struct for madvise_do_behavior()
Posted by Shakeel Butt 9 months, 2 weeks ago
On Wed, Mar 05, 2025 at 10:16:00AM -0800, SeongJae Park wrote:
> To implement batched tlb flushes for MADV_DONTNEED[_LOCKED] and
> MADV_FREE, an mmu_gather object in addition to the behavior integer need
> to be passed to the internal logics.  Define a struct for passing such
> information together with the behavior value without increasing number
> of parameters of all code paths towards the internal logic.
> 
> Signed-off-by: SeongJae Park <sj@kernel.org>
> ---
>  mm/madvise.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index c5e1a4d1df72..3346e593e07d 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1665,9 +1665,15 @@ static bool is_memory_populate(int behavior)
>  	}
>  }
>  
> +struct madvise_behavior {
> +	int behavior;
> +};

I think the patch 5 to 8 are just causing churn and will be much better
to be a single patch. Also why not make the above struct a general
madvise visit param struct which can be used by both
madvise_vma_anon_name() and madvise_vma_behavior()?
Re: [RFC PATCH 05/16] mm/madvise: define and use madvise_behavior struct for madvise_do_behavior()
Posted by Shakeel Butt 9 months, 2 weeks ago
On Wed, Mar 05, 2025 at 01:02:15PM -0800, Shakeel Butt wrote:
> On Wed, Mar 05, 2025 at 10:16:00AM -0800, SeongJae Park wrote:
> > To implement batched tlb flushes for MADV_DONTNEED[_LOCKED] and
> > MADV_FREE, an mmu_gather object in addition to the behavior integer need
> > to be passed to the internal logics.  Define a struct for passing such
> > information together with the behavior value without increasing number
> > of parameters of all code paths towards the internal logic.
> > 
> > Signed-off-by: SeongJae Park <sj@kernel.org>
> > ---
> >  mm/madvise.c | 15 ++++++++++++---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> > 
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index c5e1a4d1df72..3346e593e07d 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -1665,9 +1665,15 @@ static bool is_memory_populate(int behavior)
> >  	}
> >  }
> >  
> > +struct madvise_behavior {
> > +	int behavior;
> > +};
> 
> I think the patch 5 to 8 are just causing churn and will be much better
> to be a single patch. Also why not make the above struct a general
> madvise visit param struct which can be used by both
> madvise_vma_anon_name() and madvise_vma_behavior()?

Something like following:

diff --git a/mm/madvise.c b/mm/madvise.c
index c5e1a4d1df72..cbc3817366a6 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -890,11 +890,17 @@ static bool madvise_dontneed_free_valid_vma(struct vm_area_struct *vma,
 	return true;
 }
 
+struct madvise_walk_param {
+	int behavior;
+	struct anon_vma_name *anon_name;
+};
+
 static long madvise_dontneed_free(struct vm_area_struct *vma,
 				  struct vm_area_struct **prev,
 				  unsigned long start, unsigned long end,
-				  int behavior)
+				  struct madvise_walk_param *arg)
 {
+	int behavior = arg->behavior;
 	struct mm_struct *mm = vma->vm_mm;
 
 	*prev = vma;
@@ -1249,8 +1255,9 @@ static long madvise_guard_remove(struct vm_area_struct *vma,
 static int madvise_vma_behavior(struct vm_area_struct *vma,
 				struct vm_area_struct **prev,
 				unsigned long start, unsigned long end,
-				unsigned long behavior)
+				struct madvise_walk_param *arg)
 {
+	int behavior = arg->behavior;
 	int error;
 	struct anon_vma_name *anon_name;
 	unsigned long new_flags = vma->vm_flags;
@@ -1270,7 +1277,7 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
 	case MADV_FREE:
 	case MADV_DONTNEED:
 	case MADV_DONTNEED_LOCKED:
-		return madvise_dontneed_free(vma, prev, start, end, behavior);
+		return madvise_dontneed_free(vma, prev, start, end, arg);
 	case MADV_NORMAL:
 		new_flags = new_flags & ~VM_RAND_READ & ~VM_SEQ_READ;
 		break;
@@ -1462,10 +1469,10 @@ static bool process_madvise_remote_valid(int behavior)
  */
 static
 int madvise_walk_vmas(struct mm_struct *mm, unsigned long start,
-		      unsigned long end, unsigned long arg,
+		      unsigned long end, struct madvise_walk_param *arg,
 		      int (*visit)(struct vm_area_struct *vma,
 				   struct vm_area_struct **prev, unsigned long start,
-				   unsigned long end, unsigned long arg))
+				   unsigned long end, struct madvise_walk_param *arg))
 {
 	struct vm_area_struct *vma;
 	struct vm_area_struct *prev;
@@ -1523,7 +1530,7 @@ int madvise_walk_vmas(struct mm_struct *mm, unsigned long start,
 static int madvise_vma_anon_name(struct vm_area_struct *vma,
 				 struct vm_area_struct **prev,
 				 unsigned long start, unsigned long end,
-				 unsigned long anon_name)
+				 struct madvise_walk_param *arg)
 {
 	int error;
 
@@ -1532,7 +1539,7 @@ static int madvise_vma_anon_name(struct vm_area_struct *vma,
 		return -EBADF;
 
 	error = madvise_update_vma(vma, prev, start, end, vma->vm_flags,
-				   (struct anon_vma_name *)anon_name);
+				   arg->anon_name);
 
 	/*
 	 * madvise() returns EAGAIN if kernel resources, such as
@@ -1548,6 +1555,7 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
 {
 	unsigned long end;
 	unsigned long len;
+	struct madvise_walk_param param = { .anon_name = anon_name };
 
 	if (start & ~PAGE_MASK)
 		return -EINVAL;
@@ -1564,7 +1572,7 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
 	if (end == start)
 		return 0;
 
-	return madvise_walk_vmas(mm, start, end, (unsigned long)anon_name,
+	return madvise_walk_vmas(mm, start, end, &param,
 				 madvise_vma_anon_name);
 }
 #endif /* CONFIG_ANON_VMA_NAME */
@@ -1666,8 +1674,10 @@ static bool is_memory_populate(int behavior)
 }
 
 static int madvise_do_behavior(struct mm_struct *mm,
-		unsigned long start, size_t len_in, int behavior)
+		unsigned long start, size_t len_in,
+		struct madvise_walk_param *arg)
 {
+	int behavior = arg->behavior;
 	struct blk_plug plug;
 	unsigned long end;
 	int error;
@@ -1681,7 +1691,7 @@ static int madvise_do_behavior(struct mm_struct *mm,
 	if (is_memory_populate(behavior))
 		error = madvise_populate(mm, start, end, behavior);
 	else
-		error = madvise_walk_vmas(mm, start, end, behavior,
+		error = madvise_walk_vmas(mm, start, end, arg,
 					  madvise_vma_behavior);
 	blk_finish_plug(&plug);
 	return error;
@@ -1762,13 +1772,14 @@ static int madvise_do_behavior(struct mm_struct *mm,
 int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior)
 {
 	int error;
+	struct madvise_walk_param arg = {.behavior = behavior};
 
 	if (madvise_should_skip(start, len_in, behavior, &error))
 		return error;
 	error = madvise_lock(mm, behavior);
 	if (error)
 		return error;
-	error = madvise_do_behavior(mm, start, len_in, behavior);
+	error = madvise_do_behavior(mm, start, len_in, &arg);
 	madvise_unlock(mm, behavior);
 
 	return error;
@@ -1785,6 +1796,7 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter,
 {
 	ssize_t ret = 0;
 	size_t total_len;
+	struct madvise_walk_param arg = {.behavior = behavior};
 
 	total_len = iov_iter_count(iter);
 
@@ -1800,7 +1812,7 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter,
 		if (madvise_should_skip(start, len_in, behavior, &error))
 			ret = error;
 		else
-			ret = madvise_do_behavior(mm, start, len_in, behavior);
+			ret = madvise_do_behavior(mm, start, len_in, &arg);
 		/*
 		 * An madvise operation is attempting to restart the syscall,
 		 * but we cannot proceed as it would not be correct to repeat
-- 
2.43.5
Re: [RFC PATCH 05/16] mm/madvise: define and use madvise_behavior struct for madvise_do_behavior()
Posted by SeongJae Park 9 months, 2 weeks ago
On Wed, 5 Mar 2025 13:40:17 -0800 Shakeel Butt <shakeel.butt@linux.dev> wrote:

> On Wed, Mar 05, 2025 at 01:02:15PM -0800, Shakeel Butt wrote:
> > On Wed, Mar 05, 2025 at 10:16:00AM -0800, SeongJae Park wrote:
> > > To implement batched tlb flushes for MADV_DONTNEED[_LOCKED] and
> > > MADV_FREE, an mmu_gather object in addition to the behavior integer need
> > > to be passed to the internal logics.  Define a struct for passing such
> > > information together with the behavior value without increasing number
> > > of parameters of all code paths towards the internal logic.
> > > 
> > > Signed-off-by: SeongJae Park <sj@kernel.org>
> > > ---
> > >  mm/madvise.c | 15 ++++++++++++---
> > >  1 file changed, 12 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > index c5e1a4d1df72..3346e593e07d 100644
> > > --- a/mm/madvise.c
> > > +++ b/mm/madvise.c
> > > @@ -1665,9 +1665,15 @@ static bool is_memory_populate(int behavior)
> > >  	}
> > >  }
> > >  
> > > +struct madvise_behavior {
> > > +	int behavior;
> > > +};
> > 
> > I think the patch 5 to 8 are just causing churn and will be much better
> > to be a single patch.

I agree.  I will do so in the next version.

> > Also why not make the above struct a general
> > madvise visit param struct which can be used by both
> > madvise_vma_anon_name() and madvise_vma_behavior()?

I was also thinking we can further extend the struct for more clean and
efficient code, so agree to your fundamental point.  I ended up desiring to
focus on tlb flushes batching at the moment, though.

However, what you are suggesting is bit different from what I was thinking.  To
me, madvise_walk_vmas() is for general purposes and hence the visit functions
should be able to recieve an argument of arbitrary types.  It is true that
there are only two visit functions, but they seems doing very different purpose
works to me, so having a same type argument doesn't seem very straightforward
to understand its usage, nor efficient to my humble viewpoint.

> 
> Something like following:
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index c5e1a4d1df72..cbc3817366a6 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -890,11 +890,17 @@ static bool madvise_dontneed_free_valid_vma(struct vm_area_struct *vma,
>  	return true;
>  }
>  
> +struct madvise_walk_param {
> +	int behavior;
> +	struct anon_vma_name *anon_name;
> +};

Only madvise_vma_behavior() will use 'behavior' field.  And only
madvise_vma_anon_name() will use 'anon_name' field.  But I will have to assume
both function _might_ use both fields when reading the madvise_walk_vmas()
function code.  That doesn't make my humble code reading that simple and
straightforward.

Also populating and passing a data structure that has data that would not
really be used seems not very efficient to me.

[...]
> @@ -1666,8 +1674,10 @@ static bool is_memory_populate(int behavior)
>  }
>  
>  static int madvise_do_behavior(struct mm_struct *mm,
> -		unsigned long start, size_t len_in, int behavior)
> +		unsigned long start, size_t len_in,
> +		struct madvise_walk_param *arg)
>  {
> +	int behavior = arg->behavior;
>  	struct blk_plug plug;
>  	unsigned long end;
>  	int error;
> @@ -1681,7 +1691,7 @@ static int madvise_do_behavior(struct mm_struct *mm,
>  	if (is_memory_populate(behavior))
>  		error = madvise_populate(mm, start, end, behavior);

'arg' is for madvise_walk_vmas() visit function, but we're using it as a
capsule for passing an information that will be used for madvise_do_behavior().
This also seems not very straightforward to my humble perspective.

I have no strong opinion and maybe my humble taste is too peculiar.  But, if
this is not a blocker for tlb flushes batcing, I'd like to suggest keep this
part as is for now, and revisit for more code cleanup later.  What do you
think, Shakeel?


Thanks,
SJ
Re: [RFC PATCH 05/16] mm/madvise: define and use madvise_behavior struct for madvise_do_behavior()
Posted by Shakeel Butt 9 months, 2 weeks ago
On Wed, Mar 05, 2025 at 03:56:32PM -0800, SeongJae Park wrote:
> On Wed, 5 Mar 2025 13:40:17 -0800 Shakeel Butt <shakeel.butt@linux.dev> wrote:
> 
> >  
> > +struct madvise_walk_param {
> > +	int behavior;
> > +	struct anon_vma_name *anon_name;
> > +};
> 
> Only madvise_vma_behavior() will use 'behavior' field.  And only
> madvise_vma_anon_name() will use 'anon_name' field.  But I will have to assume
> both function _might_ use both fields when reading the madvise_walk_vmas()
> function code.  That doesn't make my humble code reading that simple and
> straightforward.
> 
> Also populating and passing a data structure that has data that would not
> really be used seems not very efficient to me.
> 

This is not a new pattern. You can find a lot of examples in kernel
where a struct encapsulates multiple fields and its pointer is passed
around rather than those fields (or subset of them). You can check out
zap_details, vm_fault, fs_parameter. If some fields are mutually
exclusive you can have union in the struct. Also I am not sure what do
you mean by "not efficient" here. Inefficient in what sense? Unused
memory or something else?

> [...]
> > @@ -1666,8 +1674,10 @@ static bool is_memory_populate(int behavior)
> >  }
> >  
> >  static int madvise_do_behavior(struct mm_struct *mm,
> > -		unsigned long start, size_t len_in, int behavior)
> > +		unsigned long start, size_t len_in,
> > +		struct madvise_walk_param *arg)
> >  {
> > +	int behavior = arg->behavior;
> >  	struct blk_plug plug;
> >  	unsigned long end;
> >  	int error;
> > @@ -1681,7 +1691,7 @@ static int madvise_do_behavior(struct mm_struct *mm,
> >  	if (is_memory_populate(behavior))
> >  		error = madvise_populate(mm, start, end, behavior);
> 
> 'arg' is for madvise_walk_vmas() visit function, but we're using it as a
> capsule for passing an information that will be used for madvise_do_behavior().
> This also seems not very straightforward to my humble perspective.

Here we can keep behavior as parameter to madvise_walk_vmas() and define
struct madvise_walk_param inside it with the passed behavior. Anyways
this is a very common pattern in kernel.

> 
> I have no strong opinion and maybe my humble taste is too peculiar.  But, if
> this is not a blocker for tlb flushes batcing, I'd like to suggest keep this
> part as is for now, and revisit for more code cleanup later.  What do you
> think, Shakeel?
> 

Squashing patches 5 to 8 into one is the main request from me. My other
suggestion you can ignore but let's see what other says.
Re: [RFC PATCH 05/16] mm/madvise: define and use madvise_behavior struct for madvise_do_behavior()
Posted by SeongJae Park 9 months, 2 weeks ago
On Wed, 5 Mar 2025 19:37:28 -0800 Shakeel Butt <shakeel.butt@linux.dev> wrote:

> On Wed, Mar 05, 2025 at 03:56:32PM -0800, SeongJae Park wrote:
> > On Wed, 5 Mar 2025 13:40:17 -0800 Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > 
> > >  
> > > +struct madvise_walk_param {
> > > +	int behavior;
> > > +	struct anon_vma_name *anon_name;
> > > +};
> > 
> > Only madvise_vma_behavior() will use 'behavior' field.  And only
> > madvise_vma_anon_name() will use 'anon_name' field.  But I will have to assume
> > both function _might_ use both fields when reading the madvise_walk_vmas()
> > function code.  That doesn't make my humble code reading that simple and
> > straightforward.
> > 
> > Also populating and passing a data structure that has data that would not
> > really be used seems not very efficient to me.
> > 
> 
> This is not a new pattern. You can find a lot of examples in kernel
> where a struct encapsulates multiple fields and its pointer is passed
> around rather than those fields (or subset of them). You can check out
> zap_details, vm_fault, fs_parameter. If some fields are mutually
> exclusive you can have union in the struct.

You're right.  But we do so for readability and/or efficiency, right?  And what
I'm saying is that I'm not very sure if this change is making code much easier
to read and/or efficient.

> Also I am not sure what do
> you mean by "not efficient" here. Inefficient in what sense? Unused
> memory or something else?

I was meaning unused memory in this context.  It is a negligible extent, of
course.  Nevertheless, what I wanted to say is not that the change will degrade
efficiency too much, but I'm not clearly seeing the efficiency benefit that
explains why the change is something that should be made.

> 
> > [...]
> > > @@ -1666,8 +1674,10 @@ static bool is_memory_populate(int behavior)
> > >  }
> > >  
> > >  static int madvise_do_behavior(struct mm_struct *mm,
> > > -		unsigned long start, size_t len_in, int behavior)
> > > +		unsigned long start, size_t len_in,
> > > +		struct madvise_walk_param *arg)
> > >  {
> > > +	int behavior = arg->behavior;
> > >  	struct blk_plug plug;
> > >  	unsigned long end;
> > >  	int error;
> > > @@ -1681,7 +1691,7 @@ static int madvise_do_behavior(struct mm_struct *mm,
> > >  	if (is_memory_populate(behavior))
> > >  		error = madvise_populate(mm, start, end, behavior);
> > 
> > 'arg' is for madvise_walk_vmas() visit function, but we're using it as a
> > capsule for passing an information that will be used for madvise_do_behavior().
> > This also seems not very straightforward to my humble perspective.
> 
> Here we can keep behavior as parameter to madvise_walk_vmas() and define
> struct madvise_walk_param inside it with the passed behavior. Anyways
> this is a very common pattern in kernel.
> 
> > 
> > I have no strong opinion and maybe my humble taste is too peculiar.  But, if
> > this is not a blocker for tlb flushes batcing, I'd like to suggest keep this
> > part as is for now, and revisit for more code cleanup later.  What do you
> > think, Shakeel?
> > 
> 
> Squashing patches 5 to 8 into one is the main request from me. My other
> suggestion you can ignore but let's see what other says.

Makes sense.  Thank you for your inputs :)


Thanks,
SJ