[RFC PATCH] xen/gntdev: reduce stack usage by dynamically allocating gntdev_copy_batch

Posted by Abinash Singh 3 months, 1 week ago
While building the kernel with LLVM, a warning was reported due to
excessive stack usage in `gntdev_ioctl`:

	drivers/xen/gntdev.c:991: warning: stack frame size (1160) exceeds limit (1024) in function 'gntdev_ioctl'

Further analysis revealed that the large stack frame was caused by
`struct gntdev_copy_batch`, which was declared on the stack inside
`gntdev_ioctl_grant_copy()`. Since this function was inlined into
`gntdev_ioctl`, the stack usage was attributed to the latter.

This patch fixes the issue by dynamically allocating `gntdev_copy_batch`
using `kmalloc()`, which significantly reduces the stack footprint and
eliminates the warning.

This approach is consistent with similar fixes upstream, such as:

commit fa26198d30f3 ("iommu/io-pgtable-arm: dynamically allocate selftest device struct")

Fixes: a4cdb556cae0 ("xen/gntdev: add ioctl for grant copy")
Signed-off-by: Abinash Singh <abinashsinghlalotra@gmail.com>
---
The stack usage was confirmed using the `-fstack-usage` flag and manual analysis, which showed:

  drivers/xen/gntdev.c:953: gntdev_ioctl_grant_copy.isra   1048 bytes
  drivers/xen/gntdev.c:826: gntdev_copy                     56 bytes

Since `gntdev_ioctl` was calling the inlined `gntdev_ioctl_grant_copy`, the total
frame size exceeded 1024 bytes, triggering the warning.

This patch addresses the warning and keeps stack usage within acceptable limits.
Is this patch fine, or could the kmalloc() affect performance?
---
 drivers/xen/gntdev.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 61faea1f0663..9811f3d7c87c 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -953,15 +953,19 @@ static int gntdev_grant_copy_seg(struct gntdev_copy_batch *batch,
 static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
 {
 	struct ioctl_gntdev_grant_copy copy;
-	struct gntdev_copy_batch batch;
+	struct gntdev_copy_batch *batch;
 	unsigned int i;
 	int ret = 0;
 
 	if (copy_from_user(&copy, u, sizeof(copy)))
 		return -EFAULT;
-
-	batch.nr_ops = 0;
-	batch.nr_pages = 0;
+
+	batch = kmalloc(sizeof(*batch), GFP_KERNEL);
+	if (!batch)
+		return -ENOMEM;
+
+	batch->nr_ops = 0;
+	batch->nr_pages = 0;
 
 	for (i = 0; i < copy.count; i++) {
 		struct gntdev_grant_copy_segment seg;
@@ -971,18 +975,20 @@ static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
 			goto out;
 		}
 
-		ret = gntdev_grant_copy_seg(&batch, &seg, &copy.segments[i].status);
+		ret = gntdev_grant_copy_seg(batch, &seg, &copy.segments[i].status);
 		if (ret < 0)
 			goto out;
 
 		cond_resched();
 	}
-	if (batch.nr_ops)
-		ret = gntdev_copy(&batch);
-	return ret;
+	if (batch->nr_ops)
+		ret = gntdev_copy(batch);
+	goto free_batch;
 
   out:
-	gntdev_put_pages(&batch);
+	gntdev_put_pages(batch);
+  free_batch:
+	kfree(batch);
 	return ret;
 }
 
-- 
2.43.0
Re: [RFC PATCH] xen/gntdev: reduce stack usage by dynamically allocating gntdev_copy_batch
Posted by Tu Dinh 3 months, 1 week ago
Hi,

On 30/06/2025 06:54, Abinash Singh wrote:
> While building the kernel with LLVM, a warning was reported due to
> excessive stack usage in `gntdev_ioctl`:
> 
> 	drivers/xen/gntdev.c:991: warning: stack frame size (1160) exceeds limit (1024) in function 'gntdev_ioctl'
> 
> Further analysis revealed that the large stack frame was caused by
> `struct gntdev_copy_batch`, which was declared on the stack inside
> `gntdev_ioctl_grant_copy()`. Since this function was inlined into
> `gntdev_ioctl`, the stack usage was attributed to the latter.
> 
> This patch fixes the issue by dynamically allocating `gntdev_copy_batch`
> using `kmalloc()`, which significantly reduces the stack footprint and
> eliminates the warning.
> 
> This approach is consistent with similar fixes upstream, such as:
> 
> commit fa26198d30f3 ("iommu/io-pgtable-arm: dynamically allocate selftest device struct")
> 
> Fixes: a4cdb556cae0 ("xen/gntdev: add ioctl for grant copy")
> Signed-off-by: Abinash Singh <abinashsinghlalotra@gmail.com>
> ---
> The stack usage was confirmed using the `-fstack-usage` flag and manual analysis, which showed:
> 
>    drivers/xen/gntdev.c:953: gntdev_ioctl_grant_copy.isra   1048 bytes
>    drivers/xen/gntdev.c:826: gntdev_copy                     56 bytes
> 
> Since `gntdev_ioctl` was calling the inlined `gntdev_ioctl_grant_copy`, the total
> frame size exceeded 1024 bytes, triggering the warning.
> 
> This patch addresses the warning and keeps stack usage within acceptable limits.
> Is this patch fine, or could the kmalloc() affect performance?
> ---

Have you measured the performance impact? gntdev_ioctl_grant_copy is 
called quite often especially by the backend. I'm afraid calling the 
allocator here may cause even more slowdown than there already is, 
especially when memory is tight.

>   drivers/xen/gntdev.c | 24 +++++++++++++++---------
>   1 file changed, 15 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index 61faea1f0663..9811f3d7c87c 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -953,15 +953,19 @@ static int gntdev_grant_copy_seg(struct gntdev_copy_batch *batch,
>   static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
>   {
>   	struct ioctl_gntdev_grant_copy copy;
> -	struct gntdev_copy_batch batch;
> +	struct gntdev_copy_batch *batch;
>   	unsigned int i;
>   	int ret = 0;
>   
>   	if (copy_from_user(&copy, u, sizeof(copy)))
>   		return -EFAULT;
> -
> -	batch.nr_ops = 0;
> -	batch.nr_pages = 0;
> +
> +	batch = kmalloc(sizeof(*batch), GFP_KERNEL);
> +	if (!batch)
> +		return -ENOMEM;
> +
> +	batch->nr_ops = 0;
> +	batch->nr_pages = 0;
>   
>   	for (i = 0; i < copy.count; i++) {
>   		struct gntdev_grant_copy_segment seg;
> @@ -971,18 +975,20 @@ static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
>   			goto out;
>   		}
>   
> -		ret = gntdev_grant_copy_seg(&batch, &seg, &copy.segments[i].status);
> +		ret = gntdev_grant_copy_seg(batch, &seg, &copy.segments[i].status);
>   		if (ret < 0)
>   			goto out;
>   
>   		cond_resched();
>   	}
> -	if (batch.nr_ops)
> -		ret = gntdev_copy(&batch);
> -	return ret;
> +	if (batch->nr_ops)
> +		ret = gntdev_copy(batch);
> +	goto free_batch;
>   
>     out:
> -	gntdev_put_pages(&batch);
> +	gntdev_put_pages(batch);
> +  free_batch:
> +	kfree(batch);
>   	return ret;
>   }
>   



Ngoc Tu Dinh | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech
Re: [RFC PATCH] xen/gntdev: reduce stack usage by dynamically allocating gntdev_copy_batch
Posted by Abinash 3 months, 1 week ago
Hi,

Thanks for pointing that out.

I haven’t measured the performance impact yet — my main focus was on
getting rid of the stack usage warning triggered by LLVM due to
inlining. But you're right, gntdev_ioctl_grant_copy() is on a hot
path, and calling kmalloc() there could definitely slow things down,
especially under memory pressure.

I’ll run some benchmarks to compare the current approach with the
dynamic allocation, and also look into alternatives — maybe
pre-allocating the struct or limiting inlining instead. If you have
any ideas or suggestions on how best to approach this, I’d be happy to
hear them.

Do you have any suggestions on how to test the performance?

Best,
Abinash


On Mon, 30 Jun 2025 at 16:05, Tu Dinh <ngoc-tu.dinh@vates.tech> wrote:
>
> Hi,
>
> On 30/06/2025 06:54, Abinash Singh wrote:
> > While building the kernel with LLVM, a warning was reported due to
> > excessive stack usage in `gntdev_ioctl`:
> >
> >       drivers/xen/gntdev.c:991: warning: stack frame size (1160) exceeds limit (1024) in function 'gntdev_ioctl'
> >
> > Further analysis revealed that the large stack frame was caused by
> > `struct gntdev_copy_batch`, which was declared on the stack inside
> > `gntdev_ioctl_grant_copy()`. Since this function was inlined into
> > `gntdev_ioctl`, the stack usage was attributed to the latter.
> >
> > This patch fixes the issue by dynamically allocating `gntdev_copy_batch`
> > using `kmalloc()`, which significantly reduces the stack footprint and
> > eliminates the warning.
> >
> > This approach is consistent with similar fixes upstream, such as:
> >
> > commit fa26198d30f3 ("iommu/io-pgtable-arm: dynamically allocate selftest device struct")
> >
> > Fixes: a4cdb556cae0 ("xen/gntdev: add ioctl for grant copy")
> > Signed-off-by: Abinash Singh <abinashsinghlalotra@gmail.com>
> > ---
> > The stack usage was confirmed using the `-fstack-usage` flag and manual analysis, which showed:
> >
> >    drivers/xen/gntdev.c:953: gntdev_ioctl_grant_copy.isra   1048 bytes
> >    drivers/xen/gntdev.c:826: gntdev_copy                     56 bytes
> >
> > Since `gntdev_ioctl` was calling the inlined `gntdev_ioctl_grant_copy`, the total
> > frame size exceeded 1024 bytes, triggering the warning.
> >
> > This patch addresses the warning and keeps stack usage within acceptable limits.
> > Is this patch fine, or could the kmalloc() affect performance?
> > ---
>
> Have you measured the performance impact? gntdev_ioctl_grant_copy is
> called quite often especially by the backend. I'm afraid calling the
> allocator here may cause even more slowdown than there already is,
> especially when memory is tight.
>
> >   drivers/xen/gntdev.c | 24 +++++++++++++++---------
> >   1 file changed, 15 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> > index 61faea1f0663..9811f3d7c87c 100644
> > --- a/drivers/xen/gntdev.c
> > +++ b/drivers/xen/gntdev.c
> > @@ -953,15 +953,19 @@ static int gntdev_grant_copy_seg(struct gntdev_copy_batch *batch,
> >   static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
> >   {
> >       struct ioctl_gntdev_grant_copy copy;
> > -     struct gntdev_copy_batch batch;
> > +     struct gntdev_copy_batch *batch;
> >       unsigned int i;
> >       int ret = 0;
> >
> >       if (copy_from_user(&copy, u, sizeof(copy)))
> >               return -EFAULT;
> > -
> > -     batch.nr_ops = 0;
> > -     batch.nr_pages = 0;
> > +
> > +     batch = kmalloc(sizeof(*batch), GFP_KERNEL);
> > +     if (!batch)
> > +             return -ENOMEM;
> > +
> > +     batch->nr_ops = 0;
> > +     batch->nr_pages = 0;
> >
> >       for (i = 0; i < copy.count; i++) {
> >               struct gntdev_grant_copy_segment seg;
> > @@ -971,18 +975,20 @@ static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
> >                       goto out;
> >               }
> >
> > -             ret = gntdev_grant_copy_seg(&batch, &seg, &copy.segments[i].status);
> > +             ret = gntdev_grant_copy_seg(batch, &seg, &copy.segments[i].status);
> >               if (ret < 0)
> >                       goto out;
> >
> >               cond_resched();
> >       }
> > -     if (batch.nr_ops)
> > -             ret = gntdev_copy(&batch);
> > -     return ret;
> > +     if (batch->nr_ops)
> > +             ret = gntdev_copy(batch);
> > +     goto free_batch;
> >
> >     out:
> > -     gntdev_put_pages(&batch);
> > +     gntdev_put_pages(batch);
> > +  free_batch:
> > +     kfree(batch);
> >       return ret;
> >   }
> >
>
>
>
> Ngoc Tu Dinh | Vates XCP-ng Developer
>
> XCP-ng & Xen Orchestra - Vates solutions
>
> web: https://vates.tech
>
Re: [RFC PATCH] xen/gntdev: reduce stack usage by dynamically allocating gntdev_copy_batch
Posted by Tu Dinh 3 months, 1 week ago
On 01/07/2025 23:53, Abinash wrote:
> Hi ,
> 
> Thanks for pointing that out.
> 
> I haven’t measured the performance impact yet — my main focus was on
> getting rid of the stack usage warning triggered by LLVM due to
> inlining. But you're right, gntdev_ioctl_grant_copy() is on a hot
> path, and calling kmalloc() there could definitely slow things down,
> especially under memory pressure.
> 
> I’ll run some benchmarks to compare the current approach with the
> dynamic allocation, and also look into alternatives — maybe
> pre-allocating the struct or limiting inlining instead. If you have
> any ideas or suggestions on how best to approach this, I’d be happy to
> hear them.
> 
> Do you have any suggestions on how to test the performance?
> 
> Best,
> Abinash
> 
> 

Preallocating may work but I'd be wary of synchronization if the 
preallocated struct is shared.

I'd look at optimizing status[] which should save quite a few bytes.

Reducing GNTDEV_COPY_BATCH could be a last resort, but that may also 
impact performance.

As for benchmarks, I think you can use iperf and fio with varying packet 
sizes/block sizes.
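
For instance (sketch only -- hosts and device paths are placeholders, and the exact flags will need adjusting to your setup):

```sh
# Network path (guest <-> backend), varying buffer sizes:
iperf3 -s                          # on the receiving end
iperf3 -c <host> -t 30 -l 64K      # on the sender; repeat with -l 1K, 16K, ...

# Disk path through the PV backend, varying block sizes:
fio --name=gntdev-test --filename=<blockdev> --direct=1 --rw=randread \
    --bs=4k --runtime=30 --time_based --group_reporting   # repeat with --bs=64k, ...
```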

> On Mon, 30 Jun 2025 at 16:05, Tu Dinh <ngoc-tu.dinh@vates.tech> wrote:
>>
>> Hi,
>>
>> On 30/06/2025 06:54, Abinash Singh wrote:
>>> While building the kernel with LLVM, a warning was reported due to
>>> excessive stack usage in `gntdev_ioctl`:
>>>
>>>        drivers/xen/gntdev.c:991: warning: stack frame size (1160) exceeds limit (1024) in function 'gntdev_ioctl'
>>>
>>> Further analysis revealed that the large stack frame was caused by
>>> `struct gntdev_copy_batch`, which was declared on the stack inside
>>> `gntdev_ioctl_grant_copy()`. Since this function was inlined into
>>> `gntdev_ioctl`, the stack usage was attributed to the latter.
>>>
>>> This patch fixes the issue by dynamically allocating `gntdev_copy_batch`
>>> using `kmalloc()`, which significantly reduces the stack footprint and
>>> eliminates the warning.
>>>
>>> This approach is consistent with similar fixes upstream, such as:
>>>
>>> commit fa26198d30f3 ("iommu/io-pgtable-arm: dynamically allocate selftest device struct")
>>>
>>> Fixes: a4cdb556cae0 ("xen/gntdev: add ioctl for grant copy")
>>> Signed-off-by: Abinash Singh <abinashsinghlalotra@gmail.com>
>>> ---
>>> The stack usage was confirmed using the `-fstack-usage` flag and manual analysis, which showed:
>>>
>>>     drivers/xen/gntdev.c:953: gntdev_ioctl_grant_copy.isra   1048 bytes
>>>     drivers/xen/gntdev.c:826: gntdev_copy                     56 bytes
>>>
>>> Since `gntdev_ioctl` was calling the inlined `gntdev_ioctl_grant_copy`, the total
>>> frame size exceeded 1024 bytes, triggering the warning.
>>>
>>> This patch addresses the warning and keeps stack usage within acceptable limits.
>>> Is this patch fine, or could the kmalloc() affect performance?
>>> ---
>>
>> Have you measured the performance impact? gntdev_ioctl_grant_copy is
>> called quite often especially by the backend. I'm afraid calling the
>> allocator here may cause even more slowdown than there already is,
>> especially when memory is tight.
>>
>>>    drivers/xen/gntdev.c | 24 +++++++++++++++---------
>>>    1 file changed, 15 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>>> index 61faea1f0663..9811f3d7c87c 100644
>>> --- a/drivers/xen/gntdev.c
>>> +++ b/drivers/xen/gntdev.c
>>> @@ -953,15 +953,19 @@ static int gntdev_grant_copy_seg(struct gntdev_copy_batch *batch,
>>>    static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
>>>    {
>>>        struct ioctl_gntdev_grant_copy copy;
>>> -     struct gntdev_copy_batch batch;
>>> +     struct gntdev_copy_batch *batch;
>>>        unsigned int i;
>>>        int ret = 0;
>>>
>>>        if (copy_from_user(&copy, u, sizeof(copy)))
>>>                return -EFAULT;
>>> -
>>> -     batch.nr_ops = 0;
>>> -     batch.nr_pages = 0;
>>> +
>>> +     batch = kmalloc(sizeof(*batch), GFP_KERNEL);
>>> +     if (!batch)
>>> +             return -ENOMEM;
>>> +
>>> +     batch->nr_ops = 0;
>>> +     batch->nr_pages = 0;
>>>
>>>        for (i = 0; i < copy.count; i++) {
>>>                struct gntdev_grant_copy_segment seg;
>>> @@ -971,18 +975,20 @@ static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
>>>                        goto out;
>>>                }
>>>
>>> -             ret = gntdev_grant_copy_seg(&batch, &seg, &copy.segments[i].status);
>>> +             ret = gntdev_grant_copy_seg(batch, &seg, &copy.segments[i].status);
>>>                if (ret < 0)
>>>                        goto out;
>>>
>>>                cond_resched();
>>>        }
>>> -     if (batch.nr_ops)
>>> -             ret = gntdev_copy(&batch);
>>> -     return ret;
>>> +     if (batch->nr_ops)
>>> +             ret = gntdev_copy(batch);
>>> +     goto free_batch;
>>>
>>>      out:
>>> -     gntdev_put_pages(&batch);
>>> +     gntdev_put_pages(batch);
>>> +  free_batch:
>>> +     kfree(batch);
>>>        return ret;
>>>    }
>>>
>>
>>
>>
>> Ngoc Tu Dinh | Vates XCP-ng Developer
>>
>> XCP-ng & Xen Orchestra - Vates solutions
>>
>> web: https://vates.tech
>>



Ngoc Tu Dinh | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech
Re: [RFC PATCH] xen/gntdev: reduce stack usage by dynamically allocating gntdev_copy_batch
Posted by Jürgen Groß 3 months, 1 week ago
On 03.07.25 00:42, Tu Dinh wrote:
> On 01/07/2025 23:53, Abinash wrote:
>> Hi ,
>>
>> Thanks for pointing that out.
>>
>> I haven’t measured the performance impact yet — my main focus was on
>> getting rid of the stack usage warning triggered by LLVM due to
>> inlining. But you're right, gntdev_ioctl_grant_copy() is on a hot
>> path, and calling kmalloc() there could definitely slow things down,
>> especially under memory pressure.
>>
>> I’ll run some benchmarks to compare the current approach with the
>> dynamic allocation, and also look into alternatives — maybe
>> pre-allocating the struct or limiting inlining instead. If you have
>> any ideas or suggestions on how best to approach this, I’d be happy to
>> hear them.
>>
>> Do you have any suggestions on how to test the performance?
>>
>> Best,
>> Abinash
>>
>>
> 
> Preallocating may work but I'd be wary of synchronization if the
> preallocated struct is shared.
> 
> I'd look at optimizing status[] which should save quite a few bytes.
> 
> Reducing GNTDEV_COPY_BATCH could be a last resort, but that may also
> impact performance.

IMHO the most promising way would be to dynamically allocate the struct, but
don't free it at the end of the ioctl. Instead it could be put into a list
anchored in struct gntdev_priv, so freeing would be done only at close() time.

Synchronization would be minimal (just for taking a free struct from the list
or putting it back again), while memory usage would be basically just as needed,
depending on the number of concurrent threads using the same file descriptor
for the ioctl.

This approach would even allow raising GNTDEV_COPY_BATCH, maybe even resulting
in a performance gain.

I'll write a patch implementing the allocation scheme.


Juergen
Re: [RFC PATCH] xen/gntdev: reduce stack usage by dynamically allocating gntdev_copy_batch
Posted by Christophe JAILLET 3 months ago
Le 03/07/2025 à 07:22, Jürgen Groß a écrit :
> On 03.07.25 00:42, Tu Dinh wrote:
>> On 01/07/2025 23:53, Abinash wrote:
>>> Hi ,
>>>
>>> Thanks for pointing that out.
>>>
>>> I haven’t measured the performance impact yet — my main focus was on
>>> getting rid of the stack usage warning triggered by LLVM due to
>>> inlining. But you're right, gntdev_ioctl_grant_copy() is on a hot
>>> path, and calling kmalloc() there could definitely slow things down,
>>> especially under memory pressure.
>>>
>>> I’ll run some benchmarks to compare the current approach with the
>>> dynamic allocation, and also look into alternatives — maybe
>>> pre-allocating the struct or limiting inlining instead. If you have
>>> any ideas or suggestions on how best to approach this, I’d be happy to
>>> hear them.
>>>
>>> Do you have any suggestions on how to test the performance?
>>>
>>> Best,
>>> Abinash
>>>
>>>
>>
>> Preallocating may work but I'd be wary of synchronization if the
>> preallocated struct is shared.
>>
>> I'd look at optimizing status[] which should save quite a few bytes.
>>
>> Reducing GNTDEV_COPY_BATCH could be a last resort, but that may also
>> impact performance.
> 
> IMHO the most promising way would be to dynamically allocate the struct, 
> but
> don't free it at the end of the ioctl. Instead it could be put into a list
> anchored in struct gntdev_priv, so freeing would be done only at close() 
> time.
> 
> Synchronization would be minimal (just for taking a free struct from the 
> list
> or putting it back again), while memory usage would be basically just as 
> needed,
> depending on the number of concurrent threads using the same file 
> descriptor
> for the ioctl.
> 
> This approach would even allow to raise GNTDEV_COPY_BATCH, maybe 
> resulting even
> in a gain of performance.
> 
> I'll write a patch implementing the allocation scheme.
> 
> 
> Juergen

It may be overkill, but sometimes we see patterns that try to keep the
best of both worlds. Something like:


static struct gntdev_copy_batch static_batch;
static DEFINE_MUTEX(my_mutex);

static long gntdev_ioctl_grant_copy(...)
{
	struct gntdev_copy_batch *dynamic_batch = NULL;
	struct gntdev_copy_batch *batch;

	...

	if (mutex_trylock(&my_mutex)) {
		/*
		 * No concurrent access?
		 * Use a shared static variable to avoid an allocation
		 */
		batch = &static_batch;
	} else {
		/* otherwise, we need some fresh memory */
		dynamic_batch = kmalloc(sizeof(*batch), GFP_KERNEL);
		if (!dynamic_batch)
			return -ENOMEM;

		batch = dynamic_batch;
	}

	/* do stuff with 'batch' */
	...

free_batch:
	if (!dynamic_batch)
		mutex_unlock(&my_mutex);
	else
		kfree(dynamic_batch);
  	return ret;
  }


Just my 2c.

CJ


Re: [RFC PATCH] xen/gntdev: reduce stack usage by dynamically allocating gntdev_copy_batch
Posted by Juergen Gross 3 months ago
On 03.07.25 19:35, Christophe JAILLET wrote:
> Le 03/07/2025 à 07:22, Jürgen Groß a écrit :
>> On 03.07.25 00:42, Tu Dinh wrote:
>>> On 01/07/2025 23:53, Abinash wrote:
>>>> Hi ,
>>>>
>>>> Thanks for pointing that out.
>>>>
>>>> I haven’t measured the performance impact yet — my main focus was on
>>>> getting rid of the stack usage warning triggered by LLVM due to
>>>> inlining. But you're right, gntdev_ioctl_grant_copy() is on a hot
>>>> path, and calling kmalloc() there could definitely slow things down,
>>>> especially under memory pressure.
>>>>
>>>> I’ll run some benchmarks to compare the current approach with the
>>>> dynamic allocation, and also look into alternatives — maybe
>>>> pre-allocating the struct or limiting inlining instead. If you have
>>>> any ideas or suggestions on how best to approach this, I’d be happy to
>>>> hear them.
>>>>
>>>> Do you have any suggestions on how to test the performance?
>>>>
>>>> Best,
>>>> Abinash
>>>>
>>>>
>>>
>>> Preallocating may work but I'd be wary of synchronization if the
>>> preallocated struct is shared.
>>>
>>> I'd look at optimizing status[] which should save quite a few bytes.
>>>
>>> Reducing GNTDEV_COPY_BATCH could be a last resort, but that may also
>>> impact performance.
>>
>> IMHO the most promising way would be to dynamically allocate the struct, but
>> don't free it at the end of the ioctl. Instead it could be put into a list
>> anchored in struct gntdev_priv, so freeing would be done only at close() time.
>>
>> Synchronization would be minimal (just for taking a free struct from the list
>> or putting it back again), while memory usage would be basically just as needed,
>> depending on the number of concurrent threads using the same file descriptor
>> for the ioctl.
>>
>> This approach would even allow to raise GNTDEV_COPY_BATCH, maybe resulting even
>> in a gain of performance.
>>
>> I'll write a patch implementing the allocation scheme.
>>
>>
>> Juergen
> 
> It may be overkill, but sometimes we see patterns that try to keep the best
> of both worlds. Something like:
> 
> 
> static struct gntdev_copy_batch static_batch;
> static DEFINE_MUTEX(my_mutex);
> 
> static long gntdev_ioctl_grant_copy(...)
> {
>      struct gntdev_copy_batch *dynamic_batch = NULL;
>      struct gntdev_copy_batch *batch;
> 
>      ...
> 
>      if (mutex_trylock(&my_mutex)) {
>          /*
>           * No concurrent access?
>           * Use a shared static variable to avoid an allocation
>           */
>          batch = &static_batch;
>      } else {
>          /* otherwise, we need some fresh memory */
>          dynamic_batch = kmalloc(sizeof(*batch), GFP_KERNEL);
>          if (!dynamic_batch)
>              return -ENOMEM;
> 
>          batch = dynamic_batch;
>      }
> 
>      /* do stuff with 'batch' */
>      ...
> 
> free_batch:
>      if (!dynamic_batch)
>          mutex_unlock(&my_mutex);
>      else
>          kfree(dynamic_batch);
>       return ret;
>   }
> 
> 
> Just my 2c.

Thanks for the remark, but this won't help much. gntdev_ioctl_grant_copy()
is e.g. used by qemu for doing disk I/O on behalf of a Xen guest. This
means that it is very likely to be used concurrently, so just optimizing
for a single thread won't be enough.


Juergen