net/9p/client.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
While developing a 9P server (https://github.com/Barre/ZeroFS) and testing it under high-load, I was running into allocation failures. The failures occur even with plenty of free memory available because kmalloc requires contiguous physical memory.
This results in errors like:
ls: page allocation failure: order:7, mode:0x40c40(GFP_NOFS|__GFP_COMP)
Signed-off-by: Pierre Barre <pierre@barre.sh>
---
net/9p/client.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/9p/client.c b/net/9p/client.c
index 5c1ca57ccd28..f82b5674057c 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -230,7 +230,7 @@ static int p9_fcall_init(struct p9_client *c, struct p9_fcall *fc,
fc->sdata = kmem_cache_alloc(c->fcall_cache, GFP_NOFS);
fc->cache = c->fcall_cache;
} else {
- fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
+ fc->sdata = kvmalloc(alloc_msize, GFP_NOFS);
fc->cache = NULL;
}
if (!fc->sdata)
@@ -252,7 +252,7 @@ void p9_fcall_fini(struct p9_fcall *fc)
if (fc->cache)
kmem_cache_free(fc->cache, fc->sdata);
else
- kfree(fc->sdata);
+ kvfree(fc->sdata);
}
EXPORT_SYMBOL(p9_fcall_fini);
--
2.39.5 (Apple Git-154)
On Wednesday, July 30, 2025 5:08:05 PM CEST Pierre Barre wrote:
> While developing a 9P server (https://github.com/Barre/ZeroFS) and testing it under high-load, I was running into allocation failures. The failures occur even with plenty of free memory available because kmalloc requires contiguous physical memory.
>
> This results in errors like:
> ls: page allocation failure: order:7, mode:0x40c40(GFP_NOFS|__GFP_COMP)
What was msize?
> Signed-off-by: Pierre Barre <pierre@barre.sh>
> ---
> net/9p/client.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 5c1ca57ccd28..f82b5674057c 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -230,7 +230,7 @@ static int p9_fcall_init(struct p9_client *c, struct p9_fcall *fc,
> fc->sdata = kmem_cache_alloc(c->fcall_cache, GFP_NOFS);
> fc->cache = c->fcall_cache;
> } else {
> - fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
> + fc->sdata = kvmalloc(alloc_msize, GFP_NOFS);
That would work with certain transports like fd I guess, but not via
virtio-pci transport for instance, since PCI-DMA requires physical pages. Same
applies to Xen transport I guess.
> fc->cache = NULL;
> }
> if (!fc->sdata)
> @@ -252,7 +252,7 @@ void p9_fcall_fini(struct p9_fcall *fc)
> if (fc->cache)
> kmem_cache_free(fc->cache, fc->sdata);
> else
> - kfree(fc->sdata);
> + kvfree(fc->sdata);
> }
> EXPORT_SYMBOL(p9_fcall_fini);
>
>
Thank you for your email.
> What was msize?
I was mounting the filesystem using:
trans=tcp,port=5564,version=9p2000.L,msize=1048576,cache=mmap,access=user
> That would work with certain transports like fd I guess, but not via
> virtio-pci transport for instance, since PCI-DMA requires physical pages. Same
> applies to Xen transport I guess.
Would it be acceptable to add a mount option (like nocontig or loosealloc?) that enables kvmalloc?
Best,
Pierre
On Wed, Jul 30, 2025, at 18:08, Christian Schoenebeck wrote:
> On Wednesday, July 30, 2025 5:08:05 PM CEST Pierre Barre wrote:
>> While developing a 9P server (https://github.com/Barre/ZeroFS) and testing it under high-load, I was running into allocation failures. The failures occur even with plenty of free memory available because kmalloc requires contiguous physical memory.
>>
>> This results in errors like:
>> ls: page allocation failure: order:7, mode:0x40c40(GFP_NOFS|__GFP_COMP)
>
> What was msize?
>
>> Signed-off-by: Pierre Barre <pierre@barre.sh>
>> ---
>> net/9p/client.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/9p/client.c b/net/9p/client.c
>> index 5c1ca57ccd28..f82b5674057c 100644
>> --- a/net/9p/client.c
>> +++ b/net/9p/client.c
>> @@ -230,7 +230,7 @@ static int p9_fcall_init(struct p9_client *c, struct p9_fcall *fc,
>> fc->sdata = kmem_cache_alloc(c->fcall_cache, GFP_NOFS);
>> fc->cache = c->fcall_cache;
>> } else {
>> - fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
>> + fc->sdata = kvmalloc(alloc_msize, GFP_NOFS);
>
> That would work with certain transports like fd I guess, but not via
> virtio-pci transport for instance, since PCI-DMA requires physical pages. Same
> applies to Xen transport I guess.
>
>> fc->cache = NULL;
>> }
>> if (!fc->sdata)
>> @@ -252,7 +252,7 @@ void p9_fcall_fini(struct p9_fcall *fc)
>> if (fc->cache)
>> kmem_cache_free(fc->cache, fc->sdata);
>> else
>> - kfree(fc->sdata);
>> + kvfree(fc->sdata);
>> }
>> EXPORT_SYMBOL(p9_fcall_fini);
>>
>>
On Wednesday, July 30, 2025 6:19:37 PM CEST Pierre Barre wrote: > Thank you for your email. > > > What was msize? > > I was mounting the filesystem using: > > trans=tcp,port=5564,version=9p2000.L,msize=1048576,cache=mmap,access=user Yeah, that explains both why you were triggering this issue, as 1M msize will likely cause kmalloc() failures under heavy load, and why your patch was working for you due to chosen tcp transport. > > That would work with certain transports like fd I guess, but not via > > virtio-pci transport for instance, since PCI-DMA requires physical pages. Same > > applies to Xen transport I guess. > > Would it be acceptable to add a mount option (like nocontig or loosealloc?) that enables kvmalloc? Dominique's call obviously, I'm just giving my few cents here. To me it would make sense to fix the root cause instead of shorting a symptom: Right now 9p filesystem code (under fs/9p) requires a linear buffer, whereas some 9p transports (under net/9p) require physical pages, and the latter is not going to change. One solution therefore might be changing fs/9p code to work on a scatter/ gather list instead of a simple linear buffer. But I guess that would be too much work. So a more reasonable solution instead might be using kvmalloc(), as suggested by you, and adjusting the individual transports such that they translate a virtual memory address to a list of physical addresses via e.g. vmalloc_to_page() if needed. /Christian
As a middle-ground, would it be acceptable to add a transport capability flag indicating whether the transport requires contiguous memory for DMA?
1. Add a P9_TRANS_REQUIRES_CONTIGUOUS flag to struct p9_trans_module
2. Set this flag for virtio, xen, and rdma transports
3. Modify p9_fcall_init to check the flag:
if (c->trans_mod->caps & P9_TRANS_REQUIRES_CONTIGUOUS)
fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
else
fc->sdata = kvmalloc(alloc_msize, GFP_NOFS);
I can prepare a patch implementing this approach if you agree.
Best,
Pierre
On Wed, Jul 30, 2025, at 19:28, Christian Schoenebeck wrote:
> On Wednesday, July 30, 2025 6:19:37 PM CEST Pierre Barre wrote:
>> Thank you for your email.
>>
>> > What was msize?
>>
>> I was mounting the filesystem using:
>>
>> trans=tcp,port=5564,version=9p2000.L,msize=1048576,cache=mmap,access=user
>
> Yeah, that explains both why you were triggering this issue, as 1M msize will
> likely cause kmalloc() failures under heavy load, and why your patch was
> working for you due to chosen tcp transport.
>
>> > That would work with certain transports like fd I guess, but not via
>> > virtio-pci transport for instance, since PCI-DMA requires physical pages. Same
>> > applies to Xen transport I guess.
>>
>> Would it be acceptable to add a mount option (like nocontig or loosealloc?) that enables kvmalloc?
>
> Dominique's call obviously, I'm just giving my few cents here. To me it would
> make sense to fix the root cause instead of shorting a symptom:
>
> Right now 9p filesystem code (under fs/9p) requires a linear buffer, whereas
> some 9p transports (under net/9p) require physical pages, and the latter is
> not going to change.
>
> One solution therefore might be changing fs/9p code to work on a scatter/
> gather list instead of a simple linear buffer. But I guess that would be too
> much work.
>
> So a more reasonable solution instead might be using kvmalloc(), as suggested
> by you, and adjusting the individual transports such that they translate a
> virtual memory address to a list of physical addresses via e.g.
> vmalloc_to_page() if needed.
>
> /Christian
(Added Willy in To, if you have time to advise on what's appropriate wrt. memory allocation here to use either as a contiguous virtual memory buffer or a scatterlist for various kind of dma used by transports that'd be appreciated) First - thanks for starting this thread, large contiguous allocations has been a problem reported regularly in the past, and no-one has had the energy to address it yet, but it's definitely something worth tackling. Pierre Barre wrote on Wed, Jul 30, 2025 at 10:16:13PM +0200: > As a middle-ground, would it be acceptable to add a transport > capability flag indicating whether the transport requires contiguous > memory for DMA? I think such a flag will be needed even if we do what Christian suggested: >> So a more reasonable solution instead might be using kvmalloc(), as suggested >> by you, and adjusting the individual transports such that they translate a >> virtual memory address to a list of physical addresses via e.g. >> vmalloc_to_page() if needed. Start with a requires contiguous flag set for all transports except trans_fd (I'm not sure about usb, but I'm pretty sure all the rest need it); then each transport can figure out how to get a scatterlist or something they need from the allocation, so we don't need to have a big bang that breaks everything at the same time. I'm sure virtio's implem will come soon enough, but I don't see anyone working on RDMA or xen so fast. > 1. Add a P9_TRANS_REQUIRES_CONTIGUOUS flag to struct p9_trans_module > 2. Set this flag for virtio, xen, and rdma transports This is a nit but I'd rather the flag go the other way around, e.g. no flag means requires contiguous and it's only set after confirming the transport works (I just looked at usb and struct usb_request makes me think some drivers use dma, so, yeah..) We can always turn it around later if the majority of transports handle it. > 3. Modify p9_fcall_init to check the flag: > if (c->trans_mod->caps & P9_TRANS_REQUIRES_CONTIGUOUS) > fc->sdata = kmalloc(alloc_msize, GFP_NOFS); > else > fc->sdata = kvmalloc(alloc_msize, GFP_NOFS); I'm also curious if there's something more appropriate than kvmalloc wrt using either as a contiguous virtual memory buffer or a scatterlist, but I think this is fine for now unless someone knows of something more appropriate. Thanks, -- Dominique Martinet | Asmadeus
> First - thanks for starting this thread, large contiguous allocations > has been a problem reported regularly in the past, and no-one has had > the energy to address it yet, but it's definitely something worth > tackling. Glad that I didn't just create noise. I implemented 9P in addition to NFS in my server because fsyncing on NFS didn't give me what I wanted (fsync doesn't call commit when write is communicated to be "durable" by the server, but that's a story for an other day...). I am very grateful that this client implementation exists. On Thu, Jul 31, 2025, at 00:07, asmadeus@codewreck.org wrote: > (Added Willy in To, if you have time to advise on what's appropriate > wrt. memory allocation here to use either as a contiguous virtual memory > buffer or a scatterlist for various kind of dma used by transports > that'd be appreciated) > > > First - thanks for starting this thread, large contiguous allocations > has been a problem reported regularly in the past, and no-one has had > the energy to address it yet, but it's definitely something worth > tackling. > > > Pierre Barre wrote on Wed, Jul 30, 2025 at 10:16:13PM +0200: >> As a middle-ground, would it be acceptable to add a transport >> capability flag indicating whether the transport requires contiguous >> memory for DMA? > > I think such a flag will be needed even if we do what Christian suggested: >>> So a more reasonable solution instead might be using kvmalloc(), as suggested >>> by you, and adjusting the individual transports such that they translate a >>> virtual memory address to a list of physical addresses via e.g. >>> vmalloc_to_page() if needed. > > Start with a requires contiguous flag set for all transports except > trans_fd (I'm not sure about usb, but I'm pretty sure all the rest need > it); then each transport can figure out how to get a scatterlist or > something they need from the allocation, so we don't need to have a big > bang that breaks everything at the same time. > > I'm sure virtio's implem will come soon enough, but I don't see anyone > working on RDMA or xen so fast. > >> 1. Add a P9_TRANS_REQUIRES_CONTIGUOUS flag to struct p9_trans_module >> 2. Set this flag for virtio, xen, and rdma transports > > This is a nit but I'd rather the flag go the other way around, e.g. no > flag means requires contiguous and it's only set after confirming the > transport works > (I just looked at usb and struct usb_request makes me think some drivers > use dma, so, yeah..) > > We can always turn it around later if the majority of transports handle > it. > > >> 3. Modify p9_fcall_init to check the flag: >> if (c->trans_mod->caps & P9_TRANS_REQUIRES_CONTIGUOUS) >> fc->sdata = kmalloc(alloc_msize, GFP_NOFS); >> else >> fc->sdata = kvmalloc(alloc_msize, GFP_NOFS); > > I'm also curious if there's something more appropriate than kvmalloc wrt > using either as a contiguous virtual memory buffer or a scatterlist, but > I think this is fine for now unless someone knows of something more > appropriate. > > > Thanks, > -- > Dominique Martinet | Asmadeus
Hi Everyone, If I submit a patch based on what has been discussed above, is it likely to be accepted? Unfortunately, in my current setup, I am encountering this issue quite frequently, and users of my servers are having a hard time making sense of the error. Thank you. Best, Pierre On Thu, Jul 31, 2025, at 02:36, Pierre Barre wrote: >> First - thanks for starting this thread, large contiguous allocations >> has been a problem reported regularly in the past, and no-one has had >> the energy to address it yet, but it's definitely something worth >> tackling. > > Glad that I didn't just create noise. I implemented 9P in addition to > NFS in my server because fsyncing on NFS didn't give me what I wanted > (fsync doesn't call commit when write is communicated to be "durable" > by the server, but that's a story for an other day...). > > I am very grateful that this client implementation exists. > > On Thu, Jul 31, 2025, at 00:07, asmadeus@codewreck.org wrote: >> (Added Willy in To, if you have time to advise on what's appropriate >> wrt. memory allocation here to use either as a contiguous virtual memory >> buffer or a scatterlist for various kind of dma used by transports >> that'd be appreciated) >> >> >> First - thanks for starting this thread, large contiguous allocations >> has been a problem reported regularly in the past, and no-one has had >> the energy to address it yet, but it's definitely something worth >> tackling. >> >> >> Pierre Barre wrote on Wed, Jul 30, 2025 at 10:16:13PM +0200: >>> As a middle-ground, would it be acceptable to add a transport >>> capability flag indicating whether the transport requires contiguous >>> memory for DMA? >> >> I think such a flag will be needed even if we do what Christian suggested: >>>> So a more reasonable solution instead might be using kvmalloc(), as suggested >>>> by you, and adjusting the individual transports such that they translate a >>>> virtual memory address to a list of physical addresses via e.g. >>>> vmalloc_to_page() if needed. >> >> Start with a requires contiguous flag set for all transports except >> trans_fd (I'm not sure about usb, but I'm pretty sure all the rest need >> it); then each transport can figure out how to get a scatterlist or >> something they need from the allocation, so we don't need to have a big >> bang that breaks everything at the same time. >> >> I'm sure virtio's implem will come soon enough, but I don't see anyone >> working on RDMA or xen so fast. >> >>> 1. Add a P9_TRANS_REQUIRES_CONTIGUOUS flag to struct p9_trans_module >>> 2. Set this flag for virtio, xen, and rdma transports >> >> This is a nit but I'd rather the flag go the other way around, e.g. no >> flag means requires contiguous and it's only set after confirming the >> transport works >> (I just looked at usb and struct usb_request makes me think some drivers >> use dma, so, yeah..) >> >> We can always turn it around later if the majority of transports handle >> it. >> >> >>> 3. Modify p9_fcall_init to check the flag: >>> if (c->trans_mod->caps & P9_TRANS_REQUIRES_CONTIGUOUS) >>> fc->sdata = kmalloc(alloc_msize, GFP_NOFS); >>> else >>> fc->sdata = kvmalloc(alloc_msize, GFP_NOFS); >> >> I'm also curious if there's something more appropriate than kvmalloc wrt >> using either as a contiguous virtual memory buffer or a scatterlist, but >> I think this is fine for now unless someone knows of something more >> appropriate. >> >> >> Thanks, >> -- >> Dominique Martinet | Asmadeus
Pierre Barre wrote on Wed, Aug 06, 2025 at 05:50:42PM +0200: > If I submit a patch based on what has been discussed above, is it > likely to be accepted? Unfortunately, in my current setup, I am > encountering this issue quite frequently, and users of my servers are > having a hard time making sense of the error. Yes, sorry it wasn't clear. I still have no idea what's the "best" allocation method that we'll be able to use as either a vmalloc buffer or split into a scatterlist, but there's little point in worrying too much about it, so please go ahead. If it's restricted to trans_fd and there's a chance we can make use of it with (at least) virtio later I think everyone will be happy :) -- Dominique
On Wednesday, August 6, 2025 11:44:34 PM CEST Dominique Martinet wrote: > > Pierre Barre wrote on Wed, Aug 06, 2025 at 05:50:42PM +0200: > > If I submit a patch based on what has been discussed above, is it > > likely to be accepted? Unfortunately, in my current setup, I am > > encountering this issue quite frequently, and users of my servers are > > having a hard time making sense of the error. > > Yes, sorry it wasn't clear. > > I still have no idea what's the "best" allocation method that we'll be > able to use as either a vmalloc buffer or split into a scatterlist, but > there's little point in worrying too much about it, so please go ahead. > > If it's restricted to trans_fd and there's a chance we can make use of > it with (at least) virtio later I think everyone will be happy :) Yes, sounds like a viable plan. Pierre, one more thing to note: kmem_cache_alloc() might still fail though. So maybe it would make sense to add a separate patch that would check the result of kmem_cache_alloc() and if it fails, falling back to your kvmalloc() call (if enabled by the discussed transport mechanism of course). /Christian
While developing a 9P server (https://github.com/Barre/ZeroFS) and
testing it under high-load, I was running into allocation failures.
The failures occur even with plenty of free memory available because
kmalloc requires contiguous physical memory.
This results in errors like:
ls: page allocation failure: order:7, mode:0x40c40(GFP_NOFS|__GFP_COMP)
This patch introduces a transport capability flag (supports_vmalloc)
that indicates whether a transport can work with vmalloc'd buffers
(non-physically contiguous memory). Transports requiring DMA should
leave this flag as false.
The fd-based transports (tcp, unix, fd) set this flag to true, and
p9_fcall_init will use kvmalloc instead of kmalloc for these
transports. This allows the allocator to fall back to vmalloc when
contiguous physical memory is not available.
Additionally, if kmem_cache_alloc fails, the code falls back to
kvmalloc for transports that support it.
Signed-off-by: Pierre Barre <pierre@barre.sh>
---
include/net/9p/transport.h | 4 ++++
net/9p/client.c | 11 +++++++++--
net/9p/trans_fd.c | 3 +++
3 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/include/net/9p/transport.h b/include/net/9p/transport.h
index 766ec07c9599..f0981515148d 100644
--- a/include/net/9p/transport.h
+++ b/include/net/9p/transport.h
@@ -24,6 +24,9 @@
* we're less flexible when choosing the response message
* size in this case
* @def: set if this transport should be considered the default
+ * @supports_vmalloc: set if this transport can work with vmalloc'd buffers
+ * (non-physically contiguous memory). Transports requiring
+ * DMA should leave this as false.
* @create: member function to create a new connection on this transport
* @close: member function to discard a connection on this transport
* @request: member function to issue a request to the transport
@@ -44,6 +47,7 @@ struct p9_trans_module {
int maxsize; /* max message size of transport */
bool pooled_rbuffers;
int def; /* this transport should be default */
+ bool supports_vmalloc; /* can work with vmalloc'd buffers */
struct module *owner;
int (*create)(struct p9_client *client,
const char *devname, char *args);
diff --git a/net/9p/client.c b/net/9p/client.c
index 5c1ca57ccd28..2a4884c880c1 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -229,8 +229,15 @@ static int p9_fcall_init(struct p9_client *c, struct p9_fcall *fc,
if (likely(c->fcall_cache) && alloc_msize == c->msize) {
fc->sdata = kmem_cache_alloc(c->fcall_cache, GFP_NOFS);
fc->cache = c->fcall_cache;
+ if (!fc->sdata && c->trans_mod->supports_vmalloc) {
+ fc->sdata = kvmalloc(alloc_msize, GFP_NOFS);
+ fc->cache = NULL;
+ }
} else {
- fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
+ if (c->trans_mod->supports_vmalloc)
+ fc->sdata = kvmalloc(alloc_msize, GFP_NOFS);
+ else
+ fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
fc->cache = NULL;
}
if (!fc->sdata)
@@ -252,7 +259,7 @@ void p9_fcall_fini(struct p9_fcall *fc)
if (fc->cache)
kmem_cache_free(fc->cache, fc->sdata);
else
- kfree(fc->sdata);
+ kvfree(fc->sdata);
}
EXPORT_SYMBOL(p9_fcall_fini);
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index a516745f732f..e7334033eba5 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -1101,6 +1101,7 @@ static struct p9_trans_module p9_tcp_trans = {
.maxsize = MAX_SOCK_BUF,
.pooled_rbuffers = false,
.def = 0,
+ .supports_vmalloc = true,
.create = p9_fd_create_tcp,
.close = p9_fd_close,
.request = p9_fd_request,
@@ -1115,6 +1116,7 @@ static struct p9_trans_module p9_unix_trans = {
.name = "unix",
.maxsize = MAX_SOCK_BUF,
.def = 0,
+ .supports_vmalloc = true,
.create = p9_fd_create_unix,
.close = p9_fd_close,
.request = p9_fd_request,
@@ -1129,6 +1131,7 @@ static struct p9_trans_module p9_fd_trans = {
.name = "fd",
.maxsize = MAX_SOCK_BUF,
.def = 0,
+ .supports_vmalloc = true,
.create = p9_fd_create,
.close = p9_fd_close,
.request = p9_fd_request,
--
2.39.5 (Apple Git-154)
On Fri, Aug 8, 2025, at 13:12, Christian Schoenebeck wrote:
> On Wednesday, August 6, 2025 11:44:34 PM CEST Dominique Martinet wrote:
>>
>> Pierre Barre wrote on Wed, Aug 06, 2025 at 05:50:42PM +0200:
>> > If I submit a patch based on what has been discussed above, is it
>> > likely to be accepted? Unfortunately, in my current setup, I am
>> > encountering this issue quite frequently, and users of my servers are
>> > having a hard time making sense of the error.
>>
>> Yes, sorry it wasn't clear.
>>
>> I still have no idea what's the "best" allocation method that we'll be
>> able to use as either a vmalloc buffer or split into a scatterlist, but
>> there's little point in worrying too much about it, so please go ahead.
>>
>> If it's restricted to trans_fd and there's a chance we can make use of
>> it with (at least) virtio later I think everyone will be happy :)
>
> Yes, sounds like a viable plan.
>
> Pierre, one more thing to note: kmem_cache_alloc() might still fail though. So
> maybe it would make sense to add a separate patch that would check the result
> of kmem_cache_alloc() and if it fails, falling back to your kvmalloc() call
> (if enabled by the discussed transport mechanism of course).
>
> /Christian
On Thursday, October 16, 2025 9:01:56 AM CEST Pierre Barre wrote:
> While developing a 9P server (https://github.com/Barre/ZeroFS) and
> testing it under high-load, I was running into allocation failures.
> The failures occur even with plenty of free memory available because
> kmalloc requires contiguous physical memory.
>
> This results in errors like:
> ls: page allocation failure: order:7, mode:0x40c40(GFP_NOFS|__GFP_COMP)
>
> This patch introduces a transport capability flag (supports_vmalloc)
> that indicates whether a transport can work with vmalloc'd buffers
> (non-physically contiguous memory). Transports requiring DMA should
> leave this flag as false.
>
> The fd-based transports (tcp, unix, fd) set this flag to true, and
> p9_fcall_init will use kvmalloc instead of kmalloc for these
> transports. This allows the allocator to fall back to vmalloc when
> contiguous physical memory is not available.
>
> Additionally, if kmem_cache_alloc fails, the code falls back to
> kvmalloc for transports that support it.
>
> Signed-off-by: Pierre Barre <pierre@barre.sh>
> ---
>
> include/net/9p/transport.h | 4 ++++
> net/9p/client.c | 11 +++++++++--
> net/9p/trans_fd.c | 3 +++
> 3 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/9p/transport.h b/include/net/9p/transport.h
> index 766ec07c9599..f0981515148d 100644
> --- a/include/net/9p/transport.h
> +++ b/include/net/9p/transport.h
> @@ -24,6 +24,9 @@
> * we're less flexible when choosing the response message
> * size in this case
> * @def: set if this transport should be considered the default
> + * @supports_vmalloc: set if this transport can work with vmalloc'd buffers
> + * (non-physically contiguous memory). Transports
> requiring + * DMA should leave this as false.
> * @create: member function to create a new connection on this transport
> * @close: member function to discard a connection on this transport
> * @request: member function to issue a request to the transport
> @@ -44,6 +47,7 @@ struct p9_trans_module {
> int maxsize; /* max message size of transport */
> bool pooled_rbuffers;
> int def; /* this transport should be default */
> + bool supports_vmalloc; /* can work with vmalloc'd buffers */
> struct module *owner;
> int (*create)(struct p9_client *client,
> const char *devname, char *args);
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 5c1ca57ccd28..2a4884c880c1 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -229,8 +229,15 @@ static int p9_fcall_init(struct p9_client *c, struct
> p9_fcall *fc, if (likely(c->fcall_cache) && alloc_msize == c->msize) {
> fc->sdata = kmem_cache_alloc(c->fcall_cache, GFP_NOFS);
> fc->cache = c->fcall_cache;
> + if (!fc->sdata && c->trans_mod->supports_vmalloc) {
> + fc->sdata = kvmalloc(alloc_msize, GFP_NOFS);
> + fc->cache = NULL;
> + }
> } else {
> - fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
> + if (c->trans_mod->supports_vmalloc)
> + fc->sdata = kvmalloc(alloc_msize, GFP_NOFS);
> + else
> + fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
> fc->cache = NULL;
> }
> if (!fc->sdata)
> @@ -252,7 +259,7 @@ void p9_fcall_fini(struct p9_fcall *fc)
> if (fc->cache)
> kmem_cache_free(fc->cache, fc->sdata);
> else
> - kfree(fc->sdata);
> + kvfree(fc->sdata);
> }
> EXPORT_SYMBOL(p9_fcall_fini);
>
> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index a516745f732f..e7334033eba5 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -1101,6 +1101,7 @@ static struct p9_trans_module p9_tcp_trans = {
> .maxsize = MAX_SOCK_BUF,
> .pooled_rbuffers = false,
> .def = 0,
> + .supports_vmalloc = true,
> .create = p9_fd_create_tcp,
> .close = p9_fd_close,n
> .request = p9_fd_request,
> @@ -1115,6 +1116,7 @@ static struct p9_trans_module p9_unix_trans = {
> .name = "unix",
> .maxsize = MAX_SOCK_BUF,
> .def = 0,
> + .supports_vmalloc = true,
> .create = p9_fd_create_unix,
> .close = p9_fd_close,
> .request = p9_fd_request,
> @@ -1129,6 +1131,7 @@ static struct p9_trans_module p9_fd_trans = {
> .name = "fd",
> .maxsize = MAX_SOCK_BUF,
> .def = 0,
> + .supports_vmalloc = true,
> .create = p9_fd_create,
> .close = p9_fd_close,
> .request = p9_fd_request,
Just for clarity I would add .supports_vmalloc = false to the other
transports. Except of that:
Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Thanks!
/Christian
While developing a 9P server (https://github.com/Barre/ZeroFS) and
testing it under high-load, I was running into allocation failures.
The failures occur even with plenty of free memory available because
kmalloc requires contiguous physical memory.
This results in errors like:
ls: page allocation failure: order:7, mode:0x40c40(GFP_NOFS|__GFP_COMP)
This patch introduces a transport capability flag (supports_vmalloc)
that indicates whether a transport can work with vmalloc'd buffers
(non-physically contiguous memory). Transports requiring DMA should
leave this flag as false.
The fd-based transports (tcp, unix, fd) set this flag to true, and
p9_fcall_init will use kvmalloc instead of kmalloc for these
transports. This allows the allocator to fall back to vmalloc when
contiguous physical memory is not available.
Additionally, if kmem_cache_alloc fails, the code falls back to
kvmalloc for transports that support it.
Signed-off-by: Pierre Barre <pierre@barre.sh>
---
Changes in v3:
- Added explicit .supports_vmalloc = false to DMA-based transports
(suggested by Christian Schoenebeck)
include/net/9p/transport.h | 4 ++++
net/9p/client.c | 11 +++++++++--
net/9p/trans_fd.c | 3 +++
net/9p/trans_rdma.c | 1 +
net/9p/trans_usbg.c | 1 +
net/9p/trans_virtio.c | 1 +
net/9p/trans_xen.c | 1 +
7 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/include/net/9p/transport.h b/include/net/9p/transport.h
index 766ec07c9599..f0981515148d 100644
--- a/include/net/9p/transport.h
+++ b/include/net/9p/transport.h
@@ -24,6 +24,9 @@
* we're less flexible when choosing the response message
* size in this case
* @def: set if this transport should be considered the default
+ * @supports_vmalloc: set if this transport can work with vmalloc'd buffers
+ * (non-physically contiguous memory). Transports requiring
+ * DMA should leave this as false.
* @create: member function to create a new connection on this transport
* @close: member function to discard a connection on this transport
* @request: member function to issue a request to the transport
@@ -44,6 +47,7 @@ struct p9_trans_module {
int maxsize; /* max message size of transport */
bool pooled_rbuffers;
int def; /* this transport should be default */
+ bool supports_vmalloc; /* can work with vmalloc'd buffers */
struct module *owner;
int (*create)(struct p9_client *client,
const char *devname, char *args);
diff --git a/net/9p/client.c b/net/9p/client.c
index 5c1ca57ccd28..2a4884c880c1 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -229,8 +229,15 @@ static int p9_fcall_init(struct p9_client *c, struct p9_fcall *fc,
if (likely(c->fcall_cache) && alloc_msize == c->msize) {
fc->sdata = kmem_cache_alloc(c->fcall_cache, GFP_NOFS);
fc->cache = c->fcall_cache;
+ if (!fc->sdata && c->trans_mod->supports_vmalloc) {
+ fc->sdata = kvmalloc(alloc_msize, GFP_NOFS);
+ fc->cache = NULL;
+ }
} else {
- fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
+ if (c->trans_mod->supports_vmalloc)
+ fc->sdata = kvmalloc(alloc_msize, GFP_NOFS);
+ else
+ fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
fc->cache = NULL;
}
if (!fc->sdata)
@@ -252,7 +259,7 @@ void p9_fcall_fini(struct p9_fcall *fc)
if (fc->cache)
kmem_cache_free(fc->cache, fc->sdata);
else
- kfree(fc->sdata);
+ kvfree(fc->sdata);
}
EXPORT_SYMBOL(p9_fcall_fini);
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index a516745f732f..e7334033eba5 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -1101,6 +1101,7 @@ static struct p9_trans_module p9_tcp_trans = {
.maxsize = MAX_SOCK_BUF,
.pooled_rbuffers = false,
.def = 0,
+ .supports_vmalloc = true,
.create = p9_fd_create_tcp,
.close = p9_fd_close,
.request = p9_fd_request,
@@ -1115,6 +1116,7 @@ static struct p9_trans_module p9_unix_trans = {
.name = "unix",
.maxsize = MAX_SOCK_BUF,
.def = 0,
+ .supports_vmalloc = true,
.create = p9_fd_create_unix,
.close = p9_fd_close,
.request = p9_fd_request,
@@ -1129,6 +1131,7 @@ static struct p9_trans_module p9_fd_trans = {
.name = "fd",
.maxsize = MAX_SOCK_BUF,
.def = 0,
+ .supports_vmalloc = true,
.create = p9_fd_create,
.close = p9_fd_close,
.request = p9_fd_request,
diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index b84748baf9cb..6c5ad232c194 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -749,6 +749,7 @@ static struct p9_trans_module p9_rdma_trans = {
.maxsize = P9_RDMA_MAXSIZE,
.pooled_rbuffers = true,
.def = 0,
+ .supports_vmalloc = false,
.owner = THIS_MODULE,
.create = rdma_create_trans,
.close = rdma_close,
diff --git a/net/9p/trans_usbg.c b/net/9p/trans_usbg.c
index 468f7e8f0277..2542ef099233 100644
--- a/net/9p/trans_usbg.c
+++ b/net/9p/trans_usbg.c
@@ -514,6 +514,7 @@ static struct p9_trans_module p9_usbg_trans = {
.close = p9_usbg_close,
.request = p9_usbg_request,
.cancel = p9_usbg_cancel,
+ .supports_vmalloc = false,
.owner = THIS_MODULE,
};
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 0b8086f58ad5..12806207f4f0 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -803,6 +803,7 @@ static struct p9_trans_module p9_virtio_trans = {
.maxsize = PAGE_SIZE * (VIRTQUEUE_NUM - 3),
.pooled_rbuffers = false,
.def = 1,
+ .supports_vmalloc = false,
.owner = THIS_MODULE,
};
diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
index b9ff69c7522a..4b1cec0ab829 100644
--- a/net/9p/trans_xen.c
+++ b/net/9p/trans_xen.c
@@ -258,6 +258,7 @@ static struct p9_trans_module p9_xen_trans = {
.maxsize = 1 << (XEN_9PFS_RING_ORDER + XEN_PAGE_SHIFT - 2),
.pooled_rbuffers = false,
.def = 1,
+ .supports_vmalloc = false,
.create = p9_xen_create,
.close = p9_xen_close,
.request = p9_xen_request,
--
2.39.5 (Apple Git-154)
On Thu, Oct 16, 2025, at 15:06, Christian Schoenebeck wrote:
> On Thursday, October 16, 2025 9:01:56 AM CEST Pierre Barre wrote:
>> While developing a 9P server (https://github.com/Barre/ZeroFS) and
>> testing it under high-load, I was running into allocation failures.
>> The failures occur even with plenty of free memory available because
>> kmalloc requires contiguous physical memory.
>>
>> This results in errors like:
>> ls: page allocation failure: order:7, mode:0x40c40(GFP_NOFS|__GFP_COMP)
>>
>> This patch introduces a transport capability flag (supports_vmalloc)
>> that indicates whether a transport can work with vmalloc'd buffers
>> (non-physically contiguous memory). Transports requiring DMA should
>> leave this flag as false.
>>
>> The fd-based transports (tcp, unix, fd) set this flag to true, and
>> p9_fcall_init will use kvmalloc instead of kmalloc for these
>> transports. This allows the allocator to fall back to vmalloc when
>> contiguous physical memory is not available.
>>
>> Additionally, if kmem_cache_alloc fails, the code falls back to
>> kvmalloc for transports that support it.
>>
>> Signed-off-by: Pierre Barre <pierre@barre.sh>
>> ---
>>
>> include/net/9p/transport.h | 4 ++++
>> net/9p/client.c | 11 +++++++++--
>> net/9p/trans_fd.c | 3 +++
>> 3 files changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/net/9p/transport.h b/include/net/9p/transport.h
>> index 766ec07c9599..f0981515148d 100644
>> --- a/include/net/9p/transport.h
>> +++ b/include/net/9p/transport.h
>> @@ -24,6 +24,9 @@
>> * we're less flexible when choosing the response message
>> * size in this case
>> * @def: set if this transport should be considered the default
>> + * @supports_vmalloc: set if this transport can work with vmalloc'd buffers
>> + * (non-physically contiguous memory). Transports
>> requiring + * DMA should leave this as false.
>> * @create: member function to create a new connection on this transport
>> * @close: member function to discard a connection on this transport
>> * @request: member function to issue a request to the transport
>> @@ -44,6 +47,7 @@ struct p9_trans_module {
>> int maxsize; /* max message size of transport */
>> bool pooled_rbuffers;
>> int def; /* this transport should be default */
>> + bool supports_vmalloc; /* can work with vmalloc'd buffers */
>> struct module *owner;
>> int (*create)(struct p9_client *client,
>> const char *devname, char *args);
>> diff --git a/net/9p/client.c b/net/9p/client.c
>> index 5c1ca57ccd28..2a4884c880c1 100644
>> --- a/net/9p/client.c
>> +++ b/net/9p/client.c
>> @@ -229,8 +229,15 @@ static int p9_fcall_init(struct p9_client *c, struct
>> p9_fcall *fc, if (likely(c->fcall_cache) && alloc_msize == c->msize) {
>> fc->sdata = kmem_cache_alloc(c->fcall_cache, GFP_NOFS);
>> fc->cache = c->fcall_cache;
>> + if (!fc->sdata && c->trans_mod->supports_vmalloc) {
>> + fc->sdata = kvmalloc(alloc_msize, GFP_NOFS);
>> + fc->cache = NULL;
>> + }
>> } else {
>> - fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
>> + if (c->trans_mod->supports_vmalloc)
>> + fc->sdata = kvmalloc(alloc_msize, GFP_NOFS);
>> + else
>> + fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
>> fc->cache = NULL;
>> }
>> if (!fc->sdata)
>> @@ -252,7 +259,7 @@ void p9_fcall_fini(struct p9_fcall *fc)
>> if (fc->cache)
>> kmem_cache_free(fc->cache, fc->sdata);
>> else
>> - kfree(fc->sdata);
>> + kvfree(fc->sdata);
>> }
>> EXPORT_SYMBOL(p9_fcall_fini);
>>
>> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
>> index a516745f732f..e7334033eba5 100644
>> --- a/net/9p/trans_fd.c
>> +++ b/net/9p/trans_fd.c
>> @@ -1101,6 +1101,7 @@ static struct p9_trans_module p9_tcp_trans = {
>> .maxsize = MAX_SOCK_BUF,
>> .pooled_rbuffers = false,
>> .def = 0,
>> + .supports_vmalloc = true,
>> .create = p9_fd_create_tcp,
>> .close = p9_fd_close,n
>> .request = p9_fd_request,
>> @@ -1115,6 +1116,7 @@ static struct p9_trans_module p9_unix_trans = {
>> .name = "unix",
>> .maxsize = MAX_SOCK_BUF,
>> .def = 0,
>> + .supports_vmalloc = true,
>> .create = p9_fd_create_unix,
>> .close = p9_fd_close,
>> .request = p9_fd_request,
>> @@ -1129,6 +1131,7 @@ static struct p9_trans_module p9_fd_trans = {
>> .name = "fd",
>> .maxsize = MAX_SOCK_BUF,
>> .def = 0,
>> + .supports_vmalloc = true,
>> .create = p9_fd_create,
>> .close = p9_fd_close,
>> .request = p9_fd_request,
>
> Just for clarity I would add .supports_vmalloc = false to the other
> transports. Except of that:
>
> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
>
> Thanks!
>
> /Christian
Pierre Barre wrote on Thu, Oct 16, 2025 at 03:58:36PM +0200: > While developing a 9P server (https://github.com/Barre/ZeroFS) and > testing it under high-load, I was running into allocation failures. > The failures occur even with plenty of free memory available because > kmalloc requires contiguous physical memory. > > This results in errors like: > ls: page allocation failure: order:7, mode:0x40c40(GFP_NOFS|__GFP_COMP) > > This patch introduces a transport capability flag (supports_vmalloc) > that indicates whether a transport can work with vmalloc'd buffers > (non-physically contiguous memory). Transports requiring DMA should > leave this flag as false. > > The fd-based transports (tcp, unix, fd) set this flag to true, and > p9_fcall_init will use kvmalloc instead of kmalloc for these > transports. This allows the allocator to fall back to vmalloc when > contiguous physical memory is not available. > > Additionally, if kmem_cache_alloc fails, the code falls back to > kvmalloc for transports that support it. > > Signed-off-by: Pierre Barre <pierre@barre.sh> Thanks, it's now picked up and queued in -next -- will send to Linus in a couple of months. FWIW, I prefer (and I think it's the norm in the linux world) if patches new versions aren't sent as a reply to previous version, it confuses tools like b4 that fetch the patch thread for version operations. If you send patches again please just send later versions without using --in-reply-to :) Cheers, -- Dominique Martinet | Asmadeus
> On 3 Nov 2025, at 08:52, asmadeus <asmadeus@codewreck.org> wrote: > > Pierre Barre wrote on Thu, Oct 16, 2025 at 03:58:36PM +0200: >> While developing a 9P server (https://github.com/Barre/ZeroFS) and >> testing it under high-load, I was running into allocation failures. >> The failures occur even with plenty of free memory available because >> kmalloc requires contiguous physical memory. >> >> This results in errors like: >> ls: page allocation failure: order:7, mode:0x40c40(GFP_NOFS|__GFP_COMP) >> >> This patch introduces a transport capability flag (supports_vmalloc) >> that indicates whether a transport can work with vmalloc'd buffers >> (non-physically contiguous memory). Transports requiring DMA should >> leave this flag as false. >> >> The fd-based transports (tcp, unix, fd) set this flag to true, and >> p9_fcall_init will use kvmalloc instead of kmalloc for these >> transports. This allows the allocator to fall back to vmalloc when >> contiguous physical memory is not available. >> >> Additionally, if kmem_cache_alloc fails, the code falls back to >> kvmalloc for transports that support it. >> >> Signed-off-by: Pierre Barre <pierre@barre.sh> > > Thanks, it's now picked up and queued in -next -- will send to Linus in > a couple of months. > > FWIW, I prefer (and I think it's the norm in the linux world) if patches > new versions aren't sent as a reply to previous version, it confuses > tools like b4 that fetch the patch thread for version operations. > If you send patches again please just send later versions without > using --in-reply-to :) Got it, thank you! Best, Pierre > > Cheers, > -- > Dominique Martinet | Asmadeus
Le 16/10/2025 à 15:58, Pierre Barre a écrit :
> While developing a 9P server (https://github.com/Barre/ZeroFS) and
> testing it under high-load, I was running into allocation failures.
> The failures occur even with plenty of free memory available because
> kmalloc requires contiguous physical memory.
>
> This results in errors like:
> ls: page allocation failure: order:7, mode:0x40c40(GFP_NOFS|__GFP_COMP)
>
> This patch introduces a transport capability flag (supports_vmalloc)
> that indicates whether a transport can work with vmalloc'd buffers
> (non-physically contiguous memory). Transports requiring DMA should
> leave this flag as false.
>
> The fd-based transports (tcp, unix, fd) set this flag to true, and
> p9_fcall_init will use kvmalloc instead of kmalloc for these
> transports. This allows the allocator to fall back to vmalloc when
> contiguous physical memory is not available.
>
> Additionally, if kmem_cache_alloc fails, the code falls back to
> kvmalloc for transports that support it.
>
> Signed-off-by: Pierre Barre <pierre@barre.sh>
> ---
> Changes in v3:
> - Added explicit .supports_vmalloc = false to DMA-based transports
> (suggested by Christian Schoenebeck)
>
> include/net/9p/transport.h | 4 ++++
> net/9p/client.c | 11 +++++++++--
> net/9p/trans_fd.c | 3 +++
> net/9p/trans_rdma.c | 1 +
> net/9p/trans_usbg.c | 1 +
> net/9p/trans_virtio.c | 1 +
> net/9p/trans_xen.c | 1 +
> 7 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/9p/transport.h b/include/net/9p/transport.h
> index 766ec07c9599..f0981515148d 100644
> --- a/include/net/9p/transport.h
> +++ b/include/net/9p/transport.h
...
> @@ -44,6 +47,7 @@ struct p9_trans_module {
> int maxsize; /* max message size of transport */
> bool pooled_rbuffers;
> int def; /* this transport should be default */
Unrelated to this patch, but it looks like 'def' could be a bool.
This would save a few bytes in the structure, should it matter.
CJ
> + bool supports_vmalloc; /* can work with vmalloc'd buffers */
> struct module *owner;
> int (*create)(struct p9_client *client,
> const char *devname, char *args);
Pierre Barre wrote on Thu, Oct 16, 2025 at 09:01:56AM +0200: > While developing a 9P server (https://github.com/Barre/ZeroFS) and > testing it under high-load, I was running into allocation failures. > The failures occur even with plenty of free memory available because > kmalloc requires contiguous physical memory. > > This results in errors like: > ls: page allocation failure: order:7, mode:0x40c40(GFP_NOFS|__GFP_COMP) > > This patch introduces a transport capability flag (supports_vmalloc) > that indicates whether a transport can work with vmalloc'd buffers > (non-physically contiguous memory). Transports requiring DMA should > leave this flag as false. > > The fd-based transports (tcp, unix, fd) set this flag to true, and > p9_fcall_init will use kvmalloc instead of kmalloc for these > transports. This allows the allocator to fall back to vmalloc when > contiguous physical memory is not available. > > Additionally, if kmem_cache_alloc fails, the code falls back to > kvmalloc for transports that support it. Looks good to me, I'll pick this up for -next when time allows and it'll get in 6.19 in a few months Now to get someone to make this work with virtio... :) > @@ -252,7 +259,7 @@ void p9_fcall_fini(struct p9_fcall *fc) > if (fc->cache) > kmem_cache_free(fc->cache, fc->sdata); > else > - kfree(fc->sdata); > + kvfree(fc->sdata); (Oh, it's ok to kvfree() something allocated with either kmalloc or kvmalloc, neat.) -- Dominique Martinet | Asmadeus
© 2016 - 2026 Red Hat, Inc.