While developing a 9P server (https://github.com/Barre/ZeroFS) and testing it under high load, I was running into allocation failures. The failures occur even with plenty of free memory available, because kmalloc requires contiguous physical memory.
This results in errors like:
ls: page allocation failure: order:7, mode:0x40c40(GFP_NOFS|__GFP_COMP)
Signed-off-by: Pierre Barre <pierre@barre.sh>
---
net/9p/client.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/9p/client.c b/net/9p/client.c
index 5c1ca57ccd28..f82b5674057c 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -230,7 +230,7 @@ static int p9_fcall_init(struct p9_client *c, struct p9_fcall *fc,
 		fc->sdata = kmem_cache_alloc(c->fcall_cache, GFP_NOFS);
 		fc->cache = c->fcall_cache;
 	} else {
-		fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
+		fc->sdata = kvmalloc(alloc_msize, GFP_NOFS);
 		fc->cache = NULL;
 	}
 	if (!fc->sdata)
@@ -252,7 +252,7 @@ void p9_fcall_fini(struct p9_fcall *fc)
 	if (fc->cache)
 		kmem_cache_free(fc->cache, fc->sdata);
 	else
-		kfree(fc->sdata);
+		kvfree(fc->sdata);
 }
 EXPORT_SYMBOL(p9_fcall_fini);
--
2.39.5 (Apple Git-154)
On Wednesday, July 30, 2025 5:08:05 PM CEST Pierre Barre wrote:
> While developing a 9P server (https://github.com/Barre/ZeroFS) and testing
> it under high load, I was running into allocation failures. The failures
> occur even with plenty of free memory available, because kmalloc requires
> contiguous physical memory.
>
> This results in errors like:
> ls: page allocation failure: order:7, mode:0x40c40(GFP_NOFS|__GFP_COMP)

What was msize?

> @@ -230,7 +230,7 @@ static int p9_fcall_init(struct p9_client *c, struct p9_fcall *fc,
> -		fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
> +		fc->sdata = kvmalloc(alloc_msize, GFP_NOFS);

That would work with certain transports like fd I guess, but not via
virtio-pci transport for instance, since PCI-DMA requires physical pages.
Same applies to Xen transport I guess.
Thank you for your email.

> What was msize?

I was mounting the filesystem using:

trans=tcp,port=5564,version=9p2000.L,msize=1048576,cache=mmap,access=user

> That would work with certain transports like fd I guess, but not via
> virtio-pci transport for instance, since PCI-DMA requires physical pages.
> Same applies to Xen transport I guess.

Would it be acceptable to add a mount option (like nocontig or loosealloc?)
that enables kvmalloc?

Best,
Pierre
On Wednesday, July 30, 2025 6:19:37 PM CEST Pierre Barre wrote:
> I was mounting the filesystem using:
>
> trans=tcp,port=5564,version=9p2000.L,msize=1048576,cache=mmap,access=user

Yeah, that explains both why you were triggering this issue, as 1M msize
will likely cause kmalloc() failures under heavy load, and why your patch
was working for you due to the chosen tcp transport.

> Would it be acceptable to add a mount option (like nocontig or
> loosealloc?) that enables kvmalloc?

Dominique's call obviously, I'm just giving my few cents here. To me it
would make sense to fix the root cause instead of shorting a symptom:

Right now 9p filesystem code (under fs/9p) requires a linear buffer, whereas
some 9p transports (under net/9p) require physical pages, and the latter is
not going to change.

One solution therefore might be changing fs/9p code to work on a
scatter/gather list instead of a simple linear buffer. But I guess that
would be too much work.

So a more reasonable solution instead might be using kvmalloc(), as
suggested by you, and adjusting the individual transports such that they
translate a virtual memory address to a list of physical addresses via e.g.
vmalloc_to_page() if needed.

/Christian
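[Editor's note: the transport-side translation Christian describes could look roughly like the sketch below. This is a hypothetical fragment, not code from this thread or the kernel tree; the helper name p9_buf_to_pages and its exact shape are assumptions. kmalloc() memory is physically contiguous, so virt_to_page() on each page-sized chunk works directly, while vmalloc()/kvmalloc() memory needs vmalloc_to_page() per page.]

```c
/* Hypothetical sketch: collect the struct page pointers backing a
 * kvmalloc()'ed buffer, so a transport could build a scatterlist from
 * them. Assumes buf is page-aligned, as kvmalloc returns for large sizes.
 */
#include <linux/mm.h>
#include <linux/vmalloc.h>

static int p9_buf_to_pages(void *buf, size_t len, struct page **pages)
{
	int i, n = DIV_ROUND_UP(len, PAGE_SIZE);

	for (i = 0; i < n; i++) {
		void *p = buf + i * PAGE_SIZE;

		/* vmalloc area pages are not virt-linear; look them up */
		pages[i] = is_vmalloc_addr(p) ? vmalloc_to_page(p)
					      : virt_to_page(p);
		if (!pages[i])
			return -EFAULT;
	}
	return n;
}
```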
As a middle-ground, would it be acceptable to add a transport capability
flag indicating whether the transport requires contiguous memory for DMA?

1. Add a P9_TRANS_REQUIRES_CONTIGUOUS flag to struct p9_trans_module
2. Set this flag for virtio, xen, and rdma transports
3. Modify p9_fcall_init to check the flag:

	if (c->trans_mod->caps & P9_TRANS_REQUIRES_CONTIGUOUS)
		fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
	else
		fc->sdata = kvmalloc(alloc_msize, GFP_NOFS);

I can prepare a patch implementing this approach if you agree.

Best,
Pierre
(Added Willy in To, if you have time to advise on what's appropriate wrt.
memory allocation here to use either as a contiguous virtual memory buffer
or a scatterlist for the various kinds of DMA used by transports, that'd be
appreciated)

First - thanks for starting this thread, large contiguous allocations have
been a problem reported regularly in the past, and no-one has had the energy
to address it yet, but it's definitely something worth tackling.

Pierre Barre wrote on Wed, Jul 30, 2025 at 10:16:13PM +0200:
> As a middle-ground, would it be acceptable to add a transport capability
> flag indicating whether the transport requires contiguous memory for DMA?

I think such a flag will be needed even if we do what Christian suggested:

>> So a more reasonable solution instead might be using kvmalloc(), as
>> suggested by you, and adjusting the individual transports such that they
>> translate a virtual memory address to a list of physical addresses via
>> e.g. vmalloc_to_page() if needed.

Start with a requires-contiguous flag set for all transports except trans_fd
(I'm not sure about usb, but I'm pretty sure all the rest need it); then
each transport can figure out how to get a scatterlist or whatever it needs
from the allocation, so we don't need to have a big bang that breaks
everything at the same time.

I'm sure virtio's implementation will come soon enough, but I don't see
anyone working on RDMA or xen so fast.

> 1. Add a P9_TRANS_REQUIRES_CONTIGUOUS flag to struct p9_trans_module
> 2. Set this flag for virtio, xen, and rdma transports

This is a nit but I'd rather the flag go the other way around, e.g. no flag
means requires contiguous and it's only set after confirming the transport
works (I just looked at usb and struct usb_request makes me think some
drivers use dma, so, yeah..)

We can always turn it around later if the majority of transports handle it.

> 3. Modify p9_fcall_init to check the flag:
>	if (c->trans_mod->caps & P9_TRANS_REQUIRES_CONTIGUOUS)
>		fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
>	else
>		fc->sdata = kvmalloc(alloc_msize, GFP_NOFS);

I'm also curious if there's something more appropriate than kvmalloc wrt
using either as a contiguous virtual memory buffer or a scatterlist, but I
think this is fine for now unless someone knows of something more
appropriate.

Thanks,
-- 
Dominique Martinet | Asmadeus
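[Editor's note: combining Pierre's capability-flag proposal with the inverted polarity Dominique suggests, the allocation dispatch might look roughly like the sketch below. The flag name P9_TRANS_NONCONTIG and the caps field are hypothetical stand-ins, not identifiers from the actual tree; with this polarity, every transport defaults to the safe contiguous kmalloc() path until it is confirmed to cope with vmalloc()'ed buffers.]

```c
/* Hypothetical capability bit on struct p9_trans_module: set only by
 * transports verified to handle non-contiguous (vmalloc'ed) buffers,
 * e.g. trans_fd to start with.
 */
#define P9_TRANS_NONCONTIG	BIT(0)

/* In p9_fcall_init(), for the non-slab-cache allocation path: */
if (c->trans_mod->caps & P9_TRANS_NONCONTIG)
	fc->sdata = kvmalloc(alloc_msize, GFP_NOFS);
else
	fc->sdata = kmalloc(alloc_msize, GFP_NOFS);

/* The matching release in p9_fcall_fini() can be kvfree() in both
 * cases, since kvfree() handles kmalloc- and vmalloc-backed pointers.
 */
```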
> First - thanks for starting this thread, large contiguous allocations
> have been a problem reported regularly in the past, and no-one has had
> the energy to address it yet, but it's definitely something worth
> tackling.

Glad that I didn't just create noise. I implemented 9P in addition to NFS in
my server because fsyncing over NFS didn't give me what I wanted (fsync
doesn't call commit when a write is communicated to be "durable" by the
server, but that's a story for another day...).

I am very grateful that this client implementation exists.
Hi Everyone,

If I submit a patch based on what has been discussed above, is it likely to
be accepted? Unfortunately, in my current setup, I am encountering this
issue quite frequently, and users of my servers are having a hard time
making sense of the error.

Thank you.

Best,
Pierre
Pierre Barre wrote on Wed, Aug 06, 2025 at 05:50:42PM +0200:
> If I submit a patch based on what has been discussed above, is it
> likely to be accepted?

Yes, sorry it wasn't clear.

I still have no idea what's the "best" allocation method that we'll be able
to use as either a vmalloc buffer or split into a scatterlist, but there's
little point in worrying too much about it, so please go ahead.

If it's restricted to trans_fd and there's a chance we can make use of it
with (at least) virtio later I think everyone will be happy :)
-- 
Dominique
On Wednesday, August 6, 2025 11:44:34 PM CEST Dominique Martinet wrote:
> I still have no idea what's the "best" allocation method that we'll be
> able to use as either a vmalloc buffer or split into a scatterlist, but
> there's little point in worrying too much about it, so please go ahead.
>
> If it's restricted to trans_fd and there's a chance we can make use of
> it with (at least) virtio later I think everyone will be happy :)

Yes, sounds like a viable plan.

Pierre, one more thing to note: kmem_cache_alloc() might still fail, though.
So maybe it would make sense to add a separate patch that checks the result
of kmem_cache_alloc() and, if it fails, falls back to your kvmalloc() call
(if enabled by the discussed transport mechanism, of course).

/Christian
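[Editor's note: Christian's suggested fallback could be sketched roughly as below, inside p9_fcall_init()'s slab-cache branch. This is a hypothetical fragment, not a patch from this thread; P9_TRANS_NONCONTIG is an assumed name standing in for whatever transport-capability mechanism gets merged.]

```c
/* Hypothetical sketch: try the slab cache first for common-sized fcalls,
 * and fall back to kvmalloc() only when the cache allocation fails and
 * the transport can handle non-contiguous memory.
 */
fc->sdata = kmem_cache_alloc(c->fcall_cache, GFP_NOFS);
fc->cache = c->fcall_cache;
if (!fc->sdata && (c->trans_mod->caps & P9_TRANS_NONCONTIG)) {
	fc->sdata = kvmalloc(alloc_msize, GFP_NOFS);
	fc->cache = NULL;	/* freed with kvfree() in p9_fcall_fini() */
}
```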