net/9p/trans_xen.c | 1 + 1 file changed, 1 insertion(+)
We need to do this so that we can signal to the other end that the
device is being removed, so that it will release its claim on the
underlying memory allocation. Otherwise releasing the grant-table
entries is deferred resulting in a kernel oops since the pages have
already been freed.
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Fixes: 71ebd71921e45 ("xen/9pfs: connect to the backend")
Signed-off-by: Ariadne Conill <ariadne@ariadne.space>
Signed-off-by: Alex Zenla <alex@edera.dev>
---
net/9p/trans_xen.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
index b9ff69c7522a..cde283c42dc6 100644
--- a/net/9p/trans_xen.c
+++ b/net/9p/trans_xen.c
@@ -312,6 +312,7 @@ static void xen_9pfs_front_remove(struct xenbus_device *dev)
{
struct xen_9pfs_front_priv *priv = dev_get_drvdata(&dev->dev);
+ xenbus_switch_state(dev, XenbusStateClosing);
dev_set_drvdata(&dev->dev, NULL);
xen_9pfs_front_free(priv);
}
--
2.52.0
On Mon, Dec 08, 2025 at 11:51:55AM -0800, Ariadne Conill wrote:
> We need to do this so that we can signal to the other end that the
> device is being removed, so that it will release its claim on the
> underlying memory allocation. Otherwise releasing the grant-table
> entries is deferred resulting in a kernel oops since the pages have
> already been freed.
>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Fixes: 71ebd71921e45 ("xen/9pfs: connect to the backend")
> Signed-off-by: Ariadne Conill <ariadne@ariadne.space>
> Signed-off-by: Alex Zenla <alex@edera.dev>
> ---
> net/9p/trans_xen.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
> index b9ff69c7522a..cde283c42dc6 100644
> --- a/net/9p/trans_xen.c
> +++ b/net/9p/trans_xen.c
> @@ -312,6 +312,7 @@ static void xen_9pfs_front_remove(struct xenbus_device *dev)
> {
> struct xen_9pfs_front_priv *priv = dev_get_drvdata(&dev->dev);
>
> + xenbus_switch_state(dev, XenbusStateClosing);
I think you need to wait for the backend to acknowledge it by switching
its state, before continuing. See for example
xennet_remove()->xennet_bus_close() in drivers/net/xen-netfront.c.
> dev_set_drvdata(&dev->dev, NULL);
> xen_9pfs_front_free(priv);
> }
> --
> 2.52.0
>
>
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
On 12/8/25 14:51, Ariadne Conill wrote:
> We need to do this so that we can signal to the other end that the
> device is being removed, so that it will release its claim on the
> underlying memory allocation. Otherwise releasing the grant-table
> entries is deferred resulting in a kernel oops since the pages have
> already been freed.
I don't think this is sufficient. The backend can simply refuse
to release the grants. The frontend needs to ensure that the pages
are not freed until the grant table entries are freed. Right now,
the backend can cause a use-after-free in the frontend, and my
understanding of the Xen Project's security policy is that this is
a security vulnerability in the frontend code.
My instinct is that the core Xen code should take a reference on
each page before granting it to another domain, and not release that
reference until the pages are no longer granted. This should prevent
any use-after-free problems if I understand Linux core MM correctly.
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Fixes: 71ebd71921e45 ("xen/9pfs: connect to the backend")
> Signed-off-by: Ariadne Conill <ariadne@ariadne.space>
> Signed-off-by: Alex Zenla <alex@edera.dev>
> ---
> net/9p/trans_xen.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
> index b9ff69c7522a..cde283c42dc6 100644
> --- a/net/9p/trans_xen.c
> +++ b/net/9p/trans_xen.c
> @@ -312,6 +312,7 @@ static void xen_9pfs_front_remove(struct xenbus_device *dev)
> {
> struct xen_9pfs_front_priv *priv = dev_get_drvdata(&dev->dev);
>
> + xenbus_switch_state(dev, XenbusStateClosing);
> dev_set_drvdata(&dev->dev, NULL);
> xen_9pfs_front_free(priv);
> }
--
Sincerely,
Demi Marie Obenour (she/her/hers)
On 09.12.25 11:41, Demi Marie Obenour wrote: > On 12/8/25 14:51, Ariadne Conill wrote: >> We need to do this so that we can signal to the other end that the >> device is being removed, so that it will release its claim on the >> underlying memory allocation. Otherwise releasing the grant-table >> entries is deferred resulting in a kernel oops since the pages have >> already been freed. > > I don't think this is sufficient. The backend can simply refuse > to release the grants. The frontend needs to ensure that the pages > are not freed until the grant table entries are freed. Right now, > the backend can cause a use-after-free in the frontend, and my > understanding of the Xen Project's security policy is that this is > a security vulnerability in the frontend code. > > My instinct is that the core Xen code should take a reference on > each page before granting it to another domain, and not release that > reference until the pages are no longer granted. This should prevent > any use-after-free problems if I understand Linux core MM correctly. I looked at this in detail now. I don't think we have a security bug right now, but the interfaces regarding granting pages to other domains should probably be reworked like you suggest. Currently it is the caller who needs to handle page references correctly, while this should be done by the grant handling (having to either issue get_page() or to pass NULL for the page pointer, in case you don't want the underlying page to be freed by gnttab_end_foreign_access(), is far from intuitive). Juergen
On 12/18/25 03:14, Jürgen Groß wrote: > On 09.12.25 11:41, Demi Marie Obenour wrote: >> On 12/8/25 14:51, Ariadne Conill wrote: >>> We need to do this so that we can signal to the other end that the >>> device is being removed, so that it will release its claim on the >>> underlying memory allocation. Otherwise releasing the grant-table >>> entries is deferred resulting in a kernel oops since the pages have >>> already been freed. >> >> I don't think this is sufficient. The backend can simply refuse >> to release the grants. The frontend needs to ensure that the pages >> are not freed until the grant table entries are freed. Right now, >> the backend can cause a use-after-free in the frontend, and my >> understanding of the Xen Project's security policy is that this is >> a security vulnerability in the frontend code. >> >> My instinct is that the core Xen code should take a reference on >> each page before granting it to another domain, and not release that >> reference until the pages are no longer granted. This should prevent >> any use-after-free problems if I understand Linux core MM correctly. > > I looked at this in detail now. > > I don't think we have a security bug right now, but the interfaces regarding > granting pages to other domains should probably be reworked like you suggest. > > Currently it is the caller who needs to handle page references correctly, > while this should be done by the grant handling (having to either issue > get_page() or to pass NULL for the page pointer, in case you don't want the > underlying page to be freed by gnttab_end_foreign_access(), is far from > intuitive). Unfortunately, I don't think this is going to work. Page refcounts are going away. Taking a reference on kmalloc()'d memory already runs into a VM_BUG_ON_FOLIO(). So the only reasonable approach is a callback that is run when the grant is no longer accessible by the backend. It could well be that this is another example of a bug in 9p that also affects virtio. More details about the memory management and virtio-9p situation can be found in the thread at <https://lore.kernel.org/lkml/20251210-virtio_trans_iter-1-1-92eee6d8b6db@codewreck.org/t/#u>. A much better API would allow the grantor of a page to revoke the grantee's access, with subsequent accesses by the grantee redirected to a scratch page. For PVH/HVM guests I expect this to be possible, but for PV guests it would require Xen to modify the guest's own page tables. My understanding is that this isn't allowed as it would break the guest. -- Sincerely, Demi Marie Obenour (she/her/hers)
Hello, > On Dec 9, 2025, at 2:43 AM, Demi Marie Obenour <demiobenour@gmail.com> wrote: > > On 12/8/25 14:51, Ariadne Conill wrote: >> We need to do this so that we can signal to the other end that the >> device is being removed, so that it will release its claim on the >> underlying memory allocation. Otherwise releasing the grant-table >> entries is deferred resulting in a kernel oops since the pages have >> already been freed. > > I don't think this is sufficient. The backend can simply refuse > to release the grants. The frontend needs to ensure that the pages > are not freed until the grant table entries are freed. Right now, > the backend can cause a use-after-free in the frontend, and my > understanding of the Xen Project's security policy is that this is > a security vulnerability in the frontend code. > > My instinct is that the core Xen code should take a reference on > each page before granting it to another domain, and not release that > reference until the pages are no longer granted. This should prevent > any use-after-free problems if I understand Linux core MM correctly. Yes, there are other issues in the 9p transport that are likely in play here. In our internal testing, we confirm this is not a full fix for hotplugging 9p transport devices, but no such claim of a complete fix has been made here or in the Matrix thread. However, this is one defect that is contributing to the overall hotplugging problem and should be merged regardless: if the driver isn’t telling the other side to disconnect, the other side will never release the grants to begin with. Ariadne
On 12/9/25 12:09, Ariadne Conill wrote: > Hello, > >> On Dec 9, 2025, at 2:43 AM, Demi Marie Obenour <demiobenour@gmail.com> wrote: >> >> On 12/8/25 14:51, Ariadne Conill wrote: >>> We need to do this so that we can signal to the other end that the >>> device is being removed, so that it will release its claim on the >>> underlying memory allocation. Otherwise releasing the grant-table >>> entries is deferred resulting in a kernel oops since the pages have >>> already been freed. >> >> I don't think this is sufficient. The backend can simply refuse >> to release the grants. The frontend needs to ensure that the pages >> are not freed until the grant table entries are freed. Right now, >> the backend can cause a use-after-free in the frontend, and my >> understanding of the Xen Project's security policy is that this is >> a security vulnerability in the frontend code. >> >> My instinct is that the core Xen code should take a reference on >> each page before granting it to another domain, and not release that >> reference until the pages are no longer granted. This should prevent >> any use-after-free problems if I understand Linux core MM correctly. > > Yes, there are other issues in the 9p transport that are likely in play here. In our internal testing, we confirm this is not a full fix for hotplugging 9p transport devices, but no such claim of a complete fix has been made here or in the Matrix thread. > > However, this is one defect that is contributing to the overall hotplugging problem and should be merged regardless: if the driver isn’t telling the other side to disconnect, the other side will never release the grants to begin with. > > Ariadne I definitely agree that this should be merged! Is this code-path triggerable by the backend at will, or only during teardown by the toolstack? -- Sincerely, Demi Marie Obenour (she/her/hers)
Hello, > On Dec 9, 2025, at 9:12 AM, Demi Marie Obenour <demiobenour@gmail.com> wrote: > > On 12/9/25 12:09, Ariadne Conill wrote: >> Hello, >> >>>> On Dec 9, 2025, at 2:43 AM, Demi Marie Obenour <demiobenour@gmail.com> wrote: >>> >>> On 12/8/25 14:51, Ariadne Conill wrote: >>>> We need to do this so that we can signal to the other end that the >>>> device is being removed, so that it will release its claim on the >>>> underlying memory allocation. Otherwise releasing the grant-table >>>> entries is deferred resulting in a kernel oops since the pages have >>>> already been freed. >>> >>> I don't think this is sufficient. The backend can simply refuse >>> to release the grants. The frontend needs to ensure that the pages >>> are not freed until the grant table entries are freed. Right now, >>> the backend can cause a use-after-free in the frontend, and my >>> understanding of the Xen Project's security policy is that this is >>> a security vulnerability in the frontend code. >>> >>> My instinct is that the core Xen code should take a reference on >>> each page before granting it to another domain, and not release that >>> reference until the pages are no longer granted. This should prevent >>> any use-after-free problems if I understand Linux core MM correctly. >> >> Yes, there are other issues in the 9p transport that are likely in play here. In our internal testing, we confirm this is not a full fix for hotplugging 9p transport devices, but no such claim of a complete fix has been made here or in the Matrix thread. >> >> However, this is one defect that is contributing to the overall hotplugging problem and should be merged regardless: if the driver isn’t telling the other side to disconnect, the other side will never release the grants to begin with. >> >> Ariadne > > I definitely agree that this should be merged! > > Is this code-path triggerable by the backend at will, or only during > teardown by the toolstack? In practice it only happens when the toolstack tears down a 9p transport. In theory it can happen in any situation where the backend decides to tear down the transport, which may or may not actually be mediated by the toolstack. Ariadne
© 2016 - 2026 Red Hat, Inc.