xattr: rework simple xattrs and support user.* xattrs on sockets

[PATCH 00/14] xattr: rework simple xattrs and support user.* xattrs on sockets

Posted by Christian Brauner 1 month ago

Hey,

This reworks the simple_xattr infrastructure and adds support for
user.* extended attributes on sockets.

The simple_xattr subsystem currently uses an rbtree protected by a
reader-writer spinlock. This series replaces the rbtree with an
rhashtable giving O(1) average-case lookup with RCU-based lockless
reads. This sped up concurrent access patterns on tmpfs quite a bit and
it's an overall easy enough conversion to do and gets rid of rwlock_t.

The conversion is done incrementally: a new rhashtable path is added
alongside the existing rbtree, consumers are migrated one at a time
(shmem, kernfs, pidfs), and then the rbtree code is removed. All three
consumers switch from embedded structs to pointer-based lazy allocation
so the rhashtable overhead is only paid for inodes that actually use
xattrs.

With this infrastructure in place the series adds support for user.*
xattrs on sockets. Path-based AF_UNIX sockets inherit xattr support
from the underlying filesystem (e.g. tmpfs) but sockets in sockfs -
that is everything created via socket() including abstract namespace
AF_UNIX sockets - had no xattr support at all.

The xattr_permission() checks are reworked to allow user.* xattrs on
S_IFSOCK inodes. Sockfs sockets get per-inode limits of 128 xattrs and
128KB total value size matching the limits already in use for kernfs.
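
As a rough illustration of what such per-inode accounting amounts to,
here is a minimal userspace sketch. It is not the kernel code from this
series: struct xattr_budget, xattr_budget_charge() and
xattr_budget_uncharge() are hypothetical names, and returning -ENOSPC
when a limit is hit is an assumption. Only the two limits (128 xattrs,
128KB of values) come from the text above.

```c
#include <errno.h>
#include <stddef.h>

#define SOCK_MAX_USER_XATTRS      128           /* per-inode xattr count limit */
#define SOCK_MAX_USER_XATTR_BYTES (128 * 1024)  /* per-inode total value size */

/* Hypothetical per-inode accounting state. */
struct xattr_budget {
	unsigned int count;	/* number of user.* xattrs currently stored */
	size_t bytes;		/* sum of their value sizes */
};

/* Account for one new xattr; returns 0 or -ENOSPC (assumed errno). */
int xattr_budget_charge(struct xattr_budget *b, size_t value_len)
{
	if (b->count >= SOCK_MAX_USER_XATTRS)
		return -ENOSPC;
	/* Written as a subtraction so the check cannot overflow. */
	if (value_len > SOCK_MAX_USER_XATTR_BYTES - b->bytes)
		return -ENOSPC;
	b->count++;
	b->bytes += value_len;
	return 0;
}

/* Undo a previous successful charge when an xattr is removed. */
void xattr_budget_uncharge(struct xattr_budget *b, size_t value_len)
{
	b->count--;
	b->bytes -= value_len;
}
```

Replacing an existing value would need to uncharge the old size and
charge the new one; the sketch leaves that out for brevity.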

The practical motivation comes from several directions. systemd and
GNOME are expanding their use of Varlink as an IPC mechanism. For D-Bus
there are tools like dbus-monitor that can observe IPC traffic across
the system but this only works because D-Bus has a central broker. For
Varlink there is no broker and there is currently no way to identify
which sockets speak Varlink. With user.* xattrs on sockets a service
can label its socket with the IPC protocol it speaks (e.g.,
user.varlink=1) and an eBPF program can then selectively capture
traffic on those sockets. Enumerating bound sockets via netlink combined
with these xattr labels gives a way to discover all Varlink IPC
entrypoints for debugging and introspection.
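
From userspace, labeling a sockfs socket and checking for the label
could look roughly like the sketch below. fsetxattr() and fgetxattr()
are the existing syscall wrappers; label_varlink_socket() and
is_varlink_socket() are hypothetical helper names. On a kernel without
this series the fsetxattr() call on a sockfs socket is expected to
fail, so the helper simply reports the error.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

/*
 * Label a socket fd with user.varlink=1.
 * Returns 0 on success or a negative errno value on failure.
 */
int label_varlink_socket(int fd)
{
	if (fsetxattr(fd, "user.varlink", "1", 1, 0) < 0)
		return -errno;
	return 0;
}

/* Returns 1 if the fd carries user.varlink=1, 0 otherwise. */
int is_varlink_socket(int fd)
{
	char buf[8];
	ssize_t n = fgetxattr(fd, "user.varlink", buf, sizeof(buf));

	return n == 1 && buf[0] == '1';
}
```

A service would call label_varlink_socket() once after socket(); a
failure can be treated as non-fatal since the label is purely
informational.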

Similarly, systemd-journald wants to use xattrs on the /dev/log socket
for protocol negotiation to indicate whether RFC 5424 structured syslog
is supported or whether only the legacy RFC 3164 format should be used.

In containers these labels are particularly useful as high-privilege or
more complicated solutions for socket identification aren't available.

The series comes with comprehensive selftests covering path-based
AF_UNIX sockets, sockfs socket operations, per-inode limit enforcement,
and xattr operations across multiple address families (AF_INET,
AF_INET6, AF_NETLINK, AF_PACKET).

Christian

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
Christian Brauner (14):
      xattr: add rcu_head and rhash_head to struct simple_xattr
      xattr: add rhashtable-based simple_xattr infrastructure
      shmem: adapt to rhashtable-based simple_xattrs with lazy allocation
      kernfs: adapt to rhashtable-based simple_xattrs with lazy allocation
      pidfs: adapt to rhashtable-based simple_xattrs
      xattr: remove rbtree-based simple_xattr infrastructure
      xattr: add xattr_permission_error()
      xattr: switch xattr_permission() to switch statement
      xattr: move user limits for xattrs to generic infra
      xattr,net: support limited amount of extended attributes on sockfs sockets
      xattr: support extended attributes on sockets
      selftests/xattr: path-based AF_UNIX socket xattr tests
      selftests/xattr: sockfs socket xattr tests
      selftests/xattr: test xattrs on various socket families

 fs/kernfs/dir.c                                    |  15 +-
 fs/kernfs/inode.c                                  |  99 +----
 fs/kernfs/kernfs-internal.h                        |   5 +-
 fs/pidfs.c                                         |  65 +--
 fs/xattr.c                                         | 423 +++++++++++++------
 include/linux/kernfs.h                             |   2 -
 include/linux/shmem_fs.h                           |   2 +-
 include/linux/xattr.h                              |  47 ++-
 mm/shmem.c                                         |  46 +-
 net/socket.c                                       | 119 ++++--
 .../testing/selftests/filesystems/xattr/.gitignore |   3 +
 tools/testing/selftests/filesystems/xattr/Makefile |   6 +
 .../filesystems/xattr/xattr_socket_test.c          | 470 +++++++++++++++++++++
 .../filesystems/xattr/xattr_socket_types_test.c    | 177 ++++++++
 .../filesystems/xattr/xattr_sockfs_test.c          | 363 ++++++++++++++++
 15 files changed, 1547 insertions(+), 295 deletions(-)
---
base-commit: 72c395024dac5e215136cbff793455f065603b06
change-id: 20260211-work-xattr-socket-c85f4d3b8847

Re: [PATCH 00/14] xattr: rework simple xattrs and support user.* xattrs on sockets

Posted by Jeff Layton 2 weeks, 1 day ago

On Mon, 2026-02-16 at 14:31 +0100, Christian Brauner wrote:
> Hey,
> 
> This reworks the simple_xattr infrastructure and adds support for
> user.* extended attributes on sockets.
> 
> The simple_xattr subsystem currently uses an rbtree protected by a
> reader-writer spinlock. This series replaces the rbtree with an
> rhashtable giving O(1) average-case lookup with RCU-based lockless
> reads. This sped up concurrent access patterns on tmpfs quite a bit and
> it's an overall easy enough conversion to do and gets rid of rwlock_t.
> 
> The conversion is done incrementally: a new rhashtable path is added
> alongside the existing rbtree, consumers are migrated one at a time
> (shmem, kernfs, pidfs), and then the rbtree code is removed. All three
> consumers switch from embedded structs to pointer-based lazy allocation
> so the rhashtable overhead is only paid for inodes that actually use
> xattrs.
> 
> With this infrastructure in place the series adds support for user.*
> xattrs on sockets. Path-based AF_UNIX sockets inherit xattr support
> from the underlying filesystem (e.g. tmpfs) but sockets in sockfs -
> that is everything created via socket() including abstract namespace
> AF_UNIX sockets - had no xattr support at all.
> 
> The xattr_permission() checks are reworked to allow user.* xattrs on
> S_IFSOCK inodes. Sockfs sockets get per-inode limits of 128 xattrs and
> 128KB total value size matching the limits already in use for kernfs.
> 
> The practical motivation comes from several directions. systemd and
> GNOME are expanding their use of Varlink as an IPC mechanism. For D-Bus
> there are tools like dbus-monitor that can observe IPC traffic across
> the system but this only works because D-Bus has a central broker. For
> Varlink there is no broker and there is currently no way to identify
> which sockets speak Varlink. With user.* xattrs on sockets a service
> can label its socket with the IPC protocol it speaks (e.g.,
> user.varlink=1) and an eBPF program can then selectively capture
> traffic on those sockets. Enumerating bound sockets via netlink combined
> with these xattr labels gives a way to discover all Varlink IPC
> entrypoints for debugging and introspection.
> 
> Similarly, systemd-journald wants to use xattrs on the /dev/log socket
> for protocol negotiation to indicate whether RFC 5424 structured syslog
> is supported or whether only the legacy RFC 3164 format should be used.
> 
> In containers these labels are particularly useful as high-privilege or
> more complicated solutions for socket identification aren't available.
> 
> The series comes with comprehensive selftests covering path-based
> AF_UNIX sockets, sockfs socket operations, per-inode limit enforcement,
> and xattr operations across multiple address families (AF_INET,
> AF_INET6, AF_NETLINK, AF_PACKET).
> 
> Christian
> 
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> ---
> Christian Brauner (14):
>       xattr: add rcu_head and rhash_head to struct simple_xattr
>       xattr: add rhashtable-based simple_xattr infrastructure
>       shmem: adapt to rhashtable-based simple_xattrs with lazy allocation
>       kernfs: adapt to rhashtable-based simple_xattrs with lazy allocation
>       pidfs: adapt to rhashtable-based simple_xattrs
>       xattr: remove rbtree-based simple_xattr infrastructure
>       xattr: add xattr_permission_error()
>       xattr: switch xattr_permission() to switch statement
>       xattr: move user limits for xattrs to generic infra
>       xattr,net: support limited amount of extended attributes on sockfs sockets
>       xattr: support extended attributes on sockets
>       selftests/xattr: path-based AF_UNIX socket xattr tests
>       selftests/xattr: sockfs socket xattr tests
>       selftests/xattr: test xattrs on various socket families
> 
>  fs/kernfs/dir.c                                    |  15 +-
>  fs/kernfs/inode.c                                  |  99 +----
>  fs/kernfs/kernfs-internal.h                        |   5 +-
>  fs/pidfs.c                                         |  65 +--
>  fs/xattr.c                                         | 423 +++++++++++++------
>  include/linux/kernfs.h                             |   2 -
>  include/linux/shmem_fs.h                           |   2 +-
>  include/linux/xattr.h                              |  47 ++-
>  mm/shmem.c                                         |  46 +-
>  net/socket.c                                       | 119 ++++--
>  .../testing/selftests/filesystems/xattr/.gitignore |   3 +
>  tools/testing/selftests/filesystems/xattr/Makefile |   6 +
>  .../filesystems/xattr/xattr_socket_test.c          | 470 +++++++++++++++++++++
>  .../filesystems/xattr/xattr_socket_types_test.c    | 177 ++++++++
>  .../filesystems/xattr/xattr_sockfs_test.c          | 363 ++++++++++++++++
>  15 files changed, 1547 insertions(+), 295 deletions(-)
> ---
> base-commit: 72c395024dac5e215136cbff793455f065603b06
> change-id: 20260211-work-xattr-socket-c85f4d3b8847

Reviewed-by: Jeff Layton <jlayton@kernel.org>

Re: [PATCH 00/14] xattr: rework simple xattrs and support user.* xattrs on sockets

Posted by Darrick J. Wong 3 weeks, 6 days ago

On Mon, Feb 16, 2026 at 02:31:56PM +0100, Christian Brauner wrote:
> Hey,
> 
> This reworks the simple_xattr infrastructure and adds support for
> user.* extended attributes on sockets.
> 
> The simple_xattr subsystem currently uses an rbtree protected by a
> reader-writer spinlock. This series replaces the rbtree with an
> rhashtable giving O(1) average-case lookup with RCU-based lockless
> reads. This sped up concurrent access patterns on tmpfs quite a bit and
> it's an overall easy enough conversion to do and gets rid of rwlock_t.
> 
> The conversion is done incrementally: a new rhashtable path is added
> alongside the existing rbtree, consumers are migrated one at a time
> (shmem, kernfs, pidfs), and then the rbtree code is removed. All three
> consumers switch from embedded structs to pointer-based lazy allocation
> so the rhashtable overhead is only paid for inodes that actually use
> xattrs.

Patches 1-6 look ok to me, at least in the sense that nothing stood out
to me as obviously wrong, so
Acked-by: "Darrick J. Wong" <djwong@kernel.org>

> With this infrastructure in place the series adds support for user.*
> xattrs on sockets. Path-based AF_UNIX sockets inherit xattr support
> from the underlying filesystem (e.g. tmpfs) but sockets in sockfs -
> that is everything created via socket() including abstract namespace
> AF_UNIX sockets - had no xattr support at all.
> 
> The xattr_permission() checks are reworked to allow user.* xattrs on
> S_IFSOCK inodes. Sockfs sockets get per-inode limits of 128 xattrs and
> 128KB total value size matching the limits already in use for kernfs.
> 
> The practical motivation comes from several directions. systemd and
> GNOME are expanding their use of Varlink as an IPC mechanism. For D-Bus
> there are tools like dbus-monitor that can observe IPC traffic across
> the system but this only works because D-Bus has a central broker. For
> Varlink there is no broker and there is currently no way to identify

Hum.  I suppose there's never going to be a central varlink broker, is
there?  That doesn't sound great for discoverability, unless the plan is
to try to concentrate them in (say) /run/varlink?  But even then, could
you have N services that share the same otherwise private tmpfs in order
to talk to each other via a varlink socket?  I suppose in that case, the
N services probably don't care/want others to discover their socket.

> which sockets speak Varlink. With user.* xattrs on sockets a service
> can label its socket with the IPC protocol it speaks (e.g.,
> user.varlink=1) and an eBPF program can then selectively capture

Who gets to set xattrs?  Can a malicious varlink socket user who has
connect() abilities also delete user.varlink to mess with everyone who
comes afterwards?

--D

> traffic on those sockets. Enumerating bound sockets via netlink combined
> with these xattr labels gives a way to discover all Varlink IPC
> entrypoints for debugging and introspection.
> 
> Similarly, systemd-journald wants to use xattrs on the /dev/log socket
> for protocol negotiation to indicate whether RFC 5424 structured syslog
> is supported or whether only the legacy RFC 3164 format should be used.
> 
> In containers these labels are particularly useful as high-privilege or
> more complicated solutions for socket identification aren't available.
> 
> The series comes with comprehensive selftests covering path-based
> AF_UNIX sockets, sockfs socket operations, per-inode limit enforcement,
> and xattr operations across multiple address families (AF_INET,
> AF_INET6, AF_NETLINK, AF_PACKET).
> 
> Christian
> 
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> ---
> Christian Brauner (14):
>       xattr: add rcu_head and rhash_head to struct simple_xattr
>       xattr: add rhashtable-based simple_xattr infrastructure
>       shmem: adapt to rhashtable-based simple_xattrs with lazy allocation
>       kernfs: adapt to rhashtable-based simple_xattrs with lazy allocation
>       pidfs: adapt to rhashtable-based simple_xattrs
>       xattr: remove rbtree-based simple_xattr infrastructure
>       xattr: add xattr_permission_error()
>       xattr: switch xattr_permission() to switch statement
>       xattr: move user limits for xattrs to generic infra
>       xattr,net: support limited amount of extended attributes on sockfs sockets
>       xattr: support extended attributes on sockets
>       selftests/xattr: path-based AF_UNIX socket xattr tests
>       selftests/xattr: sockfs socket xattr tests
>       selftests/xattr: test xattrs on various socket families
> 
>  fs/kernfs/dir.c                                    |  15 +-
>  fs/kernfs/inode.c                                  |  99 +----
>  fs/kernfs/kernfs-internal.h                        |   5 +-
>  fs/pidfs.c                                         |  65 +--
>  fs/xattr.c                                         | 423 +++++++++++++------
>  include/linux/kernfs.h                             |   2 -
>  include/linux/shmem_fs.h                           |   2 +-
>  include/linux/xattr.h                              |  47 ++-
>  mm/shmem.c                                         |  46 +-
>  net/socket.c                                       | 119 ++++--
>  .../testing/selftests/filesystems/xattr/.gitignore |   3 +
>  tools/testing/selftests/filesystems/xattr/Makefile |   6 +
>  .../filesystems/xattr/xattr_socket_test.c          | 470 +++++++++++++++++++++
>  .../filesystems/xattr/xattr_socket_types_test.c    | 177 ++++++++
>  .../filesystems/xattr/xattr_sockfs_test.c          | 363 ++++++++++++++++
>  15 files changed, 1547 insertions(+), 295 deletions(-)
> ---
> base-commit: 72c395024dac5e215136cbff793455f065603b06
> change-id: 20260211-work-xattr-socket-c85f4d3b8847
> 
>

Re: [PATCH 00/14] xattr: rework simple xattrs and support user.* xattrs on sockets

Posted by Christian Brauner 3 weeks, 5 days ago

On Thu, Feb 19, 2026 at 04:44:54PM -0800, Darrick J. Wong wrote:
> On Mon, Feb 16, 2026 at 02:31:56PM +0100, Christian Brauner wrote:
> > Hey,
> > 
> > This reworks the simple_xattr infrastructure and adds support for
> > user.* extended attributes on sockets.
> > 
> > The simple_xattr subsystem currently uses an rbtree protected by a
> > reader-writer spinlock. This series replaces the rbtree with an
> > rhashtable giving O(1) average-case lookup with RCU-based lockless
> > reads. This sped up concurrent access patterns on tmpfs quite a bit and
> > it's an overall easy enough conversion to do and gets rid of rwlock_t.
> > 
> > The conversion is done incrementally: a new rhashtable path is added
> > alongside the existing rbtree, consumers are migrated one at a time
> > (shmem, kernfs, pidfs), and then the rbtree code is removed. All three
> > consumers switch from embedded structs to pointer-based lazy allocation
> > so the rhashtable overhead is only paid for inodes that actually use
> > xattrs.
> 
> Patches 1-6 look ok to me, at least in the sense that nothing stood out
> to me as obviously wrong, so
> Acked-by: "Darrick J. Wong" <djwong@kernel.org>
> 
> > With this infrastructure in place the series adds support for user.*
> > xattrs on sockets. Path-based AF_UNIX sockets inherit xattr support
> > from the underlying filesystem (e.g. tmpfs) but sockets in sockfs -
> > that is everything created via socket() including abstract namespace
> > AF_UNIX sockets - had no xattr support at all.
> > 
> > The xattr_permission() checks are reworked to allow user.* xattrs on
> > S_IFSOCK inodes. Sockfs sockets get per-inode limits of 128 xattrs and
> > 128KB total value size matching the limits already in use for kernfs.
> > 
> > The practical motivation comes from several directions. systemd and
> > GNOME are expanding their use of Varlink as an IPC mechanism. For D-Bus
> > there are tools like dbus-monitor that can observe IPC traffic across
> > the system but this only works because D-Bus has a central broker. For
> > Varlink there is no broker and there is currently no way to identify
> 
> Hum.  I suppose there's never going to be a central varlink broker, is
> there?  That doesn't sound great for discoverability, unless the plan is

Varlink was explicitly designed to avoid having to have a broker.
Practically it would have been one option to have a central registry
maintained as a bpf socket map. My naive take had always been something
like: systemd can have a global socket map. Sockets are picked up
whenever the appropriate xattr is set and deleted from the map once the
socket goes away (or the xattr is unset). Right now this is something
that would require capabilities. Once signed bpf is more common it is
easy to load that on a per-container basis. But...

> to try to concentrate them in (say) /run/varlink?  But even then, could

... the future is already here :)

  https://github.com/systemd/systemd/pull/40590

All public varlink services that are supposed to be announced are now
symlinked into:

  /run/varlink/registry

There are of course non-public interfaces such as the interface
between PID 1 and oomd. Such interfaces are not exposed.

It's also possible to have per-user registries at e.g.:

  /run/user/1000/varlink/registry/

Such varlink services can now also be listed via:

  varlinkctl list-services

This then ties very neatly into the varlink bridge we're currently
building:

  https://github.com/mvo5/varlink-http-bridge

It takes a directory with varlink sockets (or symlinks to varlink
sockets) like /run/varlink/registry as the argument and will serve
whatever it finds in there. Sockets can be added or removed dynamically
in the dir as needed:

  curl -s http://localhost:8080/sockets | jq
  {
    "sockets": [
      "io.systemd.Login",
      "io.systemd.Hostname",
      "io.systemd.sysext",
      "io.systemd.BootControl",
      "io.systemd.Import",
      "io.systemd.Repart",
      "io.systemd.MuteConsole",
      "io.systemd.FactoryReset",
      "io.systemd.Credentials",
      "io.systemd.AskPassword",
      "io.systemd.Manager",
      "io.systemd.ManagedOOM"
    ]
  }

The xattrs make it possible to have a completely global view of such
services, and the per-user sessions all have their own sub-view.

> you have N services that share the same otherwise private tmpfs in order
> to talk to each other via a varlink socket?  I suppose in that case, the

Yeah sure that's one way.

> N services probably don't care/want others to discover their socket.
> 
> > which sockets speak Varlink. With user.* xattrs on sockets a service
> > can label its socket with the IPC protocol it speaks (e.g.,
> > user.varlink=1) and an eBPF program can then selectively capture
> 
> Who gets to set xattrs?  Can a malicious varlink socket user who has
> connect() abilities also delete user.varlink to mess with everyone who
> comes afterwards?

The main focus is AF_UNIX sockets of course so a varlink service does:

  fd = socket(AF_UNIX)
  umask(0117);
  bind(fd, "/run/foobar");
  umask(original_umask);
  chown("/run/foobar", -1, MYACCESSGID);
  setxattr("/run/foobar", "user.varlink", "1");

For non-path-based sockets the inodes for client and server are
inherently distinct so they cannot interfere with each other. But even
then a chmod() + chown(-1, MYACCESSGID) on the sockfs socket fd will
protect this.
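
Spelled out as compilable C, the path-based sequence above might look
like the following sketch. create_labeled_listener() and its
access_gid parameter are hypothetical names for illustration. Treating
setxattr() as best effort is an assumption made here so the helper
also works on kernels without this series; a real service may prefer
to fail hard.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/un.h>
#include <sys/xattr.h>
#include <unistd.h>

/*
 * Hypothetical helper: bind an AF_UNIX listener at @path with mode 0660,
 * give group access to @access_gid and label it user.varlink=1.
 * Returns the listening fd, or -1 with errno set.
 */
int create_labeled_listener(const char *path, gid_t access_gid)
{
	struct sockaddr_un addr = { .sun_family = AF_UNIX };
	mode_t old_umask;
	int fd, ret, saved_errno;

	if (strlen(path) >= sizeof(addr.sun_path)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	strcpy(addr.sun_path, path);

	fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
	if (fd < 0)
		return -1;

	/* With umask 0117, bind() creates the socket inode with mode 0660. */
	old_umask = umask(0117);
	ret = bind(fd, (struct sockaddr *)&addr, sizeof(addr));
	saved_errno = errno;
	umask(old_umask);
	if (ret < 0)
		goto err_close;

	/* Keep the owner, change only the group. */
	if (chown(path, (uid_t)-1, access_gid) < 0 ||
	    listen(fd, SOMAXCONN) < 0) {
		saved_errno = errno;
		unlink(path);
		goto err_close;
	}

	/* Assumption: the label is best effort, so a failure is ignored. */
	(void)setxattr(path, "user.varlink", "1", 1, 0);
	return fd;

err_close:
	close(fd);
	errno = saved_errno;
	return -1;
}
```

The umask is only changed around bind() so that the socket inode never
exists with wider permissions than intended, and it is restored right
away so other file creation in the process is unaffected.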

Thanks for the review. Please keep going. :)

Re: [PATCH 00/14] xattr: rework simple xattrs and support user.* xattrs on sockets

Posted by Darrick J. Wong 3 weeks, 5 days ago

On Fri, Feb 20, 2026 at 10:23:55AM +0100, Christian Brauner wrote:
> On Thu, Feb 19, 2026 at 04:44:54PM -0800, Darrick J. Wong wrote:
> > On Mon, Feb 16, 2026 at 02:31:56PM +0100, Christian Brauner wrote:
> > > Hey,
> > > 
> > > This reworks the simple_xattr infrastructure and adds support for
> > > user.* extended attributes on sockets.
> > > 
> > > The simple_xattr subsystem currently uses an rbtree protected by a
> > > reader-writer spinlock. This series replaces the rbtree with an
> > > rhashtable giving O(1) average-case lookup with RCU-based lockless
> > > reads. This sped up concurrent access patterns on tmpfs quite a bit and
> > > it's an overall easy enough conversion to do and gets rid of rwlock_t.
> > > 
> > > The conversion is done incrementally: a new rhashtable path is added
> > > alongside the existing rbtree, consumers are migrated one at a time
> > > (shmem, kernfs, pidfs), and then the rbtree code is removed. All three
> > > consumers switch from embedded structs to pointer-based lazy allocation
> > > so the rhashtable overhead is only paid for inodes that actually use
> > > xattrs.
> > 
> > Patches 1-6 look ok to me, at least in the sense that nothing stood out
> > to me as obviously wrong, so
> > Acked-by: "Darrick J. Wong" <djwong@kernel.org>
> > 
> > > With this infrastructure in place the series adds support for user.*
> > > xattrs on sockets. Path-based AF_UNIX sockets inherit xattr support
> > > from the underlying filesystem (e.g. tmpfs) but sockets in sockfs -
> > > that is everything created via socket() including abstract namespace
> > > AF_UNIX sockets - had no xattr support at all.
> > > 
> > > The xattr_permission() checks are reworked to allow user.* xattrs on
> > > S_IFSOCK inodes. Sockfs sockets get per-inode limits of 128 xattrs and
> > > 128KB total value size matching the limits already in use for kernfs.
> > > 
> > > The practical motivation comes from several directions. systemd and
> > > GNOME are expanding their use of Varlink as an IPC mechanism. For D-Bus
> > > there are tools like dbus-monitor that can observe IPC traffic across
> > > the system but this only works because D-Bus has a central broker. For
> > > Varlink there is no broker and there is currently no way to identify
> > 
> > Hum.  I suppose there's never going to be a central varlink broker, is
> > there?  That doesn't sound great for discoverability, unless the plan is
> 
> Varlink was explicitly designed to avoid having to have a broker.
> Practically it would have been one option to have a central registry
> maintained as a bpf socket map. My naive take had always been something
> like: systemd can have a global socket map. Sockets are picked up
> whenever the appropriate xattr is set and deleted from the map once the
> socket goes away (or the xattr is unset). Right now this is something
> that would require capabilities. Once signed bpf is more common it is
> easy to load that on a per-container basis. But...
> 
> > to try to concentrate them in (say) /run/varlink?  But even then, could
> 
> ... the future is already here :)
> 
>   https://github.com/systemd/systemd/pull/40590
> 
> All public varlink services that are supposed to be announced are now
> symlinked into:
> 
>   /run/varlink/registry
> 
> There are of course non-public interfaces such as the interface
> between PID 1 and oomd. Such interfaces are not exposed.
> 
> It's also possible to have per-user registries at e.g.:
> 
>   /run/user/1000/varlink/registry/
> 
> Such varlink services can now also be listed via:
> 
>   varlinkctl list-services
> 
> This then ties very neatly into the varlink bridge we're currently
> building:
> 
>   https://github.com/mvo5/varlink-http-bridge
> 
> It takes a directory with varlink sockets (or symlinks to varlink
> sockets) like /run/varlink/registry as the argument and will serve
> whatever it finds in there. Sockets can be added or removed dynamically
> in the dir as needed:
> 
>   curl -s http://localhost:8080/sockets | jq
>   {
>     "sockets": [
>       "io.systemd.Login",
>       "io.systemd.Hostname",
>       "io.systemd.sysext",
>       "io.systemd.BootControl",
>       "io.systemd.Import",
>       "io.systemd.Repart",
>       "io.systemd.MuteConsole",
>       "io.systemd.FactoryReset",
>       "io.systemd.Credentials",
>       "io.systemd.AskPassword",
>       "io.systemd.Manager",
>       "io.systemd.ManagedOOM"
>     ]
>   }
> 
> The xattrs make it possible to have a completely global view of such
> services, and the per-user sessions all have their own sub-view.
> 
> > you have N services that share the same otherwise private tmpfs in order
> > to talk to each other via a varlink socket?  I suppose in that case, the
> 
> Yeah sure that's one way.
> 
> > N services probably don't care/want others to discover their socket.
> > 
> > > which sockets speak Varlink. With user.* xattrs on sockets a service
> > > can label its socket with the IPC protocol it speaks (e.g.,
> > > user.varlink=1) and an eBPF program can then selectively capture
> > 
> > Who gets to set xattrs?  Can a malicious varlink socket user who has
> > connect() abilities also delete user.varlink to mess with everyone who
> > comes afterwards?
> 
> The main focus is AF_UNIX sockets of course so a varlink service does:
> 
>   fd = socket(AF_UNIX)
>   umask(0117);
>   bind(fd, "/run/foobar");
>   umask(original_umask);
>   chown("/run/foobar", -1, MYACCESSGID);
>   setxattr("/run/foobar", "user.varlink", "1");
> 
> For non-path-based sockets the inodes for client and server are
> inherently distinct so they cannot interfere with each other. But even
> then a chmod() + chown(-1, MYACCESSGID) on the sockfs socket fd will
> protect this.
> 
> Thanks for the review. Please keep going. :)

The rest look fine too, modulo my comments about the fixed limits.

Acked-by: "Darrick J. Wong" <djwong@kernel.org>

--D