BLOG: per-task logging contexts with Ceph consumer

[RFC PATCH 0/5] BLOG: per-task logging contexts with Ceph consumer

Posted by Alex Markuze 3 months, 2 weeks ago

Motivation: improve observability in production by providing subsystems with
a logger that keeps up with their verbose unstructured logs and aggregating
logs at the process context level, akin to userspace TLS. 

Binary LOGging (BLOG) introduces a task-local logging context: each context
owns a single 512 KiB fragment that cycles through “ready → in use → queued for
readers → reset → ready” without re-entering the allocator. Writers copy the
raw parameters they already have; readers format them later when the log is
inspected.
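
The capture-now, format-later split described above can be sketched in
user-space C. This is only an illustration under stated assumptions:
`demo_record`, `demo_write`, `demo_format`, and the record layout are
hypothetical names invented for the example, not the API or format used by
the patches.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record header; not the layout used by the BLOG patches. */
struct demo_record {
	uint16_t fmt_id;	/* which message this is */
	uint16_t nargs;		/* number of 64-bit arguments that follow */
};

#define DEMO_MAX_ARGS 2

/*
 * Writer: copy the raw values into the buffer without any string formatting.
 * Returns the new write offset, or the old one if the record does not fit.
 */
static size_t demo_write(uint8_t *buf, size_t cap, size_t off,
			 uint16_t fmt_id, const uint64_t *args, uint16_t nargs)
{
	struct demo_record rec = { .fmt_id = fmt_id, .nargs = nargs };
	size_t need = sizeof(rec) + (size_t)nargs * sizeof(uint64_t);

	if (nargs > DEMO_MAX_ARGS || off > cap || need > cap - off)
		return off;
	memcpy(buf + off, &rec, sizeof(rec));
	memcpy(buf + off + sizeof(rec), args, (size_t)nargs * sizeof(uint64_t));
	return off + need;
}

/*
 * Reader: decode the record at 'off' and only now turn it into text.
 * Returns the offset of the next record.
 */
static size_t demo_format(const uint8_t *buf, size_t off,
			  char *out, size_t outlen)
{
	struct demo_record rec;
	uint64_t a[DEMO_MAX_ARGS] = { 0 };

	memcpy(&rec, buf + off, sizeof(rec));
	assert(rec.nargs <= DEMO_MAX_ARGS);	/* the writer never stores more */
	memcpy(a, buf + off + sizeof(rec), (size_t)rec.nargs * sizeof(uint64_t));

	switch (rec.fmt_id) {
	case 0:
		snprintf(out, outlen, "inode %llu size %llu",
			 (unsigned long long)a[0], (unsigned long long)a[1]);
		break;
	case 1:
		snprintf(out, outlen, "cap %llu issued",
			 (unsigned long long)a[0]);
		break;
	default:
		snprintf(out, outlen, "unknown fmt %u", (unsigned)rec.fmt_id);
		break;
	}
	return off + sizeof(rec) + (size_t)rec.nargs * sizeof(uint64_t);
}
```

In this sketch the writer's cost is two `memcpy()` calls per message; the
`snprintf()` work is paid only by whoever reads the buffer.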

BLOG borrows ideas from ftrace (capture binary data now, format later) but
unlike ftrace there is no global ring. Each module registers its own logger,
manages its own buffers, and keeps the state small enough for production use.

To host the per-module pointers we extend `struct task_struct` with one
additional `void *`, in line with other task extensions already in the kernel.
Each module keeps independent batches: `alloc_batch` for contexts with
refcount 0 and `log_batch` for contexts that have been filled and are waiting
for readers. The batching layer and buffer management were migrated from the
existing Ceph SAN logging code, so the behaviour is battle-tested; we simply
made the buffer inline so every composite stays within a single 512 KiB
allocation.
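
As a rough user-space sketch of the inline-buffer composite and its
"ready, in use, queued, reset" cycle: the header and the buffer share one
allocation, and a reset only rewinds the write position. All `demo_*` and
`DEMO_*` names are hypothetical; the real `struct blog_tls_pagefrag` carries
more state than is shown here, and the batches are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_PAGEFRAG_SIZE (512u * 1024u)	/* one 512 KiB allocation */

enum demo_state { DEMO_READY, DEMO_IN_USE, DEMO_QUEUED };

/* Hypothetical composite: metadata followed by an inline buffer. */
struct demo_composite {
	enum demo_state state;
	size_t head;		/* next free byte in buf[] */
	unsigned char buf[];	/* fills the rest of the allocation */
};

/* The buffer gets whatever the header leaves over. */
#define DEMO_BUF_SIZE (DEMO_PAGEFRAG_SIZE - sizeof(struct demo_composite))

/* The only call into the allocator in this sketch. */
static struct demo_composite *demo_alloc(void)
{
	struct demo_composite *c = malloc(DEMO_PAGEFRAG_SIZE);

	if (c) {
		c->state = DEMO_READY;
		c->head = 0;
	}
	return c;
}

/* ready -> in use */
static int demo_acquire(struct demo_composite *c)
{
	if (c->state != DEMO_READY)
		return -1;
	c->state = DEMO_IN_USE;
	return 0;
}

/* Hand out 'n' bytes of the inline buffer, or NULL if they do not fit. */
static void *demo_reserve(struct demo_composite *c, size_t n)
{
	void *p;

	if (c->state != DEMO_IN_USE || n > DEMO_BUF_SIZE - c->head)
		return NULL;
	p = c->buf + c->head;
	c->head += n;
	return p;
}

/* in use -> queued for readers */
static int demo_queue(struct demo_composite *c)
{
	if (c->state != DEMO_IN_USE)
		return -1;
	c->state = DEMO_QUEUED;
	return 0;
}

/* queued -> reset -> ready: rewind, keep the same memory. */
static void demo_reset(struct demo_composite *c)
{
	assert(c->state == DEMO_QUEUED);
	c->head = 0;
	c->state = DEMO_READY;
}
```

Because `DEMO_BUF_SIZE` is derived from the header size, growing the metadata
shrinks the buffer rather than pushing the composite past 512 KiB.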

The patch series lands the BLOG library first, then wires the task lifecycle,
and finally switches Ceph’s `bout*` logging macros to BLOG so we exercise the
new path.

Patch summary:
  1. sched, fork: wire BLOG contexts into task lifecycle
     - Adds `struct blog_tls_ctx *blog_contexts[BLOG_MAX_MODULES]` to
       `struct task_struct`.
     - Fork/exit paths initialise and recycle contexts automatically.

  2. lib: introduce BLOG (Binary LOGging) subsystem
     - Adds `lib/blog/` sources and headers under `include/linux/blog/`.
     - Each composite (`struct blog_tls_pagefrag`) consists of the TLS
       metadata, the pagefrag state, and an inline buffer sized at
       `BLOG_PAGEFRAG_SIZE - sizeof(struct blog_tls_pagefrag)`.

  3. ceph: add BLOG scaffolding
     - Introduces `include/linux/ceph/ceph_blog.h` and `fs/ceph/blog_client.c`.
     - Ceph registers a logger and maintains a client-ID map for the reader
       callback.

  4. ceph: add BLOG debugfs support
     - Adds `fs/ceph/blog_debugfs.c` so filled contexts can be drained.

  5. ceph: activate BLOG logging
     - Switches `bout*` macros to BLOG, making Ceph the first consumer.
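
To illustrate the per-task slot array from patch 1, here is a minimal
user-space sketch of looking up a task's context by module ID. Everything
named `demo_*` is hypothetical, the value of `DEMO_MAX_MODULES` is made up,
and the sketch allocates lazily and frees at exit purely for brevity, whereas
the series describes recycling contexts.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_MAX_MODULES 8	/* made-up value for the example */

/* Stand-in for a per-module, per-task logging context. */
struct demo_ctx {
	int module_id;
};

/* Stand-in for the field added to the task structure. */
struct demo_task {
	struct demo_ctx *contexts[DEMO_MAX_MODULES];
};

/* Fork path: a new task starts with no contexts of its own. */
static void demo_task_fork(struct demo_task *t)
{
	assert(t != NULL);
	for (int i = 0; i < DEMO_MAX_MODULES; i++)
		t->contexts[i] = NULL;
}

/* Return this task's context for 'module_id', creating it on first use. */
static struct demo_ctx *demo_get_ctx(struct demo_task *t, int module_id)
{
	if (module_id < 0 || module_id >= DEMO_MAX_MODULES)
		return NULL;
	if (!t->contexts[module_id]) {
		struct demo_ctx *c = calloc(1, sizeof(*c));

		if (!c)
			return NULL;
		c->module_id = module_id;
		t->contexts[module_id] = c;
	}
	return t->contexts[module_id];
}

/* Exit path: drop every context the task still holds. */
static void demo_task_exit(struct demo_task *t)
{
	for (int i = 0; i < DEMO_MAX_MODULES; i++) {
		free(t->contexts[i]);
		t->contexts[i] = NULL;
	}
}
```

The point of the slot array in this sketch is that each module reaches its
own context with one index, without touching any other module's state.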

With these patches, Ceph now writes its verbose logging to task-local buffers
managed by BLOG, and the infrastructure is ready for other subsystems that need
allocation-free, module-owned logging.

Alex Markuze (5):
  sched, fork: Wire BLOG contexts into task lifecycle
  lib: Introduce BLOG (Binary LOGging) subsystem
  ceph: Add BLOG scaffolding
  ceph: Add BLOG debugfs support
  ceph: Activate BLOG logging

 fs/ceph/Makefile                   |   2 +
 fs/ceph/addr.c                     | 130 ++---
 fs/ceph/blog_client.c              | 244 +++++++++
 fs/ceph/blog_debugfs.c             | 361 +++++++++++++
 fs/ceph/caps.c                     | 242 ++++-----
 fs/ceph/crypto.c                   |  18 +-
 fs/ceph/debugfs.c                  |  33 +-
 fs/ceph/dir.c                      |  88 ++--
 fs/ceph/export.c                   |  20 +-
 fs/ceph/file.c                     | 130 ++---
 fs/ceph/inode.c                    | 182 +++----
 fs/ceph/ioctl.c                    |   6 +-
 fs/ceph/locks.c                    |  22 +-
 fs/ceph/mds_client.c               | 278 +++++-----
 fs/ceph/mdsmap.c                   |   8 +-
 fs/ceph/quota.c                    |   2 +-
 fs/ceph/snap.c                     |  66 +--
 fs/ceph/super.c                    |  82 +--
 fs/ceph/xattr.c                    |  42 +-
 include/linux/blog/blog.h          | 515 +++++++++++++++++++
 include/linux/blog/blog_batch.h    |  54 ++
 include/linux/blog/blog_des.h      |  46 ++
 include/linux/blog/blog_module.h   | 329 ++++++++++++
 include/linux/blog/blog_pagefrag.h |  33 ++
 include/linux/blog/blog_ser.h      | 275 ++++++++++
 include/linux/ceph/ceph_blog.h     | 124 +++++
 include/linux/ceph/ceph_debug.h    |   6 +-
 include/linux/sched.h              |   7 +
 kernel/fork.c                      |  37 ++
 lib/Kconfig                        |   2 +
 lib/Makefile                       |   2 +
 lib/blog/Kconfig                   |  56 +++
 lib/blog/Makefile                  |  15 +
 lib/blog/blog_batch.c              | 311 ++++++++++++
 lib/blog/blog_core.c               | 772 ++++++++++++++++++++++++++++
 lib/blog/blog_des.c                | 385 ++++++++++++++
 lib/blog/blog_module.c             | 781 +++++++++++++++++++++++++++++
 lib/blog/blog_pagefrag.c           | 124 +++++
 38 files changed, 5163 insertions(+), 667 deletions(-)
 create mode 100644 fs/ceph/blog_client.c
 create mode 100644 fs/ceph/blog_debugfs.c
 create mode 100644 include/linux/blog/blog.h
 create mode 100644 include/linux/blog/blog_batch.h
 create mode 100644 include/linux/blog/blog_des.h
 create mode 100644 include/linux/blog/blog_module.h
 create mode 100644 include/linux/blog/blog_pagefrag.h
 create mode 100644 include/linux/blog/blog_ser.h
 create mode 100644 include/linux/ceph/ceph_blog.h
 create mode 100644 lib/blog/Kconfig
 create mode 100644 lib/blog/Makefile
 create mode 100644 lib/blog/blog_batch.c
 create mode 100644 lib/blog/blog_core.c
 create mode 100644 lib/blog/blog_des.c
 create mode 100644 lib/blog/blog_module.c
 create mode 100644 lib/blog/blog_pagefrag.c

-- 
2.34.1

Re: [RFC PATCH 0/5] BLOG: per-task logging contexts with Ceph consumer

Posted by Steven Rostedt 3 months, 2 weeks ago

On Fri, 24 Oct 2025 08:42:54 +0000
Alex Markuze <amarkuze@redhat.com> wrote:

> Motivation: improve observability in production by providing subsystems with
> a logger that keeps up with their verbose unstructured logs and aggregating
> logs at the process context level, akin to userspace TLS. 
> 

I still don't understand the motivation behind this.

What exactly is this doing that the current tracing infrastructure can't do?

-- Steve

Re: [RFC PATCH 0/5] BLOG: per-task logging contexts with Ceph consumer

Posted by Alex Markuze 3 months, 2 weeks ago

First of all, Ftrace is for debugging and development; you won't see
components or kernel modules run in production with ftrace enabled.
The main motivation is to have verbose logging that is usable for
production systems.
The second improvement is that the logs have a struct task hook which
facilitates better logging association between the kernel log and the
user process.
It's especially handy when debugging FS systems.

Specifically we had several bugs reported from the field that we could
not make progress on without additional logs.

Re: MM folks, apologies for including unrelated people, the only
change is the addition of a field in struct task.

On Fri, Oct 24, 2025 at 8:52 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 24 Oct 2025 08:42:54 +0000
> Alex Markuze <amarkuze@redhat.com> wrote:
>
> > Motivation: improve observability in production by providing subsystems with
> > a logger that keeps up with their verbose unstructured logs and aggregating
> > logs at the process context level, akin to userspace TLS.
> >
>
> I still don't understand the motivation behind this.
>
> What exactly is this doing that the current tracing infrastructure can't do?
>
> -- Steve
>

Re: [RFC PATCH 0/5] BLOG: per-task logging contexts with Ceph consumer

Posted by Steven Rostedt 3 months, 2 weeks ago

On Sat, 25 Oct 2025 13:50:39 +0300
Alex Markuze <amarkuze@redhat.com> wrote:

> First of all, Ftrace is for debugging and development; you won't see
> components or kernel modules run in production with ftrace enabled.
> The main motivation is to have verbose logging that is usable for
> production systems.

That is totally untrue. Several production environments use ftrace. We
have it enabled and used in Chromebooks and in Android. Google servers
also have it enabled.

> The second improvement is that the logs have a struct task hook which
> facilitates better logging association between the kernel log and the
> user process.
> It's especially handy when debugging FS systems.

So this is for use with debugging too?

> 
> Specifically we had several bugs reported from the field that we could
> not make progress on without additional logs.

This still doesn't answer my question about not using ftrace. Heck,
when I worked for Red Hat, we used ftrace to debug production
environments. Did that change?

-- Steve

Re: [RFC PATCH 0/5] BLOG: per-task logging contexts with Ceph consumer

Posted by Alex Markuze 3 months, 2 weeks ago

Please correct me if I am wrong, but I was not aware that ftrace is used
by any kernel component as its default unstructured logger.
That is the point of BLog: a low-impact unstructured logger. It is not
always possible or easy to provide a debug kernel where ftrace is both
enabled and used for dumping logs.
Having an always-on binary logger facilitates better debuggability.
When anything happens, a client with BLog has the option to send a
large log file with their report.
An additional benefit is that each logging buffer is attached to its
associated task, and the whole module has its own separate cyclical
log buffer.

On Sat, Oct 25, 2025 at 5:59 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Sat, 25 Oct 2025 13:50:39 +0300
> Alex Markuze <amarkuze@redhat.com> wrote:
>
> > First of all, Ftrace is for debugging and development; you won't see
> > components or kernel modules run in production with ftrace enabled.
> > The main motivation is to have verbose logging that is usable for
> > production systems.
>
> That is totally untrue. Several production environments use ftrace. We
> have it enabled and used in Chromebooks and in Android. Google servers
> also have it enabled.
>
>
> > The second improvement is that the logs have a struct task hook which
> > facilitates better logging association between the kernel log and the
> > user process.
> > It's especially handy when debugging FS systems.
>
> So this is for use with debugging too?
>
> >
> > Specifically we had several bugs reported from the field that we could
> > not make progress on without additional logs.
>
> This still doesn't answer my question about not using ftrace. Heck,
> when I worked for Red Hat, we used ftrace to debug production
> environments. Did that change?
>
> -- Steve
>

Re: [RFC PATCH 0/5] BLOG: per-task logging contexts with Ceph consumer

Posted by Steven Rostedt 3 months, 1 week ago

On Sat, 25 Oct 2025 20:54:00 +0300
Alex Markuze <amarkuze@redhat.com> wrote:

> Please correct me if I am wrong, I was not aware that ftrace is used
> by any kernel component as the default unstructured logger.
> This is the point of BLog, having a low impact unstructured logger,
> it's not always possible or easy to provide a debug kernel where
> ftrace is both enabled and used for dumping logs.
> Having an always-on binary logger facilitates better debuggability.
> When anything happens, a client with BLog has the option to send a
> large log file with their report.
> An additional benefit is that each logging buffer is attached to the
> associated tasks and the whole module has its own separate cyclical
> log buffer.

This looks like a very specific solution trying to be a little more generic.

The more generic a solution becomes, the more "bloated" it becomes as well.
That's the nature of tracers and loggers. Ftrace was designed to be very
generic, and yes, it can be more bloated because of that. But it is also
designed to be tuned down to be a highly efficient tracer. One that can be
used in a production environment. Sure, if you enable every event, it will
cause a noticeable overhead, but ftrace is designed to surgically pick
which events should be enabled or not, keeping the overhead within the
noise.

Ftrace is more of an "infrastructure" than a tool. It provides access to
trace almost every function, but you can use that same code to implement
live kernel patching or BPF hooks to functions. The trace event and
tracepoints are part of ftrace, and are used for things other than tracing.

Perhaps it may be more suitable to make BLOG use the tracefs interface
than to create an entirely new interface based on debugfs (BTW, a lot of
production systems do not enable debugfs, which is why ftrace uses its
own tracefs that does not depend on it).

-- Steve

Re: [RFC PATCH 0/5] BLOG: per-task logging contexts with Ceph consumer

Posted by David Hildenbrand 3 months, 2 weeks ago

On 24.10.25 10:42, Alex Markuze wrote:
> Motivation: improve observability in production by providing subsystems with
> a logger that keeps up with their verbose unstructured logs and aggregating
> logs at the process context level, akin to userspace TLS.
> 
> Binary LOGging (BLOG) introduces a task-local logging context: each context
> owns a single 512 KiB fragment that cycles through “ready → in use → queued for
> readers → reset → ready” without re-entering the allocator. Writers copy the
> raw parameters they already have; readers format them later when the log is
> inspected.
> 
> BLOG borrows ideas from ftrace (capture binary data now, format later) but
> unlike ftrace there is no global ring. Each module registers its own logger,
> manages its own buffers, and keeps the state small enough for production use.
> 
> To host the per-module pointers we extend `struct task_struct` with one
> additional `void *`, in line with other task extensions already in the kernel.
> Each module keeps independent batches: `alloc_batch` for contexts with
> refcount 0 and `log_batch` for contexts that have been filled and are waiting
> for readers. The batching layer and buffer management were migrated from the
> existing Ceph SAN logging code, so the behaviour is battle-tested; we simply
> made the buffer inline so every composite stays within a single 512 KiB
> allocation.
> 
> The patch series lands the BLOG library first, then wires the task lifecycle,
> and finally switches Ceph’s `bout*` logging macros to BLOG so we exercise the
> new path.
> 
> Patch summary:
>    1. sched, fork: wire BLOG contexts into task lifecycle
>       - Adds `struct blog_tls_ctx *blog_contexts[BLOG_MAX_MODULES]` to
>         `struct task_struct`.
>       - Fork/exit paths initialise and recycle contexts automatically.
> 
>    2. lib: introduce BLOG (Binary LOGging) subsystem
>       - Adds `lib/blog/` sources and headers under `include/linux/blog/`.
>       - Each composite (`struct blog_tls_pagefrag`) consists of the TLS
>         metadata, the pagefrag state, and an inline buffer sized at
>         `BLOG_PAGEFRAG_SIZE - sizeof(struct blog_tls_pagefrag)`.
> 
>    3. ceph: add BLOG scaffolding
>       - Introduces `include/linux/ceph/ceph_blog.h` and `fs/ceph/blog_client.c`.
>       - Ceph registers a logger and maintains a client-ID map for the reader
>         callback.
> 
>    4. ceph: add BLOG debugfs support
>       - Adds `fs/ceph/blog_debugfs.c` so filled contexts can be drained.
> 
>    5. ceph: activate BLOG logging
>       - Switches `bout*` macros to BLOG, making Ceph the first consumer.

Hi!

You CCed plenty of MM folks, and I am sure most of them observe "this 
doesn't seem to touch any core-mm files" and wonder "what's hiding in 
there that requires a pair of MM eyes".

Is there anything specific we should be looking at (and if so, in which 
patch)?

-- 
Cheers

David / dhildenb

Re: [RFC PATCH 0/5] BLOG: per-task logging contexts with Ceph consumer

Posted by Viacheslav Dubeyko 3 months, 1 week ago

On Fri, 2025-10-24 at 08:42 +0000, Alex Markuze wrote:

Probably, it makes sense to consider this as a topic for the LSF/MM/BPF
conference, because it might not be easy to convince people.

As far as I can see, the motivation doesn't contain enough explanation of the
benefits, benchmarking results, or a comparison with already existing
infrastructure. A clear explanation of these points could be a good step
towards convincing people to try and adopt the new infrastructure.

> Motivation: improve observability in production by providing subsystemsawith 

"subsystemsawith" -> subsystem with?

> a logger that keeps up with their verbouse unstructured logs and aggregating
> logs at the process context level, akin to userspace TLS. 
> 
> Binary LOGging (BLOG) introduces a task-local logging context: each context
> owns a single 512 KiB fragment that cycles through “ready → in use → queued for

Why exactly 512 KiB? Could it be increased or decreased? Does the
infrastructure expose any tuning parameters?

Could the infrastructure "eat" the whole memory if we have a lot of
tasks/cores? Is there any danger of system crashes because of the BLOG
subsystem's memory requirements?

I assume that BLOG's 512 KiB fragment works as a circular buffer. Am I right
here? If so, how long is the recorded history of operations? Could new
records overwrite information that is needed for analysing an issue?

> readers → reset → ready” without re-entering the allocator. Writers copy the
> raw parameters they already have; readers format them later when the log is
> inspected.
> 
> BLOG borrows ideas from ftrace (captureabinary  data now, format later) but 

"captureabinary" -> capture a binary?

> unlike ftrace there is no global ring. Each module registers its own logger,
> manages its own buffers, and keeps the state small enough for production use.
> 
> To host the per-module pointers we extend `struct task_struct` with one
> additional `void *`, in line with other task extensions already in the kernel.
> Each module keeps independent batches: `alloc_batch` for contexts with
> refcount 0 and `log_batch` for contexts that have been filled and are waiting
> for readers. The batching layer and buffer management were migrated from the
> existing Ceph SAN logging code, so the behaviour is battle-tested; we simply

I am not completely following what you mean by Ceph SAN logging code. Maybe
it makes sense to share a link to it?

> made the buffer inline so every composite stays within a single 512 KiB
> allocation.
> 
> The patch series lands the BLOG library first, then wires the task lifecycle,
> and finally switches Ceph’s `bout*` logging macros to BLOG so we exercise the

What do you mean by Ceph’s `bout*` logging macros? Do you mean 'dout*' here?

Thanks,
Slava.
