[RFC PATCH 0/2] livepatch: Add support for hybrid mode
Posted by Yafang Shao 1 year ago
The atomic replace livepatch mechanism was introduced to handle scenarios
where we want to unload a specific livepatch without unloading others.
However, its current implementation has significant shortcomings, making
it less than ideal in practice. Below are the key downsides:

- It is expensive

  During testing with frequent replacements of an old livepatch, random RCU
  warnings were observed:

  [19578271.779605] rcu_tasks_wait_gp: rcu_tasks grace period 642409 is 10024 jiffies old.
  [19578390.073790] rcu_tasks_wait_gp: rcu_tasks grace period 642417 is 10185 jiffies old.
  [19578423.034065] rcu_tasks_wait_gp: rcu_tasks grace period 642421 is 10150 jiffies old.
  [19578564.144591] rcu_tasks_wait_gp: rcu_tasks grace period 642449 is 10174 jiffies old.
  [19578601.064614] rcu_tasks_wait_gp: rcu_tasks grace period 642453 is 10168 jiffies old.
  [19578663.920123] rcu_tasks_wait_gp: rcu_tasks grace period 642469 is 10167 jiffies old.
  [19578872.990496] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 10215 jiffies old.
  [19578903.190292] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 40415 jiffies old.
  [19579017.965500] rcu_tasks_wait_gp: rcu_tasks grace period 642577 is 10174 jiffies old.
  [19579033.981425] rcu_tasks_wait_gp: rcu_tasks grace period 642581 is 10143 jiffies old.
  [19579153.092599] rcu_tasks_wait_gp: rcu_tasks grace period 642625 is 10188 jiffies old.
  
  This indicates that atomic replacement can cause performance issues,
  particularly with RCU synchronization under frequent use.

- Potential Risks During Replacement 

  One known issue involves replacing livepatched versions of critical
  functions such as do_exit(). During the replacement process, a panic
  might occur, as highlighted in [0]. Other potential risks may also arise
  due to inconsistencies or race conditions during transitions.

- Temporary Loss of Patching 

  During the replacement process, the old patch is set to a NOP (no-operation)
  before the new patch is fully applied. This creates a window where the
  function temporarily reverts to its original, unpatched state. If the old
  patch fixed a critical issue (e.g., one that prevented a system panic), the
  system could become vulnerable to that issue during the transition.

The current atomic replacement approach replaces all old livepatches,
even when such a sweeping change is unnecessary. This can be improved
by introducing a hybrid mode, which allows the coexistence of both
atomic-replace and non-atomic-replace livepatches.

In the hybrid mode:

- Specific livepatches can be marked as "non-replaceable" to ensure they
  remain active and unaffected during replacements.

- Other livepatches can be marked as "replaceable", allowing targeted
  replacements of only those patches.

This selective approach would reduce unnecessary transitions, lower the
risk of temporary patch loss, and mitigate performance issues during
livepatch replacement.
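
To make the idea more concrete, here is a rough sketch of how the
marking could work on the kernel side. The field and the loop below are
illustrative assumptions only; the actual patches in this series define
the real interface:

/*
 * Illustrative sketch only -- not taken verbatim from the patches.
 * A per-patch flag marks whether an atomic-replace livepatch may
 * replace this patch.
 */
struct klp_patch {
	/* ... existing fields (mod, objs, replace, list, kobj, ...) ... */
	bool replace;		/* existing: this patch replaces older ones */
	bool replaceable;	/* proposed: may this patch be replaced? */
};

/*
 * When an atomic-replace patch is enabled, it walks the already
 * enabled patches and generates NOP entries for their functions.
 * In hybrid mode, patches marked non-replaceable would simply be
 * skipped here, so they stay enabled and untouched by the replacement:
 */
klp_for_each_patch(old_patch) {
	if (!old_patch->replaceable)
		continue;	/* keep this livepatch as-is */
	/* ... generate NOP funcs/objects covering old_patch ... */
}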


Future work:
- Support it in kpatch[1]

Link: https://lore.kernel.org/live-patching/CALOAHbA9WHPjeZKUcUkwULagQjTMfqAdAg+akqPzbZ7Byc=qrw@mail.gmail.com/ [0]
Link: https://github.com/dynup/kpatch [1]

Yafang Shao (2):
  livepatch: Add replaceable attribute
  livepatch: Implement livepatch hybrid mode

 include/linux/livepatch.h |  2 ++
 kernel/livepatch/core.c   | 50 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+)

-- 
2.43.5
Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
Posted by Petr Mladek 1 year ago
On Mon 2025-01-27 14:35:24, Yafang Shao wrote:
> The atomic replace livepatch mechanism was introduced to handle scenarios
> where we want to unload a specific livepatch without unloading others.
> However, its current implementation has significant shortcomings, making
> it less than ideal in practice. Below are the key downsides:
> 
> - It is expensive
> 
>   During testing with frequent replacements of an old livepatch, random RCU
>   warnings were observed:
> 
>   [19578271.779605] rcu_tasks_wait_gp: rcu_tasks grace period 642409 is 10024 jiffies old.
>   [19578390.073790] rcu_tasks_wait_gp: rcu_tasks grace period 642417 is 10185 jiffies old.
>   [19578423.034065] rcu_tasks_wait_gp: rcu_tasks grace period 642421 is 10150 jiffies old.
>   [19578564.144591] rcu_tasks_wait_gp: rcu_tasks grace period 642449 is 10174 jiffies old.
>   [19578601.064614] rcu_tasks_wait_gp: rcu_tasks grace period 642453 is 10168 jiffies old.
>   [19578663.920123] rcu_tasks_wait_gp: rcu_tasks grace period 642469 is 10167 jiffies old.
>   [19578872.990496] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 10215 jiffies old.
>   [19578903.190292] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 40415 jiffies old.
>   [19579017.965500] rcu_tasks_wait_gp: rcu_tasks grace period 642577 is 10174 jiffies old.
>   [19579033.981425] rcu_tasks_wait_gp: rcu_tasks grace period 642581 is 10143 jiffies old.
>   [19579153.092599] rcu_tasks_wait_gp: rcu_tasks grace period 642625 is 10188 jiffies old.
>   
>   This indicates that atomic replacement can cause performance issues,
>   particularly with RCU synchronization under frequent use.

Please, provide more details about the test:

  + List of patched functions.

  + What exactly is meant by frequent replacements (busy loop?, once a minute?)

  + Size of the systems (number of CPUs, number of running processes)

  + Were there any extra changes in the livepatch code code,
    e.g. debugging messages?


> - Potential Risks During Replacement 
> 
>   One known issue involves replacing livepatched versions of critical
>   functions such as do_exit(). During the replacement process, a panic
>   might occur, as highlighted in [0].

I would rather prefer to make the atomic replace safe. I mean to
block the transition until all exiting processes are gone.

Josh made a good point. We should look at how this is handled by
RCU, tracing, or other subsystems which might have similar
problems.


> Other potential risks may also arise
>   due to inconsistencies or race conditions during transitions.

What inconsistencies and race conditions do you have in mind, please?


> - Temporary Loss of Patching 
> 
>   During the replacement process, the old patch is set to a NOP (no-operation)
>   before the new patch is fully applied. This creates a window where the
>   function temporarily reverts to its original, unpatched state. If the old
>   patch fixed a critical issue (e.g., one that prevented a system panic), the
>   system could become vulnerable to that issue during the transition.

This is not true!

Please, look at where klp_patch_object() and klp_unpatch_objects() are
called. Also look at how ops->func_stack is handled in
klp_ftrace_handler().

Also you might want to read Documentation/livepatch/livepatch.rst
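
Roughly speaking, the ordering during an atomic replace is the
following (a simplified paraphrase, not verbatim kernel source):

/*
 * 1. klp_enable_patch(new_patch)
 *      klp_patch_object(obj)          -- the new functions are pushed on
 *                                        top of ops->func_stack while the
 *                                        old ones are still there
 *      ... per-task transition guided by the consistency model ...
 *    klp_complete_transition()
 *      klp_unpatch_replaced_patches() -- only after the transition has
 *        klp_unpatch_objects(old)        completed are the old entries
 *                                        removed from the stack
 *
 * 2. klp_ftrace_handler() always redirects to the entry on top of
 *    the stack:
 */
func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
			      stack_node);
/* ... func->transition / current->patch_state checks, then the call
 * is redirected to func->new_func ... */

So a call never falls back to the original, unpatched code merely
because a replacement is in progress.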


> The current atomic replacement approach replaces all old livepatches,
> even when such a sweeping change is unnecessary. This can be improved
> by introducing a hybrid mode, which allows the coexistence of both
> atomic replace and non atomic replace livepatches.
> 
> In the hybrid mode:
> 
> - Specific livepatches can be marked as "non-replaceable" to ensure they
>   remain active and unaffected during replacements.
> 
> - Other livepatches can be marked as "replaceable", allowing targeted
>   replacements of only those patches.

Honestly, I consider this as a workaround for a problem which should
be fixed a proper way.

The main advantage of the atomic replace is that it simplifies
maintenance and debugging. It reduces the number of possible
combinations. The hybrid mode brings back the jungle.

Best Regards,
Petr
Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
Posted by Yafang Shao 1 year ago
On Mon, Jan 27, 2025 at 9:46 PM Petr Mladek <pmladek@suse.com> wrote:
>
> On Mon 2025-01-27 14:35:24, Yafang Shao wrote:
> > The atomic replace livepatch mechanism was introduced to handle scenarios
> > where we want to unload a specific livepatch without unloading others.
> > However, its current implementation has significant shortcomings, making
> > it less than ideal in practice. Below are the key downsides:
> >
> > - It is expensive
> >
> >   During testing with frequent replacements of an old livepatch, random RCU
> >   warnings were observed:
> >
> >   [19578271.779605] rcu_tasks_wait_gp: rcu_tasks grace period 642409 is 10024 jiffies old.
> >   [19578390.073790] rcu_tasks_wait_gp: rcu_tasks grace period 642417 is 10185 jiffies old.
> >   [19578423.034065] rcu_tasks_wait_gp: rcu_tasks grace period 642421 is 10150 jiffies old.
> >   [19578564.144591] rcu_tasks_wait_gp: rcu_tasks grace period 642449 is 10174 jiffies old.
> >   [19578601.064614] rcu_tasks_wait_gp: rcu_tasks grace period 642453 is 10168 jiffies old.
> >   [19578663.920123] rcu_tasks_wait_gp: rcu_tasks grace period 642469 is 10167 jiffies old.
> >   [19578872.990496] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 10215 jiffies old.
> >   [19578903.190292] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 40415 jiffies old.
> >   [19579017.965500] rcu_tasks_wait_gp: rcu_tasks grace period 642577 is 10174 jiffies old.
> >   [19579033.981425] rcu_tasks_wait_gp: rcu_tasks grace period 642581 is 10143 jiffies old.
> >   [19579153.092599] rcu_tasks_wait_gp: rcu_tasks grace period 642625 is 10188 jiffies old.
> >
> >   This indicates that atomic replacement can cause performance issues,
> >   particularly with RCU synchronization under frequent use.
>
> Please, provide more details about the test:
>
>   + List of patched functions.

$ ls /sys/kernel/livepatch/livepatch_61_release12/vmlinux/
blk_throtl_cancel_bios,1  __d_move,1  do_iter_readv_writev,1  patched  update_rq_clock,1
blk_mq_handle_expired,1   d_delete,1  do_exit,1               high_work_func,1  try_charge_memcg,1

$ ls /sys/kernel/livepatch/livepatch_61_release12/xfs/
patched  xfs_extent_busy_flush,1  xfs_iget,1  xfs_inode_mark_reclaimable,1

$ ls /sys/kernel/livepatch/livepatch_61_release12/fuse/
fuse_send_init,1  patched  process_init_reply,1

$ ls /sys/kernel/livepatch/livepatch_61_release12/nf_tables/
nft_data_init,1  patched

>
>   + What exactly is meant by frequent replacements (busy loop?, once a minute?)

The script:

#!/bin/bash
while true; do
        yum install -y ./kernel-livepatch-6.1.12-0.x86_64.rpm
        ./apply_livepatch_61.sh # it will sleep 5s
        yum erase -y kernel-livepatch-6.1.12-0.x86_64
        yum install -y ./kernel-livepatch-6.1.6-0.x86_64.rpm
        ./apply_livepatch_61.sh  # it will sleep 5s
done


>
>   + Size of the systems (number of CPUs, number of running processes)

top - 22:08:17 up 227 days,  3:29,  1 user,  load average: 50.66, 32.92, 20.77
Tasks: 1349 total,   2 running, 543 sleeping,   0 stopped,   0 zombie
%Cpu(s):  7.4 us,  0.9 sy,  0.0 ni, 88.1 id,  2.9 wa,  0.3 hi,  0.4 si,  0.0 st
KiB Mem : 39406304+total,  8899864 free, 43732568 used, 34143062+buff/cache
KiB Swap:        0 total,        0 free,        0 used. 32485065+avail Mem

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    2
Core(s) per socket:    64
Socket(s):             1
NUMA node(s):          1
Vendor ID:             AuthenticAMD
CPU family:            25
Model:                 1
Model name:            AMD EPYC 7Q83 64-Core Processor
Stepping:              1
CPU MHz:               3234.573
CPU max MHz:           3854.4431
CPU min MHz:           1500.0000
BogoMIPS:              4890.66
Virtualization:        AMD-V
L1d cache:             32K
L1i cache:             32K
L2 cache:              512K
L3 cache:              32768K
NUMA node0 CPU(s):     0-127

>
>   + Were there any extra changes in the livepatch code code,
>     e.g. debugging messages?

Not at all.

>
>
> > - Potential Risks During Replacement
> >
> >   One known issue involves replacing livepatched versions of critical
> >   functions such as do_exit(). During the replacement process, a panic
> >   might occur, as highlighted in [0].
>
> I would rather prefer to make the atomic replace safe. I mean to
> block the transition until all exiting processes are not gone.
>
> Josh made a good point. We should look how this is handled by
> RCU, tracing, or another subsystems which might have similar
> problems.

I'm not against this fix.

>
>
> > Other potential risks may also arise
> >   due to inconsistencies or race conditions during transitions.
>
> What inconsistencies and race conditions you have in mind, please?

I have explained it at
https://lore.kernel.org/live-patching/Z5DHQG4geRsuIflc@pathway.suse.cz/T/#m5058583fa64d95ef7ac9525a6a8af8ca865bf354

 klp_ftrace_handler()
     if (unlikely(func->transition)) {
         WARN_ON_ONCE(patch_state == KLP_UNDEFINED);
     }

Why is WARN_ON_ONCE() placed here? What issues have we encountered in the past
that led to the decision to add this warning?

>
>
> > - Temporary Loss of Patching
> >
> >   During the replacement process, the old patch is set to a NOP (no-operation)
> >   before the new patch is fully applied. This creates a window where the
> >   function temporarily reverts to its original, unpatched state. If the old
> >   patch fixed a critical issue (e.g., one that prevented a system panic), the
> >   system could become vulnerable to that issue during the transition.
>
> This is not true!
>
> Please, look where klp_patch_object() and klp_unpatch_objects() is
> called. Also look at how ops->func_stack is handled in
> klp_ftrace_handler().
>
> Also you might want to read Documentation/livepatch/livepatch.rst

Thanks for your guidance.

>
>
> > The current atomic replacement approach replaces all old livepatches,
> > even when such a sweeping change is unnecessary. This can be improved
> > by introducing a hybrid mode, which allows the coexistence of both
> > atomic replace and non atomic replace livepatches.
> >
> > In the hybrid mode:
> >
> > - Specific livepatches can be marked as "non-replaceable" to ensure they
> >   remain active and unaffected during replacements.
> >
> > - Other livepatches can be marked as "replaceable", allowing targeted
> >   replacements of only those patches.
>
> Honestly, I consider this as a workaround for a problem which should
> be fixed a proper way.

This is not a workaround. We should avoid replacing functions that do
not differ between the two versions. Since automatic handling is not
possible for now, we can provide a sysfs interface to allow users to
perform the selection manually.
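
For illustration, such a knob could follow the pattern of the existing
per-patch sysfs attributes in kernel/livepatch/core.c. The attribute
and field names below are only a sketch of the idea, not the actual
patch:

/* Hypothetical sketch of a per-patch "replaceable" attribute. */
static ssize_t replaceable_show(struct kobject *kobj,
				struct kobj_attribute *attr, char *buf)
{
	struct klp_patch *patch = container_of(kobj, struct klp_patch, kobj);

	return sysfs_emit(buf, "%d\n", patch->replaceable);
}
static struct kobj_attribute replaceable_kobj_attr = __ATTR_RO(replaceable);

/* e.g.: cat /sys/kernel/livepatch/<patch>/replaceable */

A writable variant would additionally let administrators flip the flag
at runtime before loading a replacement.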

>
> The main advantage of the atomic replace is simplify the maintenance
> and debugging.

Is it worth the high overhead on production servers?
Can you provide examples of companies that use atomic replacement at
scale in their production environments?

>  It reduces the amount of possible combinations. The
> hybrid mode brings back the jungle.

-- 
Regards
Yafang
Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
Posted by Miroslav Benes 1 year ago
> >
> >   + What exactly is meant by frequent replacements (busy loop?, once a minute?)
> 
> The script:
> 
> #!/bin/bash
> while true; do
>         yum install -y ./kernel-livepatch-6.1.12-0.x86_64.rpm
>         ./apply_livepatch_61.sh # it will sleep 5s
>         yum erase -y kernel-livepatch-6.1.12-0.x86_64
>         yum install -y ./kernel-livepatch-6.1.6-0.x86_64.rpm
>         ./apply_livepatch_61.sh  # it will sleep 5s
> done
 
A live patch application is a slowpath. It is expected not to run 
frequently (in a relative sense). If you stress it like this, it is quite 
expected that it will have an impact. Especially on a large busy system.

> >
> > > Other potential risks may also arise
> > >   due to inconsistencies or race conditions during transitions.
> >
> > What inconsistencies and race conditions you have in mind, please?
> 
> I have explained it at
> https://lore.kernel.org/live-patching/Z5DHQG4geRsuIflc@pathway.suse.cz/T/#m5058583fa64d95ef7ac9525a6a8af8ca865bf354
> 
>  klp_ftrace_handler
>       if (unlikely(func->transition)) {
>           WARN_ON_ONCE(patch_state == KLP_UNDEFINED);
>   }
> 
> Why is WARN_ON_ONCE() placed here? What issues have we encountered in the past
> that led to the decision to add this warning?

A safety measure for something which really should not happen.

> > The main advantage of the atomic replace is simplify the maintenance
> > and debugging.
> 
> Is it worth the high overhead on production servers?

Yes, because the overhead once a live patch is applied is negligible.

> Can you provide examples of companies that use atomic replacement at
> scale in their production environments?

At least SUSE uses it as a solution for its customers. Not many problems
have been reported since we started ~10 years ago.

Regards,
Miroslav
Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
Posted by Yafang Shao 1 year ago
On Fri, Jan 31, 2025 at 9:18 PM Miroslav Benes <mbenes@suse.cz> wrote:
>
> > >
> > >   + What exactly is meant by frequent replacements (busy loop?, once a minute?)
> >
> > The script:
> >
> > #!/bin/bash
> > while true; do
> >         yum install -y ./kernel-livepatch-6.1.12-0.x86_64.rpm
> >         ./apply_livepatch_61.sh # it will sleep 5s
> >         yum erase -y kernel-livepatch-6.1.12-0.x86_64
> >         yum install -y ./kernel-livepatch-6.1.6-0.x86_64.rpm
> >         ./apply_livepatch_61.sh  # it will sleep 5s
> > done
>
> A live patch application is a slowpath. It is expected not to run
> frequently (in a relative sense).

The frequency isn’t the main concern here; _scalability_ is the key issue.
Running livepatches once per day (a relatively low frequency) across all of our
production servers (hundreds of thousands) isn’t feasible. Instead, we need to
periodically run tests on a subset of test servers.


> If you stress it like this, it is quite
> expected that it will have an impact. Especially on a large busy system.

It seems you agree that the current atomic-replace process lacks scalability.
When deploying a livepatch across a large fleet of servers, it’s impossible to
ensure that the servers are idle, as their workloads are constantly varying and
are not deterministic.

The challenges are very different when managing 1K servers versus 1M
servers. Similarly, the issues differ significantly between patching a
single function and patching 100 functions, especially when some of
those functions are critical. That’s what scalability is all about.

Since we transitioned from the old livepatch mode to the new
atomic-replace mode, our SREs have consistently reported that one or
more servers become stalled during the upgrade (replacement).

>
> > >
> > > > Other potential risks may also arise
> > > >   due to inconsistencies or race conditions during transitions.
> > >
> > > What inconsistencies and race conditions you have in mind, please?
> >
> > I have explained it at
> > https://lore.kernel.org/live-patching/Z5DHQG4geRsuIflc@pathway.suse.cz/T/#m5058583fa64d95ef7ac9525a6a8af8ca865bf354
> >
> >  klp_ftrace_handler
> >       if (unlikely(func->transition)) {
> >           WARN_ON_ONCE(patch_state == KLP_UNDEFINED);
> >   }
> >
> > Why is WARN_ON_ONCE() placed here? What issues have we encountered in the past
> > that led to the decision to add this warning?
>
> A safety measure for something which really should not happen.

Unfortunately, this issue occurs during my stress tests.

>
> > > The main advantage of the atomic replace is simplify the maintenance
> > > and debugging.
> >
> > Is it worth the high overhead on production servers?
>
> Yes, because the overhead once a live patch is applied is negligible.

If you’re managing a large fleet of servers, this issue is far from negligible.

>
> > Can you provide examples of companies that use atomic replacement at
> > scale in their production environments?
>
> At least SUSE uses it as a solution for its customers. No many problems
> have been reported since we started ~10 years ago.

Perhaps we’re running different workloads.
Going back to the original purpose of livepatching: is it designed to address
security vulnerabilities, or to deploy new features?
If it’s the latter, then there’s definitely a lot of room for improvement.

-- 
Regards
Yafang
Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
Posted by Petr Mladek 1 year ago
On Mon 2025-02-03 17:44:52, Yafang Shao wrote:
> On Fri, Jan 31, 2025 at 9:18 PM Miroslav Benes <mbenes@suse.cz> wrote:
> >
> > > >
> > > >   + What exactly is meant by frequent replacements (busy loop?, once a minute?)
> > >
> > > The script:
> > >
> > > #!/bin/bash
> > > while true; do
> > >         yum install -y ./kernel-livepatch-6.1.12-0.x86_64.rpm
> > >         ./apply_livepatch_61.sh # it will sleep 5s
> > >         yum erase -y kernel-livepatch-6.1.12-0.x86_64
> > >         yum install -y ./kernel-livepatch-6.1.6-0.x86_64.rpm
> > >         ./apply_livepatch_61.sh  # it will sleep 5s
> > > done
> >
> > A live patch application is a slowpath. It is expected not to run
> > frequently (in a relative sense).
> 
> The frequency isn’t the main concern here; _scalability_ is the key issue.
> Running livepatches once per day (a relatively low frequency) across all of our
> production servers (hundreds of thousands) isn’t feasible. Instead, we need to
> periodically run tests on a subset of test servers.

I am confused. The original problem was a system crash when
livepatching the do_exit() function, see
https://lore.kernel.org/r/CALOAHbA9WHPjeZKUcUkwULagQjTMfqAdAg+akqPzbZ7Byc=qrw@mail.gmail.com

The rcu watchdog warning was first mentioned in this patchset.
Do you see the rcu watchdog warning in production or only
with this artificial test, please?


> > If you stress it like this, it is quite
> > expected that it will have an impact. Especially on a large busy system.
> 
> It seems you agree that the current atomic-replace process lacks scalability.
> When deploying a livepatch across a large fleet of servers, it’s impossible to
> ensure that the servers are idle, as their workloads are constantly varying and
> are not deterministic.

Do you see the scalability problem in production, please?
And could you prove that it was caused by livepatching, please?


> The challenges are very different when managing 1K servers versus 1M servers.
> Similarly, the issues differ significantly between patching a single
> function and
> patching 100 functions, especially when some of those functions are critical.
> That’s what scalability is all about.
> 
> Since we transitioned from the old livepatch mode to the new
> atomic-replace mode,

What do you mean by the old livepatch mode, please?

Did you allow installing more livepatches in parallel?
What was the motivation to switch to the atomic replace, please?

> our SREs have consistently reported that one or more servers become
> stalled during
> the upgrade (replacement).

What is SRE, please?
Could you please show some log from a production system?


> > > > > Other potential risks may also arise
> > > > >   due to inconsistencies or race conditions during transitions.
> > > >
> > > > What inconsistencies and race conditions you have in mind, please?
> > >
> > > I have explained it at
> > > https://lore.kernel.org/live-patching/Z5DHQG4geRsuIflc@pathway.suse.cz/T/#m5058583fa64d95ef7ac9525a6a8af8ca865bf354
> > >
> > >  klp_ftrace_handler
> > >       if (unlikely(func->transition)) {
> > >           WARN_ON_ONCE(patch_state == KLP_UNDEFINED);
> > >   }
> > >
> > > Why is WARN_ON_ONCE() placed here? What issues have we encountered in the past
> > > that led to the decision to add this warning?
> >
> > A safety measure for something which really should not happen.
> 
> Unfortunately, this issue occurs during my stress tests.

I am confused. Do you see the above WARN_ON_ONCE() during your
stress test? Could you please provide a log?

> > > > The main advantage of the atomic replace is simplify the maintenance
> > > > and debugging.
> > >
> > > Is it worth the high overhead on production servers?
> >
> > Yes, because the overhead once a live patch is applied is negligible.
> 
> If you’re managing a large fleet of servers, this issue is far from negligible.
> 
> >
> > > Can you provide examples of companies that use atomic replacement at
> > > scale in their production environments?
> >
> > At least SUSE uses it as a solution for its customers. No many problems
> > have been reported since we started ~10 years ago.
> 
> Perhaps we’re running different workloads.
> Going back to the original purpose of livepatching: is it designed to address
> security vulnerabilities, or to deploy new features?

We (SUSE) use livepatches only for fixing CVEs and serious bugs.


> If it’s the latter, then there’s definitely a lot of room for improvement.

You might be right. I am just not sure whether the hybrid mode would
be the right solution.

If you have problems with the atomic replace then you might stop using
it completely and just install more livepatches in parallel.


My view:

More livepatches installed in parallel are more prone to
inconsistencies. A good example is the thread about introducing
a stack order sysfs interface, see
https://lore.kernel.org/all/AAD198C9-210E-4E31-8FD7-270C39A974A8@gmail.com/

The atomic replace helps to keep the livepatched functions consistent.

The hybrid model would allow installing more livepatches in parallel,
except that one livepatch could be replaced atomically. It would create
even more scenarios than allowing all livepatches in parallel.

What would be the rules, please?

Which functionality will be livepatched by the atomic replace, please?

Which functionality will be handled by the extra non-replaceable
livepatches, please?

How would you keep the livepatches consistent, please?

How would you manage dependencies between livepatches, please?

What is the advantage of the hybrid model over allowing
all livepatches in parallel, please?

Best Regards,
Petr
Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
Posted by Yafang Shao 1 year ago
On Tue, Feb 4, 2025 at 9:05 PM Petr Mladek <pmladek@suse.com> wrote:
>
> On Mon 2025-02-03 17:44:52, Yafang Shao wrote:
> > On Fri, Jan 31, 2025 at 9:18 PM Miroslav Benes <mbenes@suse.cz> wrote:
> > >
> > > > >
> > > > >   + What exactly is meant by frequent replacements (busy loop?, once a minute?)
> > > >
> > > > The script:
> > > >
> > > > #!/bin/bash
> > > > while true; do
> > > >         yum install -y ./kernel-livepatch-6.1.12-0.x86_64.rpm
> > > >         ./apply_livepatch_61.sh # it will sleep 5s
> > > >         yum erase -y kernel-livepatch-6.1.12-0.x86_64
> > > >         yum install -y ./kernel-livepatch-6.1.6-0.x86_64.rpm
> > > >         ./apply_livepatch_61.sh  # it will sleep 5s
> > > > done
> > >
> > > A live patch application is a slowpath. It is expected not to run
> > > frequently (in a relative sense).
> >
> > The frequency isn’t the main concern here; _scalability_ is the key issue.
> > Running livepatches once per day (a relatively low frequency) across all of our
> > production servers (hundreds of thousands) isn’t feasible. Instead, we need to
> > periodically run tests on a subset of test servers.
>
> I am confused. The original problem was a system crash when
> livepatching do_exit() function, see
> https://lore.kernel.org/r/CALOAHbA9WHPjeZKUcUkwULagQjTMfqAdAg+akqPzbZ7Byc=qrw@mail.gmail.com

Why do you view this patchset as a solution to the original problem?

>
> The rcu watchdog warning was first mentioned in this patchset.
> Do you see rcu watchdog warning in production or just
> with this artificial test, please?

So, we shouldn’t run any artificial tests on the livepatch, correct?
What exactly is the issue with these test cases?

>
>
> > > If you stress it like this, it is quite
> > > expected that it will have an impact. Especially on a large busy system.
> >
> > It seems you agree that the current atomic-replace process lacks scalability.
> > When deploying a livepatch across a large fleet of servers, it’s impossible to
> > ensure that the servers are idle, as their workloads are constantly varying and
> > are not deterministic.
>
> Do you see the scalability problem in production, please?

Yes, the livepatch transition was stalled.


> And could you prove that it was caused by livepatching, please?

When the livepatch transition is stalled, running `kpatch list` will
display the stalled information.

>
>
> > The challenges are very different when managing 1K servers versus 1M servers.
> > Similarly, the issues differ significantly between patching a single
> > function and
> > patching 100 functions, especially when some of those functions are critical.
> > That’s what scalability is all about.
> >
> > Since we transitioned from the old livepatch mode to the new
> > atomic-replace mode,
>
> What do you mean with the old livepatch mode, please?

$ kpatch-build -R

>
> Did you allow to install more livepatches in parallel?

No.

> What was the motivation to switch to the atomic replace, please?

This is the default behavior of kpatch [1] after upgrading to a new version.

[1].  https://github.com/dynup/kpatch/tree/master

>
> > our SREs have consistently reported that one or more servers become
> > stalled during
> > the upgrade (replacement).
>
> What is SRE, please?

From Wikipedia: https://en.wikipedia.org/wiki/Site_reliability_engineering

> Could you please show some log from a production system?

When the SREs initially reported that the livepatch transition was
stalled, I simply advised them to try again. However, after
experiencing these crashes, I dug deeper into the livepatch code and
realized that scalability is a concern. As a result, periodically
replacing an old livepatch triggers RCU warnings on our production
servers.

[Wed Feb  5 10:56:10 2025] livepatch: enabling patch 'livepatch_61_release6'
[Wed Feb  5 10:56:10 2025] livepatch: 'livepatch_61_release6': starting patching transition
[Wed Feb  5 10:56:24 2025] rcu_tasks_wait_gp: rcu_tasks grace period 1126113 is 10078 jiffies old.
[Wed Feb  5 10:56:38 2025] rcu_tasks_wait_gp: rcu_tasks grace period 1126117 is 10199 jiffies old.
[Wed Feb  5 10:56:52 2025] rcu_tasks_wait_gp: rcu_tasks grace period 1126121 is 10047 jiffies old.
[Wed Feb  5 10:56:57 2025] livepatch: 'livepatch_61_release6': patching complete

PS: You might argue again about the frequency. If you believe this is
just a frequency issue, please suggest a suitable frequency.

>
>
> > > > > > Other potential risks may also arise
> > > > > >   due to inconsistencies or race conditions during transitions.
> > > > >
> > > > > What inconsistencies and race conditions you have in mind, please?
> > > >
> > > > I have explained it at
> > > > https://lore.kernel.org/live-patching/Z5DHQG4geRsuIflc@pathway.suse.cz/T/#m5058583fa64d95ef7ac9525a6a8af8ca865bf354
> > > >
> > > >  klp_ftrace_handler
> > > >       if (unlikely(func->transition)) {
> > > >           WARN_ON_ONCE(patch_state == KLP_UNDEFINED);
> > > >   }
> > > >
> > > > Why is WARN_ON_ONCE() placed here? What issues have we encountered in the past
> > > > that led to the decision to add this warning?
> > >
> > > A safety measure for something which really should not happen.
> >
> > Unfortunately, this issue occurs during my stress tests.
>
> I am confused. Do you see the above WARN_ON_ONCE() during your
> stress test? Could you please provide a log?

Could you please read my reply carefully?
https://lore.kernel.org/live-patching/Z5DHQG4geRsuIflc@pathway.suse.cz/T/#m5058583fa64d95ef7ac9525a6a8af8ca865bf354

>
> > > > > The main advantage of the atomic replace is simplify the maintenance
> > > > > and debugging.
> > > >
> > > > Is it worth the high overhead on production servers?
> > >
> > > Yes, because the overhead once a live patch is applied is negligible.
> >
> > If you’re managing a large fleet of servers, this issue is far from negligible.
> >
> > >
> > > > Can you provide examples of companies that use atomic replacement at
> > > > scale in their production environments?
> > >
> > > At least SUSE uses it as a solution for its customers. No many problems
> > > have been reported since we started ~10 years ago.
> >
> > Perhaps we’re running different workloads.
> > Going back to the original purpose of livepatching: is it designed to address
> > security vulnerabilities, or to deploy new features?
>
> We (SUSE) use livepatches only for fixing CVEs and serious bugs.
>
>
> > If it’s the latter, then there’s definitely a lot of room for improvement.
>
> You might be right. I am just not sure whether the hybrid mode would
> be the right solution.
>
> If you have problems with the atomic replace then you might stop using
> it completely and just install more livepatches in parallel.

Why do we need to install livepatches in parallel if atomic replace is disabled?
We only need to install the additional new livepatch. Parallel
installation is only necessary at boot time.

>
>
> My view:
>
> More livepatches installed in parallel are more prone to

I’m confused as to why you consider this a parallel installation issue.

> inconsistencies. A good example is the thread about introducing
> stack order sysfs interface, see
> https://lore.kernel.org/all/AAD198C9-210E-4E31-8FD7-270C39A974A8@gmail.com/
>
> The atomic replace helps to keep the livepatched functions consistent.
>
> The hybrid model would allow to install more livepatches in parallel except
> that one livepatch could be replaced atomically. It would create even
> more scenarios than allowing all livepatches in parallel.
>
> What would be the rules, please?
>
> Which functionality will be livepatched by the atomic replace, please?
>
> Which functionality will be handled by the extra non-replaceable
> livepatches, please?
>
> How would you keep the livepatches consistent, please?
>
> How would you manage dependencies between livepatches, please?
>
> What is the advantage of the hybrid model over allowing
> all livepatches in parallel, please?

I can’t answer your questions if you insist on framing this as a
parallel installation issue.

--
Regards

Yafang
Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
Posted by Petr Mladek 1 year ago
On Wed 2025-02-05 14:16:42, Yafang Shao wrote:
> On Tue, Feb 4, 2025 at 9:05 PM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Mon 2025-02-03 17:44:52, Yafang Shao wrote:
> > > On Fri, Jan 31, 2025 at 9:18 PM Miroslav Benes <mbenes@suse.cz> wrote:
> > > >
> > > > > >
> > > > > >   + What exactly is meant by frequent replacements (busy loop?, once a minute?)
> > > > >
> > > > > The script:
> > > > >
> > > > > #!/bin/bash
> > > > > while true; do
> > > > >         yum install -y ./kernel-livepatch-6.1.12-0.x86_64.rpm
> > > > >         ./apply_livepatch_61.sh # it will sleep 5s
> > > > >         yum erase -y kernel-livepatch-6.1.12-0.x86_64
> > > > >         yum install -y ./kernel-livepatch-6.1.6-0.x86_64.rpm
> > > > >         ./apply_livepatch_61.sh  # it will sleep 5s
> > > > > done
> > > >
> > > > A live patch application is a slowpath. It is expected not to run
> > > > frequently (in a relative sense).
> > >
> > > The frequency isn’t the main concern here; _scalability_ is the key issue.
> > > Running livepatches once per day (a relatively low frequency) across all of our
> > > production servers (hundreds of thousands) isn’t feasible. Instead, we need to
> > > periodically run tests on a subset of test servers.
> >
> > I am confused. The original problem was a system crash when
> > livepatching do_exit() function, see
> > https://lore.kernel.org/r/CALOAHbA9WHPjeZKUcUkwULagQjTMfqAdAg+akqPzbZ7Byc=qrw@mail.gmail.com
> 
> Why do you view this patchset as a solution to the original problem?

You discovered the hardlockup warnings when trying to reproduce the
original problem. At least, you mentioned this at
https://lore.kernel.org/r/20250122085146.41553-1-laoar.shao@gmail.com

And using the hybrid mode would allow livepatching do_exit() only
once and not touching it any longer. It is not the right solution,
but it would be a workaround.


> > The rcu watchdog warning was first mentioned in this patchset.
> > Do you see rcu watchdog warning in production or just
> > with this artificial test, please?
> 
> So, we shouldn’t run any artificial tests on the livepatch, correct?
> What exactly is the issue with these test cases?

Some tests might be too artificial. They might find problems which
do not exist in practice.

Disclaimer: I am not saying that this is the case. You actually prove
	later in this mail that the hardlockup warning happens
	even in production.

Anyway, if an artificial test finds a potential problem and the fix is
complicated then we need to decide if it is worth it.

It does not make sense to complicate the code too much when
the fixed problem does not happen in practice.

  + Too complicated code might introduce regressions which are
    more serious than the original problem.

  + Too complicated code increases the maintenance cost. It is
    more complicated to add new features or fix bugs.


> > > > If you stress it like this, it is quite
> > > > expected that it will have an impact. Especially on a large busy system.
> > >
> > > It seems you agree that the current atomic-replace process lacks scalability.
> > > When deploying a livepatch across a large fleet of servers, it’s impossible to
> > > ensure that the servers are idle, as their workloads are constantly varying and
> > > are not deterministic.
> >
> > Do you see the scalability problem in production, please?
> 
> Yes, the livepatch transition was stalled.

Good to know.

> 
> > And could you prove that it was caused by livepatching, please?
> 
> When the livepatch transition is stalled, running `kpatch list` will
> display the stalled information.

OK.

> > > The challenges are very different when managing 1K servers versus 1M servers.
> > > Similarly, the issues differ significantly between patching a single
> > > function and
> > > patching 100 functions, especially when some of those functions are critical.
> > > That’s what scalability is all about.
> > >
> > > Since we transitioned from the old livepatch mode to the new
> > > atomic-replace mode,
> >
> > What do you mean with the old livepatch mode, please?
> 
> $ kpatch-build -R

I am not familiar with kpatch-build. OK, I see the following at
https://github.com/dynup/kpatch/blob/master/kpatch-build/kpatch-build

echo "		-R, --non-replace       Disable replace patch (replace is on by default)" >&2

> >
> > Did you allow to install more livepatches in parallel?
> 
> No.

I guess that there is a misunderstanding. I am sorry my wording was
not clear enough.

By "installing" more livepatches in parallel I meant to "have enabled"
more livepatches in parallel. It is possible only when you do not
use the atomic replace.

In other words, if you use "kpatch-build -R" then you could have
enabled more livepatches in parallel.


> > What was the motivation to switch to the atomic replace, please?
> 
> This is the default behavior of kpatch [1] after upgrading to a new version.
> 
> [1].  https://github.com/dynup/kpatch/tree/master

OK. I wonder if the atomic replace simplified some actions for you.

I ask because the proposed "hybrid" model is very similar to the "old"
model which did not use the atomic replace.

What are the advantages of the "hybrid" model over the "old" model, please?


> > > our SREs have consistently reported that one or more servers become
> > > stalled during
> > > the upgrade (replacement).
> >
> > What is SRE, please?
> 
> >From the wikipedia : https://en.wikipedia.org/wiki/Site_reliability_engineering

Good to know.

> > Could you please show some log from a production system?
> 
> When the SREs initially reported that the livepatch transition was
> stalled, I simply advised them to try again. However, after
> experiencing these crashes, I dug deeper into the livepatch code and
> realized that scalability is a concern. As a result, periodically
> replacing an old livepatch triggers RCU warnings on our production
> servers.
> 
> [Wed Feb  5 10:56:10 2025] livepatch: enabling patch 'livepatch_61_release6'
> [Wed Feb  5 10:56:10 2025] livepatch: 'livepatch_61_release6':
> starting patching transition
> [Wed Feb  5 10:56:24 2025] rcu_tasks_wait_gp: rcu_tasks grace period
> 1126113 is 10078 jiffies old.
> [Wed Feb  5 10:56:38 2025] rcu_tasks_wait_gp: rcu_tasks grace period
> 1126117 is 10199 jiffies old.
> [Wed Feb  5 10:56:52 2025] rcu_tasks_wait_gp: rcu_tasks grace period
> 1126121 is 10047 jiffies old.
> [Wed Feb  5 10:56:57 2025] livepatch: 'livepatch_61_release6': patching complete

I guess that this happens primarily because there are many processes
running in kernel code:

       + many processes => long task list
       + in kernel code => need to check stack

I wondered how much of it is caused by livepatching do_exit(), which
takes tasklist_lock. The idea was:

	+ The processes calling do_exit() are blocked at
	  write_lock_irq(&tasklist_lock) when
	  klp_try_complete_transition() takes the tasklist_lock.

	+ These processes can't be transitioned because do_exit()
	  is on the stack => more klp_try_complete_transition()
	  rounds are needed.

	  => livepatching do_exit() reduces the chance of
	     klp_try_complete_transition() success.

	Well, it should not be that big a problem because the next
	klp_try_complete_transition() should be much faster.
	It will skip already transitioned processes quickly.

       Summary: I think that livepatching do_exit() should not be the main
                cause of the stall.
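
For context, the loop in question looks roughly like this (paraphrased
from kernel/livepatch/transition.c; details differ between kernel
versions):

/* Paraphrased sketch, not verbatim kernel source. */
static bool klp_try_complete_transition(void)
{
	struct task_struct *g, *task;
	bool complete = true;

	/*
	 * Walk every task while holding tasklist_lock for reading and
	 * try to switch it to the target patch state.  A task that
	 * still has a to-be-patched (or to-be-unpatched) function on
	 * its stack -- e.g. one sleeping inside do_exit() -- cannot be
	 * switched yet and forces another retry round later.
	 */
	read_lock(&tasklist_lock);
	for_each_process_thread(g, task)
		if (!klp_try_switch_task(task))
			complete = false;
	read_unlock(&tasklist_lock);

	/* ... idle tasks are tried as well, then the transition either
	 * completes or a delayed retry is scheduled ... */
	return complete;
}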


> PS: You might argue again about the frequency. If you believe this is
> just a frequency issue, please suggest a suitable frequency.

I do not know. The livepatch transition might block some processes.
It is a kind of stress for the system, similar to other
housekeeping operations.

It depends on the load and on the number and type of livepatched
functions. It might take some time until the system recovers
from the stress and the system load drops back to normal.

If you create the stress (livepatch transition) too frequently
and the system does not get a chance to recover between the
stress situations, then the bad effects might accumulate
and might be much worse.

I have no idea if it is the case here. The rule of thumb would be:

  + If you see the hardlockup warning _only_ when running the stress
    test "while true; do apply_livepatch; done" then
    the problem might be rather theoretical.

  + If you see the hardlockup warning on production systems where
    you apply a livepatch only occasionally (once per day)
    then the problem is real and we should fix it.


> > > > > > > Other potential risks may also arise
> > > > > > >   due to inconsistencies or race conditions during transitions.
> > > > > >
> > > > > > What inconsistencies and race conditions you have in mind, please?
> > > > >
> > > > > I have explained it at
> > > > > https://lore.kernel.org/live-patching/Z5DHQG4geRsuIflc@pathway.suse.cz/T/#m5058583fa64d95ef7ac9525a6a8af8ca865bf354
> > > > >
> > > > >  klp_ftrace_handler
> > > > >       if (unlikely(func->transition)) {
> > > > >           WARN_ON_ONCE(patch_state == KLP_UNDEFINED);
> > > > >   }
> > > > >
> > > > > Why is WARN_ON_ONCE() placed here? What issues have we encountered in the past
> > > > > that led to the decision to add this warning?
> > > >
> > > > A safety measure for something which really should not happen.
> > >
> > > Unfortunately, this issue occurs during my stress tests.
> >
> > I am confused. Do you see the above WARN_ON_ONCE() during your
> > stress test? Could you please provide a log?
> 
> Could you pls read my replyment seriously ?

This is pretty harsh and offensive after so many details I have already
provided!

It is easy to miss a detail in a flood of long mails. Also I am
working on many other things in parallel.

> https://lore.kernel.org/live-patching/Z5DHQG4geRsuIflc@pathway.suse.cz/T/#m5058583fa64d95ef7ac9525a6a8af8ca865bf354

Ah, I have missed that you triggered this exact WARNING. It is great.
It confirms the theory about the race in do_exit(). I mean that
the transition finishes early because the processes in do_exit()
are no longer visible in the tasklist.


> > > > > > The main advantage of the atomic replace is simplify the maintenance
> > > > > > and debugging.
> >
> > If you have problems with the atomic replace then you might stop using
> > it completely and just install more livepatches in parallel.
> 
> Why do we need to install livepatches in parallel if atomic replace is disabled?
> We only need to install the additional new livepatch. Parallel
> installation is only necessary at boot time.

This is a misunderstanding. By livepatches "installed" in parallel I
mean livepatches "enabled" in parallel, i.e. without atomic replace.

If you have problems with atomic replace, you might stop using it.
Honestly, I do not see that big an advantage in the hybrid model
over the non-atomic-replace model.

That said, I think that the hybrid mode will not prevent the
hardlockup warning. It seems that you have reproduced the hardlockup
even with a relatively simple livepatch, see
https://lore.kernel.org/all/CALOAHbBZc6ORGzXwBRwe+rD2=YGf1jub5TEr989_GpK54P2o1A@mail.gmail.com/

IMHO, we should rather detect and break the stall in
klp_try_complete_transition(). I mean to go the way explored in
the thread
https://lore.kernel.org/all/20250122085146.41553-1-laoar.shao@gmail.com/

Best Regards,
Petr
Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
Posted by Yafang Shao 12 months ago
On Fri, Feb 7, 2025 at 7:01 PM Petr Mladek <pmladek@suse.com> wrote:
>
> On Wed 2025-02-05 14:16:42, Yafang Shao wrote:
> > On Tue, Feb 4, 2025 at 9:05 PM Petr Mladek <pmladek@suse.com> wrote:
> > >
> > > On Mon 2025-02-03 17:44:52, Yafang Shao wrote:
> > > > On Fri, Jan 31, 2025 at 9:18 PM Miroslav Benes <mbenes@suse.cz> wrote:
> > > > >
> > > > > > >
> > > > > > >   + What exactly is meant by frequent replacements (busy loop?, once a minute?)
> > > > > >
> > > > > > The script:
> > > > > >
> > > > > > #!/bin/bash
> > > > > > while true; do
> > > > > >         yum install -y ./kernel-livepatch-6.1.12-0.x86_64.rpm
> > > > > >         ./apply_livepatch_61.sh # it will sleep 5s
> > > > > >         yum erase -y kernel-livepatch-6.1.12-0.x86_64
> > > > > >         yum install -y ./kernel-livepatch-6.1.6-0.x86_64.rpm
> > > > > >         ./apply_livepatch_61.sh  # it will sleep 5s
> > > > > > done
> > > > >
> > > > > A live patch application is a slowpath. It is expected not to run
> > > > > frequently (in a relative sense).
> > > >
> > > > The frequency isn’t the main concern here; _scalability_ is the key issue.
> > > > Running livepatches once per day (a relatively low frequency) across all of our
> > > > production servers (hundreds of thousands) isn’t feasible. Instead, we need to
> > > > periodically run tests on a subset of test servers.
> > >
> > > I am confused. The original problem was a system crash when
> > > livepatching do_exit() function, see
> > > https://lore.kernel.org/r/CALOAHbA9WHPjeZKUcUkwULagQjTMfqAdAg+akqPzbZ7Byc=qrw@mail.gmail.com
> >
> > Why do you view this patchset as a solution to the original problem?
>
> You discovered the hardlockup warnings when trying to reproduce the
> original problem. At least, you mentioned this at
> https://lore.kernel.org/r/20250122085146.41553-1-laoar.shao@gmail.com
>
> And using the hybrid module would allow to livepatch do_exit() only
> once and do not touch it any longer. It is not the right solution
> but it would be a workaround.
>
>
> > > The rcu watchdog warning was first mentioned in this patchset.
> > > Do you see rcu watchdog warning in production or just
> > > with this artificial test, please?
> >
> > So, we shouldn’t run any artificial tests on the livepatch, correct?
> > What exactly is the issue with these test cases?
>
> Some tests might be too artificial. They might find problems which
> do not exist in practice.
>
> Disclaimer: I do not say that this is the case. You actually prove
>         later in this mail that the hardlockup warning happen
>         even in production.
>
> Anyway, if an artificial test finds a potential problem and the fix is
> complicated then we need to decide if it is worth it.
>
> It does not make sense to complicate the code too much when
> the fixed problem does not happen in practice.
>
>   + Too complicated code might introduce regressions which are
>     more serious than the original problem.
>
>   + Too complicated code increases the maintenance cost. It is
>     more complicated to add new features or fix bugs.
>
>
> > > > > If you stress it like this, it is quite
> > > > > expected that it will have an impact. Especially on a large busy system.
> > > >
> > > > It seems you agree that the current atomic-replace process lacks scalability.
> > > > When deploying a livepatch across a large fleet of servers, it’s impossible to
> > > > ensure that the servers are idle, as their workloads are constantly varying and
> > > > are not deterministic.
> > >
> > > Do you see the scalability problem in production, please?
> >
> > Yes, the livepatch transition was stalled.
>
> Good to know.
>
> >
> > > And could you prove that it was caused by livepatching, please?
> >
> > When the livepatch transition is stalled, running `kpatch list` will
> > display the stalled information.
>
> OK.
>
> > > > The challenges are very different when managing 1K servers versus 1M servers.
> > > > Similarly, the issues differ significantly between patching a single
> > > > function and
> > > > patching 100 functions, especially when some of those functions are critical.
> > > > That’s what scalability is all about.
> > > >
> > > > Since we transitioned from the old livepatch mode to the new
> > > > atomic-replace mode,
> > >
> > > What do you mean with the old livepatch mode, please?
> >
> > $ kpatch-build -R
>
> I am not familiar with kpatch-build. OK, I see the following at
> https://github.com/dynup/kpatch/blob/master/kpatch-build/kpatch-build
>
> echo "          -R, --non-replace       Disable replace patch (replace is on by default)" >&2
>
> > >
> > > Did you allow to install more livepatches in parallel?
> >
> > No.
>
> I guess that there is a misunderstanding. I am sorry my wording was
> not clear enough.
>
> By "installing" more livepatches in parallel I meant to "have enabled"
> more livepatches in parallel. It is possible only when you do not
> use the atomic replace.
>
> By other words, if you use "kpatch-build -R" then you could have
> enabled more livepatches in parallel.
>
>
> > > What was the motivation to switch to the atomic replace, please?
> >
> > This is the default behavior of kpatch [1] after upgrading to a new version.
> >
> > [1].  https://github.com/dynup/kpatch/tree/master
>
> OK. I wonder if the atomic replace simplified some actions for you.

Actually, it simplifies the livepatch deployment since it only
involves a single livepatch.

>
> I ask because the proposed "hybrid" model is very similar to the "old"
> model which did not use the atomic replace.

It’s essentially a hybrid of the old model and the atomic replace model.

>
> What are the advantages of the "hybrid" model over the "old" model, please?

- Advantages compared to the old model
  In the old model, it’s not possible to replace a specific livepatch
  with a new one. In the hybrid model, however, you can replace
  specific livepatches as needed.

- Advantages and disadvantages compared to the atomic replace model
  - Advantage
    In the atomic replace model, you must replace all old livepatches,
    regardless of whether they are relevant to the new livepatch. In the
    hybrid model, you only need to replace the relevant livepatches.

  - Disadvantage
    In the atomic replace model, you only need to maintain a single
    livepatch. However, in the hybrid model, you need to manage
    multiple livepatches.

>
>
> > > > our SREs have consistently reported that one or more servers become
> > > > stalled during
> > > > the upgrade (replacement).
> > >
> > > What is SRE, please?
> >
> > >From the wikipedia : https://en.wikipedia.org/wiki/Site_reliability_engineering
>
> Good to know.
>
> > > Could you please show some log from a production system?
> >
> > When the SREs initially reported that the livepatch transition was
> > stalled, I simply advised them to try again. However, after
> > experiencing these crashes, I dug deeper into the livepatch code and
> > realized that scalability is a concern. As a result, periodically
> > replacing an old livepatch triggers RCU warnings on our production
> > servers.
> >
> > [Wed Feb  5 10:56:10 2025] livepatch: enabling patch 'livepatch_61_release6'
> > [Wed Feb  5 10:56:10 2025] livepatch: 'livepatch_61_release6':
> > starting patching transition
> > [Wed Feb  5 10:56:24 2025] rcu_tasks_wait_gp: rcu_tasks grace period
> > 1126113 is 10078 jiffies old.
> > [Wed Feb  5 10:56:38 2025] rcu_tasks_wait_gp: rcu_tasks grace period
> > 1126117 is 10199 jiffies old.
> > [Wed Feb  5 10:56:52 2025] rcu_tasks_wait_gp: rcu_tasks grace period
> > 1126121 is 10047 jiffies old.
> > [Wed Feb  5 10:56:57 2025] livepatch: 'livepatch_61_release6': patching complete
>
> I guess that this happens primary because there are many processes
> running in kernel code:
>
>        + many processes => long task list
>        + in kernel code => need to check stack
>
> I wondered how much it is caused by livepatching do_exit() which
> takes tasklist_lock. The idea was:

I’ll drop the changes in do_exit() and run the test again.

>
>         + The processes calling do_exit() are blocked at
>           write_lock_irq(&tasklist_lock) when
>           klp_try_complete_transition() takes the tasklist_lock.
>
>         + These processes can't be transitioned because do_exit()
>           is on the stack => more klp_try_complete_transition()
>           rounds is needed.
>
>           => livepatching do_exit() reducess the chance of
>              klp_try_complete_transition() succcess.
>
>         Well, it should not be that big problem because the next
>          klp_try_complete_transition() should be much faster.
>          It will skip already transitioned processes quickly.
>
>        Resume: I think that livepatching of do_exit() should not be the main
>                 problem for the stall.
>
>
> > PS: You might argue again about the frequency. If you believe this is
> > just a frequency issue, please suggest a suitable frequency.
>
> I do not know. The livepatch transition might block some processes.
> It is a kind of stress for the system. Similar to another
> housekeeping operations.
>
> It depends on the load and the amount and type of livepatched
> functions. It might take some time until the system recovers
> from the stress and the system load drops back to normal.
>
> If you create the stress (livepatch transition) too frequently
> and the system does not get chance to recover in between the
> stress situations then the bad effects might accumulate
> and might be much worse.
>
> I have no idea if it is the case here. The rule of thumb would be:

Note that I’ve just deployed the test to a few of our production
servers. The test runs at a relatively low frequency, so it doesn’t
introduce significant side effects to the original workload, aside
from some latency spikes during deployment. These spikes are
short-lived and disappear quickly.

>
>   + If you see the hardlockup warning _only_ when running the stress
>     test "while true: do apply_livepatch ; done;" then
>     the problem might be rather theoretical.
>
>   + If you see the hardlockup warning on production systems where
>     the you apply a livepatch only occasionally (one per day)
>     then the problem is real and we should fix it.

I can't run it once per day, as it might take too long to reproduce
the issue. If I want to reproduce it quickly, I’d need to deploy the
test to more production servers, which our sysadmins would likely
complain about. Notably, the RCU warning occurs even when I run the
test once every 4 hours. The reason I run the test
periodically is that the workloads change throughout the day, so
periodically testing them helps monitor the system's behavior across
different workloads.

>
>
> > > > > > > > Other potential risks may also arise
> > > > > > > >   due to inconsistencies or race conditions during transitions.
> > > > > > >
> > > > > > > What inconsistencies and race conditions you have in mind, please?
> > > > > >
> > > > > > I have explained it at
> > > > > > https://lore.kernel.org/live-patching/Z5DHQG4geRsuIflc@pathway.suse.cz/T/#m5058583fa64d95ef7ac9525a6a8af8ca865bf354
> > > > > >
> > > > > >  klp_ftrace_handler
> > > > > >       if (unlikely(func->transition)) {
> > > > > >           WARN_ON_ONCE(patch_state == KLP_UNDEFINED);
> > > > > >   }
> > > > > >
> > > > > > Why is WARN_ON_ONCE() placed here? What issues have we encountered in the past
> > > > > > that led to the decision to add this warning?
> > > > >
> > > > > A safety measure for something which really should not happen.
> > > >
> > > > Unfortunately, this issue occurs during my stress tests.
> > >
> > > I am confused. Do you see the above WARN_ON_ONCE() during your
> > > stress test? Could you please provide a log?
> >
> > Could you please read my reply carefully?
>
> This is pretty harsh and offensive after so many details I have already
> provided!

Sorry about that. You have really helped me a lot.

>
> It is easy to miss a detail in a flood of long mails. Also I am
> working on many other things in parallel.
>
> > https://lore.kernel.org/live-patching/Z5DHQG4geRsuIflc@pathway.suse.cz/T/#m5058583fa64d95ef7ac9525a6a8af8ca865bf354
>
> Ah, I have missed that you triggered this exact WARNING. It is great.
> It confirms the theory about the race in do_exit(). I mean that
> the transition finishes early because the processes in do_exit()
> are no longer visible in the tasklist.

That seems to be the case.

>
>
> > > > > > > The main advantage of the atomic replace is to simplify the maintenance
> > > > > > > and debugging.
> > >
> > > If you have problems with the atomic replace then you might stop using
> > > it completely and just install more livepatches in parallel.
> >
> > Why do we need to install livepatches in parallel if atomic replace is disabled?
> > We only need to install the additional new livepatch. Parallel
> > installation is only necessary at boot time.
>
> This is a misunderstanding. By "installed" livepatches in parallel I
> mean "enabled" livepatches in parallel, aka, without atomic replace.
>
> If you have problems with atomic replace, you might stop using it.
> Honestly, I do not see that big advantage in the hybrid model
> over the non-atomic-replace model.

I believe the ability to selectively replace a specific livepatch is a
significant advantage over both the non-atomic-replace and
atomic-replace models.

>
> That said, I think that the hybrid mode will not prevent the
> hardlockup warning. It seems that you have reproduced the hardlockup
> even with a relatively simple livepatch, see
> https://lore.kernel.org/all/CALOAHbBZc6ORGzXwBRwe+rD2=YGf1jub5TEr989_GpK54P2o1A@mail.gmail.com/
>
> IMHO, we should rather detect and break the stall in
> klp_try_complete_transition(). I mean to go the way explored in
> the thread
> https://lore.kernel.org/all/20250122085146.41553-1-laoar.shao@gmail.com/

That’s a separate issue, which we can discuss separately.

--
Regards
Yafang
Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
Posted by Yafang Shao 12 months ago
On Sat, Feb 8, 2025 at 10:49 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Fri, Feb 7, 2025 at 7:01 PM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Wed 2025-02-05 14:16:42, Yafang Shao wrote:
> > > On Tue, Feb 4, 2025 at 9:05 PM Petr Mladek <pmladek@suse.com> wrote:
> > > >
> > > > On Mon 2025-02-03 17:44:52, Yafang Shao wrote:
> > > > > On Fri, Jan 31, 2025 at 9:18 PM Miroslav Benes <mbenes@suse.cz> wrote:
> > > > > >
> > > > > > > >
> > > > > > > >   + What exactly is meant by frequent replacements (busy loop?, once a minute?)
> > > > > > >
> > > > > > > The script:
> > > > > > >
> > > > > > > #!/bin/bash
> > > > > > > while true; do
> > > > > > >         yum install -y ./kernel-livepatch-6.1.12-0.x86_64.rpm
> > > > > > >         ./apply_livepatch_61.sh # it will sleep 5s
> > > > > > >         yum erase -y kernel-livepatch-6.1.12-0.x86_64
> > > > > > >         yum install -y ./kernel-livepatch-6.1.6-0.x86_64.rpm
> > > > > > >         ./apply_livepatch_61.sh  # it will sleep 5s
> > > > > > > done
> > > > > >
> > > > > > A live patch application is a slowpath. It is expected not to run
> > > > > > frequently (in a relative sense).
> > > > >
> > > > > The frequency isn’t the main concern here; _scalability_ is the key issue.
> > > > > Running livepatches once per day (a relatively low frequency) across all of our
> > > > > production servers (hundreds of thousands) isn’t feasible. Instead, we need to
> > > > > periodically run tests on a subset of test servers.
> > > >
> > > > I am confused. The original problem was a system crash when
> > > > livepatching do_exit() function, see
> > > > https://lore.kernel.org/r/CALOAHbA9WHPjeZKUcUkwULagQjTMfqAdAg+akqPzbZ7Byc=qrw@mail.gmail.com
> > >
> > > Why do you view this patchset as a solution to the original problem?
> >
> > You discovered the hardlockup warnings when trying to reproduce the
> > original problem. At least, you mentioned this at
> > https://lore.kernel.org/r/20250122085146.41553-1-laoar.shao@gmail.com
> >
> > And using the hybrid mode would allow livepatching do_exit() only
> > once and never touching it again. It is not the right solution
> > but it would be a workaround.
> >
> >
> > > > The rcu watchdog warning was first mentioned in this patchset.
> > > > Do you see rcu watchdog warning in production or just
> > > > with this artificial test, please?
> > >
> > > So, we shouldn’t run any artificial tests on the livepatch, correct?
> > > What exactly is the issue with these test cases?
> >
> > Some tests might be too artificial. They might find problems which
> > do not exist in practice.
> >
> > Disclaimer: I do not say that this is the case. You actually proved
> >         later in this mail that the hardlockup warning happens
> >         even in production.
> >
> > Anyway, if an artificial test finds a potential problem and the fix is
> > complicated then we need to decide if it is worth it.
> >
> > It does not make sense to complicate the code too much when
> > the fixed problem does not happen in practice.
> >
> >   + Too complicated code might introduce regressions which are
> >     more serious than the original problem.
> >
> >   + Too complicated code increases the maintenance cost. It is
> >     more complicated to add new features or fix bugs.
> >
> >
> > > > > > If you stress it like this, it is quite
> > > > > > expected that it will have an impact. Especially on a large busy system.
> > > > >
> > > > > It seems you agree that the current atomic-replace process lacks scalability.
> > > > > When deploying a livepatch across a large fleet of servers, it’s impossible to
> > > > > ensure that the servers are idle, as their workloads are constantly varying and
> > > > > are not deterministic.
> > > >
> > > > Do you see the scalability problem in production, please?
> > >
> > > Yes, the livepatch transition was stalled.
> >
> > Good to know.
> >
> > >
> > > > And could you prove that it was caused by livepatching, please?
> > >
> > > When the livepatch transition is stalled, running `kpatch list` will
> > > display the stalled information.
> >
> > OK.
> >
> > > > > The challenges are very different when managing 1K servers versus 1M servers.
> > > > > Similarly, the issues differ significantly between patching a single
> > > > > function and
> > > > > patching 100 functions, especially when some of those functions are critical.
> > > > > That’s what scalability is all about.
> > > > >
> > > > > Since we transitioned from the old livepatch mode to the new
> > > > > atomic-replace mode,
> > > >
> > > > What do you mean with the old livepatch mode, please?
> > >
> > > $ kpatch-build -R
> >
> > I am not familiar with kpatch-build. OK, I see the following at
> > https://github.com/dynup/kpatch/blob/master/kpatch-build/kpatch-build
> >
> > echo "          -R, --non-replace       Disable replace patch (replace is on by default)" >&2
> >
> > > >
> > > > Did you allow to install more livepatches in parallel?
> > >
> > > No.
> >
> > I guess that there is a misunderstanding. I am sorry my wording was
> > not clear enough.
> >
> > By "installing" more livepatches in parallel I meant to "have enabled"
> > more livepatches in parallel. It is possible only when you do not
> > use the atomic replace.
> >
> > In other words, if you use "kpatch-build -R" then you could have
> > enabled more livepatches in parallel.
> >
> >
> > > > What was the motivation to switch to the atomic replace, please?
> > >
> > > This is the default behavior of kpatch [1] after upgrading to a new version.
> > >
> > > [1].  https://github.com/dynup/kpatch/tree/master
> >
> > OK. I wonder if the atomic replace simplified some actions for you.
>
> Actually, it simplifies the livepatch deployment since it only
> involves a single livepatch.
>
> >
> > I ask because the proposed "hybrid" model is very similar to the "old"
> > model which did not use the atomic replace.
>
> It’s essentially a hybrid of the old model and the atomic replace model.
>
> >
> > What are the advantages of the "hybrid" model over the "old" model, please?
>
> - Advantages compared to the old model
>   In the old model, it’s not possible to replace a specific livepatch
> with a new one. In the hybrid model, however, you can replace specific
> livepatches as needed.
>
> - Advantages and disadvantages compared to the atomic replace model
>    - Advantage
>      In the atomic replace model, you must replace all old
> livepatches, regardless of whether they are relevant to the new
> livepatch. In the hybrid model, you only need to replace the relevant
> livepatches.
>
>    - Disadvantage
>      In the atomic replace model, you only need to maintain a single
> livepatch. However, in the hybrid model, you need to manage multiple
> livepatches.
>
> >
> >
> > > > > our SREs have consistently reported that one or more servers become
> > > > > stalled during
> > > > > the upgrade (replacement).
> > > >
> > > > What is SRE, please?
> > >
> > > From Wikipedia: https://en.wikipedia.org/wiki/Site_reliability_engineering
> >
> > Good to know.
> >
> > > > Could you please show some log from a production system?
> > >
> > > When the SREs initially reported that the livepatch transition was
> > > stalled, I simply advised them to try again. However, after
> > > experiencing these crashes, I dug deeper into the livepatch code and
> > > realized that scalability is a concern. As a result, periodically
> > > replacing an old livepatch triggers RCU warnings on our production
> > > servers.
> > >
> > > [Wed Feb  5 10:56:10 2025] livepatch: enabling patch 'livepatch_61_release6'
> > > [Wed Feb  5 10:56:10 2025] livepatch: 'livepatch_61_release6':
> > > starting patching transition
> > > [Wed Feb  5 10:56:24 2025] rcu_tasks_wait_gp: rcu_tasks grace period
> > > 1126113 is 10078 jiffies old.
> > > [Wed Feb  5 10:56:38 2025] rcu_tasks_wait_gp: rcu_tasks grace period
> > > 1126117 is 10199 jiffies old.
> > > [Wed Feb  5 10:56:52 2025] rcu_tasks_wait_gp: rcu_tasks grace period
> > > 1126121 is 10047 jiffies old.
> > > [Wed Feb  5 10:56:57 2025] livepatch: 'livepatch_61_release6': patching complete
> >
> > I guess that this happens primarily because there are many processes
> > running in kernel code:
> >
> >        + many processes => long task list
> >        + in kernel code => need to check stack
> >
> > I wondered how much it is caused by livepatching do_exit() which
> > takes tasklist_lock. The idea was:
>
> I’ll drop the changes in do_exit() and run the test again.

The RCU warning is still triggered even when do_exit() is not
livepatched, though it occurs slightly less frequently.

However, with the following change, the RCU warning no longer appears,
even when do_exit() is livepatched and the test runs at a high
frequency (once every 5 seconds).

diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index 30187b1..c436aa6 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -400,11 +400,22 @@ void klp_try_complete_transition(void)
         * unless the patch includes changes to a very common function.
         */
        read_lock(&tasklist_lock);
-       for_each_process_thread(g, task)
+       for_each_process_thread(g, task) {
+               if (task->patch_state == klp_target_state)
+                       continue;
                if (!klp_try_switch_task(task))
                        complete = false;
+
+               if (need_resched()) {
+                       complete = false;
+                       break;
+               }
+       }
        read_unlock(&tasklist_lock);

+       /* The above operation might be expensive. */
+       cond_resched();
+
        /*
         * Ditto for the idle "swapper" tasks.
         */

-- 
Regards
Yafang
Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
Posted by Song Liu 1 year ago
On Mon, Feb 3, 2025 at 1:45 AM Yafang Shao <laoar.shao@gmail.com> wrote:
[...]
>
> If you’re managing a large fleet of servers, this issue is far from negligible.
>
> >
> > > Can you provide examples of companies that use atomic replacement at
> > > scale in their production environments?
> >
> > At least SUSE uses it as a solution for its customers. Not many problems
> > have been reported since we started ~10 years ago.

We (Meta) always use atomic replacement for our live patches.

>
> Perhaps we’re running different workloads.
> Going back to the original purpose of livepatching: is it designed to address
> security vulnerabilities, or to deploy new features?
> If it’s the latter, then there’s definitely a lot of room for improvement.

We only use KLP to fix bugs and security vulnerabilities. We do not use
live patches to deploy new features.

Thanks,
Song
Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
Posted by Yafang Shao 1 year ago
On Tue, Feb 4, 2025 at 5:53 AM Song Liu <song@kernel.org> wrote:
>
> On Mon, Feb 3, 2025 at 1:45 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> [...]
> >
> > If you’re managing a large fleet of servers, this issue is far from negligible.
> >
> > >
> > > > Can you provide examples of companies that use atomic replacement at
> > > > scale in their production environments?
> > >
> > > At least SUSE uses it as a solution for its customers. Not many problems
> > > have been reported since we started ~10 years ago.
>
> We (Meta) always use atomic replacement for our live patches.
>
> >
> > Perhaps we’re running different workloads.
> > Going back to the original purpose of livepatching: is it designed to address
> > security vulnerabilities, or to deploy new features?
> > If it’s the latter, then there’s definitely a lot of room for improvement.
>
> We only use KLP to fix bugs and security vulnerabilities. We do not use
> live patches to deploy new features.

+BPF

Hello Song,

Since BPF fexit also uses trampolines, I was curious what would happen
if I attached an fexit program to do_exit(). Unfortunately, it triggers
a bug in BPF as well. The BPF program is as follows:

SEC("fexit/do_exit")
int fexit_do_exit(void *ctx)
{
    return 0;
}

After the fexit program is detached, the trampoline is still left over:

$ bpftool link show  <<<< no output
$ grep "bpf_trampoline_[0-9]" /proc/kallsyms
ffffffffc04cb000 t bpf_trampoline_6442526459    [bpf]
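
For reference, the loader side is nothing special; a minimal program
along these lines (a sketch using the standard libbpf skeleton API; the
object and skeleton names below are illustrative, not the exact ones I
used) is enough to attach the fexit program and then detach it:

/* Sketch only: assumes the BPF program above was compiled to an object
 * file and a skeleton header was generated with "bpftool gen skeleton";
 * the "fexit_do_exit" names are illustrative. */
#include <stdio.h>
#include <unistd.h>
#include "fexit_do_exit.skel.h"

int main(void)
{
	struct fexit_do_exit *skel;

	skel = fexit_do_exit__open_and_load();
	if (!skel) {
		fprintf(stderr, "failed to open and load BPF skeleton\n");
		return 1;
	}

	if (fexit_do_exit__attach(skel)) {
		fprintf(stderr, "failed to attach fexit program\n");
		fexit_do_exit__destroy(skel);
		return 1;
	}

	/* Give some tasks a chance to call do_exit() while attached. */
	sleep(10);

	/* Detach and free the skeleton. The link disappears (hence the
	 * empty "bpftool link show" above), but the trampoline image can
	 * stay around because tasks that entered do_exit() through it
	 * never reach __bpf_tramp_exit() to drop its refcount. */
	fexit_do_exit__destroy(skel);
	return 0;
}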

We could either add functions annotated as "__noreturn" to the deny
list for fexit as follows, or we could explore a more generic
solution, such as embedding the "noreturn" information into the BTF
and extracting it when attaching fexit.

Any thoughts?

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d77abb87ffb1..37192888473c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -22742,6 +22742,13 @@ BTF_ID(func, __rcu_read_unlock)
 #endif
 BTF_SET_END(btf_id_deny)

+BTF_SET_START(fexit_deny)
+BTF_ID_UNUSED
+/* do_exit never returns */
+/* TODO: Add other functions annotated with __noreturn or figure out a generic solution */
+BTF_ID(func, do_exit)
+BTF_SET_END(fexit_deny)
+
 static bool can_be_sleepable(struct bpf_prog *prog)
 {
        if (prog->type == BPF_PROG_TYPE_TRACING) {
@@ -22830,6 +22837,9 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
        } else if (prog->type == BPF_PROG_TYPE_TRACING &&
                   btf_id_set_contains(&btf_id_deny, btf_id)) {
                return -EINVAL;
+       } else if (prog->expected_attach_type == BPF_TRACE_FEXIT &&
+                  btf_id_set_contains(&fexit_deny, btf_id)) {
+               return -EINVAL;
        }

        key = bpf_trampoline_compute_key(tgt_prog, prog->aux->attach_btf, btf_id);


--
Regards
Yafang
Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
Posted by Song Liu 1 year ago
On Wed, Feb 5, 2025 at 6:43 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Tue, Feb 4, 2025 at 5:53 AM Song Liu <song@kernel.org> wrote:
> >
> > On Mon, Feb 3, 2025 at 1:45 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > [...]
> > >
> > > If you’re managing a large fleet of servers, this issue is far from negligible.
> > >
> > > >
> > > > > Can you provide examples of companies that use atomic replacement at
> > > > > scale in their production environments?
> > > >
> > > > At least SUSE uses it as a solution for its customers. Not many problems
> > > > have been reported since we started ~10 years ago.
> >
> > We (Meta) always use atomic replacement for our live patches.
> >
> > >
> > > Perhaps we’re running different workloads.
> > > Going back to the original purpose of livepatching: is it designed to address
> > > security vulnerabilities, or to deploy new features?
> > > If it’s the latter, then there’s definitely a lot of room for improvement.
> >
> > We only use KLP to fix bugs and security vulnerabilities. We do not use
> > live patches to deploy new features.
>
> +BPF
>
> Hello Song,
>
> Since bpf_fexit also uses trampolines, I was curious about what would
> happen if I attached do_exit() to fexit. Unfortunately, it triggers a
> bug in BPF as well. The BPF program is as follows:
>
> SEC("fexit/do_exit")
> int fexit_do_exit(void *ctx)
> {
>     return 0;
> }
>
> After the fexit program exits, the trampoline is still left over:
>
> $ bpftool  link show  <<<< nothing output
> $ grep "bpf_trampoline_[0-9]" /proc/kallsyms
> ffffffffc04cb000 t bpf_trampoline_6442526459    [bpf]

I think we should first understand why the trampoline is not
freed.

> We could either add functions annotated as "__noreturn" to the deny
> list for fexit as follows, or we could explore a more generic
> solution, such as embedding the "noreturn" information into the BTF
> and extracting it when attaching fexit.

I personally don't think this is really necessary. It would be good
to have, but a reasonable user should not expect a noreturn function
to generate fexit events.

Thanks,
Song

> Any thoughts?
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index d77abb87ffb1..37192888473c 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -22742,6 +22742,13 @@ BTF_ID(func, __rcu_read_unlock)
>  #endif
>  BTF_SET_END(btf_id_deny)
>
> +BTF_SET_START(fexit_deny)
> +BTF_ID_UNUSED
> +/* do_exit never returns */
> +/* TODO: Add other functions annotated with __noreturn or figure out
> a generic solution */
> +BTF_ID(func, do_exit)
> +BTF_SET_END(fexit_deny)
> +
>  static bool can_be_sleepable(struct bpf_prog *prog)
>  {
>         if (prog->type == BPF_PROG_TYPE_TRACING) {
> @@ -22830,6 +22837,9 @@ static int check_attach_btf_id(struct
> bpf_verifier_env *env)
>         } else if (prog->type == BPF_PROG_TYPE_TRACING &&
>                    btf_id_set_contains(&btf_id_deny, btf_id)) {
>                 return -EINVAL;
> +       } else if (prog->expected_attach_type == BPF_TRACE_FEXIT &&
> +                  btf_id_set_contains(&fexit_deny, btf_id)) {
> +               return -EINVAL;
>         }
>
>         key = bpf_trampoline_compute_key(tgt_prog,
> prog->aux->attach_btf, btf_id);
Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
Posted by Yafang Shao 1 year ago
On Thu, Feb 6, 2025 at 1:59 AM Song Liu <song@kernel.org> wrote:
>
> On Wed, Feb 5, 2025 at 6:43 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Tue, Feb 4, 2025 at 5:53 AM Song Liu <song@kernel.org> wrote:
> > >
> > > On Mon, Feb 3, 2025 at 1:45 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > [...]
> > > >
> > > > If you’re managing a large fleet of servers, this issue is far from negligible.
> > > >
> > > > >
> > > > > > Can you provide examples of companies that use atomic replacement at
> > > > > > scale in their production environments?
> > > > >
> > > > > At least SUSE uses it as a solution for its customers. Not many problems
> > > > > have been reported since we started ~10 years ago.
> > >
> > > We (Meta) always use atomic replacement for our live patches.
> > >
> > > >
> > > > Perhaps we’re running different workloads.
> > > > Going back to the original purpose of livepatching: is it designed to address
> > > > security vulnerabilities, or to deploy new features?
> > > > If it’s the latter, then there’s definitely a lot of room for improvement.
> > >
> > > We only use KLP to fix bugs and security vulnerabilities. We do not use
> > > live patches to deploy new features.
> >
> > +BPF
> >
> > Hello Song,
> >
> > Since bpf_fexit also uses trampolines, I was curious about what would
> > happen if I attached do_exit() to fexit. Unfortunately, it triggers a
> > bug in BPF as well. The BPF program is as follows:
> >
> > SEC("fexit/do_exit")
> > int fexit_do_exit
> > {
> >     return 0;
> > }
> >
> > After the fexit program exits, the trampoline is still left over:
> >
> > $ bpftool  link show  <<<< nothing output
> > $ grep "bpf_trampoline_[0-9]" /proc/kallsyms
> > ffffffffc04cb000 t bpf_trampoline_6442526459    [bpf]
>
> I think we should first understand why the trampoline is not
> freed.

IIUC, the fexit works as follows,

  bpf_trampoline
    + __bpf_tramp_enter
       + percpu_ref_get(&tr->pcref);

    + call do_exit()

    + __bpf_tramp_exit
       + percpu_ref_put(&tr->pcref);

Since do_exit() never returns, the refcnt of the trampoline image is
never decremented, preventing it from being freed.
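
To spell out where the get/put pair sits relative to the traced
function, here is a rough C-level sketch (illustrative only; the real
trampoline is JITed machine code, and the image argument and the
fentry/fexit program invocations are simplified):

/* Illustrative sketch of the generated fexit trampoline for do_exit();
 * the actual trampoline is generated machine code. */
static void fexit_trampoline_sketch(struct bpf_tramp_image *im, long code)
{
	__bpf_tramp_enter(im);	/* percpu_ref_get(&im->pcref) */

	do_exit(code);		/* __noreturn: execution never comes back */

	/* Never reached: the fexit programs do not run for this call, and
	 * percpu_ref_put(&im->pcref) is never executed, so the image
	 * refcount never drops back and the trampoline cannot be freed. */
	__bpf_tramp_exit(im);
}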

>
> > We could either add functions annotated as "__noreturn" to the deny
> > list for fexit as follows, or we could explore a more generic
> > solution, such as embedding the "noreturn" information into the BTF
> > and extracting it when attaching fexit.
>
> I personally don't think this is really necessary. It is good to have.
> But a reasonable user should not expect noreturn function to
> generate fexit events.

If we don't plan to fix it, we should clearly document this limitation
to guide users and advise them against attaching fexit programs to
noreturn functions.

--
Regards
Yafang
Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
Posted by Song Liu 1 year ago
On Wed, Feb 5, 2025 at 6:55 PM Yafang Shao <laoar.shao@gmail.com> wrote:
[...]
> > I think we should first understand why the trampoline is not
> > freed.
>
> IIUC, the fexit works as follows,
>
>   bpf_trampoline
>     + __bpf_tramp_enter
>        + percpu_ref_get(&tr->pcref);
>
>     + call do_exit()
>
>     + __bpf_tramp_exit
>        + percpu_ref_put(&tr->pcref);
>
> Since do_exit() never returns, the refcnt of the trampoline image is
> never decremented, preventing it from being freed.

Thanks for the explanation. In this case, I think it makes sense to
disallow attaching fexit programs to __noreturn functions. I am not
sure what the best solution is, though.

Thanks,
Song


> >
> > > We could either add functions annotated as "__noreturn" to the deny
> > > list for fexit as follows, or we could explore a more generic
> > > solution, such as embedding the "noreturn" information into the BTF
> > > and extracting it when attaching fexit.
> >
> > I personally don't think this is really necessary. It is good to have.
> > But a reasonable user should not expect noreturn function to
> > generate fexit events.
Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
Posted by Yafang Shao 12 months ago
On Fri, Feb 7, 2025 at 2:01 AM Song Liu <song@kernel.org> wrote:
>
> On Wed, Feb 5, 2025 at 6:55 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> [...]
> > > I think we should first understand why the trampoline is not
> > > freed.
> >
> > IIUC, the fexit works as follows,
> >
> >   bpf_trampoline
> >     + __bpf_tramp_enter
> >        + percpu_ref_get(&tr->pcref);
> >
> >     + call do_exit()
> >
> >     + __bpf_tramp_exit
> >        + percpu_ref_put(&tr->pcref);
> >
> > Since do_exit() never returns, the refcnt of the trampoline image is
> > never decremented, preventing it from being freed.
>
> Thanks for the explanation. In this case, I think it makes sense to
> disallow attaching fexit programs on __noreturn functions. I am not
> sure what is the best solution for it though.

There is a tools/objtool/noreturns.h. Perhaps we could create a
similar noreturns.h under kernel/bpf and add all relevant functions to
the fexit deny list.

--
Regards
Yafang
Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
Posted by Alexei Starovoitov 12 months ago
On Fri, Feb 7, 2025 at 10:42 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Fri, Feb 7, 2025 at 2:01 AM Song Liu <song@kernel.org> wrote:
> >
> > On Wed, Feb 5, 2025 at 6:55 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > [...]
> > > > I think we should first understand why the trampoline is not
> > > > freed.
> > >
> > > IIUC, the fexit works as follows,
> > >
> > >   bpf_trampoline
> > >     + __bpf_tramp_enter
> > >        + percpu_ref_get(&tr->pcref);
> > >
> > >     + call do_exit()
> > >
> > >     + __bpf_tramp_exit
> > >        + percpu_ref_put(&tr->pcref);
> > >
> > > Since do_exit() never returns, the refcnt of the trampoline image is
> > > never decremented, preventing it from being freed.
> >
> > Thanks for the explanation. In this case, I think it makes sense to
> > disallow attaching fexit programs on __noreturn functions. I am not
> > sure what is the best solution for it though.
>
> There is a tools/objtool/noreturns.h. Perhaps we could create a
> similar noreturns.h under kernel/bpf and add all relevant functions to
> the fexit deny list.

Please avoid copy-paste if possible.
Something like:

BTF_SET_START(fexit_deny)
#define NORETURN(fn) BTF_ID(func, fn)
#include "../../tools/objtool/noreturns.h"

Should work?
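
For what it's worth, tools/objtool/noreturns.h is essentially a list of
NORETURN(func) entries with the macro left to the includer, so the
include above would effectively expand to one BTF_ID() line per listed
function. A sketch of the resulting code (the function names in the
expansion comment are just typical entries from that list, and the
#undef is my addition):

/* Sketch of what the #include would expand to, assuming noreturns.h
 * keeps its current "one NORETURN(func) per line" format. */
BTF_SET_START(fexit_deny)
#define NORETURN(fn) BTF_ID(func, fn)
#include "../../tools/objtool/noreturns.h"
/* ...expands to roughly:
 *	BTF_ID(func, do_exit)
 *	BTF_ID(func, panic)
 *	...
 */
#undef NORETURN
BTF_SET_END(fexit_deny)

The check added to check_attach_btf_id() in the earlier diff
(btf_id_set_contains(&fexit_deny, btf_id)) would stay the same.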

Josh,
maybe we should move noreturns.h to some common location?
Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
Posted by Josh Poimboeuf 12 months ago
On Sat, Feb 08, 2025 at 07:47:12AM -0800, Alexei Starovoitov wrote:
> On Fri, Feb 7, 2025 at 10:42 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Fri, Feb 7, 2025 at 2:01 AM Song Liu <song@kernel.org> wrote:
> > >
> > > On Wed, Feb 5, 2025 at 6:55 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > [...]
> > > > > I think we should first understand why the trampoline is not
> > > > > freed.
> > > >
> > > > IIUC, the fexit works as follows,
> > > >
> > > >   bpf_trampoline
> > > >     + __bpf_tramp_enter
> > > >        + percpu_ref_get(&tr->pcref);
> > > >
> > > >     + call do_exit()
> > > >
> > > >     + __bpf_tramp_exit
> > > >        + percpu_ref_put(&tr->pcref);
> > > >
> > > > Since do_exit() never returns, the refcnt of the trampoline image is
> > > > never decremented, preventing it from being freed.
> > >
> > > Thanks for the explanation. In this case, I think it makes sense to
> > > disallow attaching fexit programs on __noreturn functions. I am not
> > > sure what is the best solution for it though.
> >
> > There is a tools/objtool/noreturns.h. Perhaps we could create a
> > similar noreturns.h under kernel/bpf and add all relevant functions to
> > the fexit deny list.
> 
> Pls avoid copy paste if possible.
> Something like:
> 
> BTF_SET_START(fexit_deny)
> #define NORETURN(fn) BTF_ID(func, fn)
> #include "../../tools/objtool/noreturns.h"
> 
> Should work?
> 
> Josh,
> maybe we should move noreturns.h to some common location?

The tools code is meant to be independent of the kernel, but it could
be synced by copying it to both include/linux and tools/include/linux,
and then making sure it stays in sync with tools/objtool/sync-check.sh.

However, noreturns.h is manually edited, and only for some arches.  And
even for those arches it's likely not exhaustive: we only add to it when
we notice an objtool warning, and not all calls to noreturn functions will
necessarily trigger a warning.  So I'd be careful about relying on that.

Also, that file is intended to be temporary; there have been proposals
to add compiler support for annotating noreturn functions. That hasn't
been implemented yet; help wanted!

I think the noreturn info is available in DWARF; can that be converted
to BTF?

Or is there some way to release outstanding trampolines in do_exit()?

-- 
Josh
Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
Posted by Alexei Starovoitov 12 months ago
On Sat, Feb 8, 2025 at 11:32 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Sat, Feb 08, 2025 at 07:47:12AM -0800, Alexei Starovoitov wrote:
> > On Fri, Feb 7, 2025 at 10:42 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > On Fri, Feb 7, 2025 at 2:01 AM Song Liu <song@kernel.org> wrote:
> > > >
> > > > On Wed, Feb 5, 2025 at 6:55 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > [...]
> > > > > > I think we should first understand why the trampoline is not
> > > > > > freed.
> > > > >
> > > > > IIUC, the fexit works as follows,
> > > > >
> > > > >   bpf_trampoline
> > > > >     + __bpf_tramp_enter
> > > > >        + percpu_ref_get(&tr->pcref);
> > > > >
> > > > >     + call do_exit()
> > > > >
> > > > >     + __bpf_tramp_exit
> > > > >        + percpu_ref_put(&tr->pcref);
> > > > >
> > > > > Since do_exit() never returns, the refcnt of the trampoline image is
> > > > > never decremented, preventing it from being freed.
> > > >
> > > > Thanks for the explanation. In this case, I think it makes sense to
> > > > disallow attaching fexit programs on __noreturn functions. I am not
> > > > sure what is the best solution for it though.
> > >
> > > There is a tools/objtool/noreturns.h. Perhaps we could create a
> > > similar noreturns.h under kernel/bpf and add all relevant functions to
> > > the fexit deny list.
> >
> > Pls avoid copy paste if possible.
> > Something like:
> >
> > BTF_SET_START(fexit_deny)
> > #define NORETURN(fn) BTF_ID(func, fn)
> > #include "../../tools/objtool/noreturns.h"
> >
> > Should work?
> >
> > Josh,
> > maybe we should move noreturns.h to some common location?
>
> The tools code is meant to be independent from the kernel, but it could
> be synced by copying it to both include/linux and tools/include/linux,
> and then make sure it stays in sync with tools/objtool/sync-check.sh.
>
> However, noreturns.h is manually edited, and only for some arches.  And
> even for those arches it's likely not exhaustive: we only add to it when
> we notice an objtool warning, and not all calls to noreturns will
> necessarily trigger a warning.  So I'd be careful about relying on that.
>
> Also that file is intended to be temporary, there have been proposals to
> add compiler support for annotating noreturns.  That hasn't been
> implemented yet, help wanted!
>
> I think the noreturn info is available in DWARF, can that be converted
> to BTF?

There are 30k+ noreturn functions in DWARF, so pahole would need
some heuristic to filter out the noise.
It's doable, but we need to stop the bleeding in the simplest way,
and the fix would need to be backported too.
We can copy noreturns.h or #include it from its current location
for now and think of better ways for -next.

> Or is there some way to release outstanding trampolines in do_exit()?

We could walk all trampolines in do_exit(), but we'd still need a
check like

  if (trampoline->func.addr == do_exit || ...addr == __x64_sys_exit || ...)

before dropping the refcnt, which is the same thing, but worse.
Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
Posted by Yafang Shao 12 months ago
On Sun, Feb 9, 2025 at 11:57 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Sat, Feb 8, 2025 at 11:32 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> >
> > On Sat, Feb 08, 2025 at 07:47:12AM -0800, Alexei Starovoitov wrote:
> > > On Fri, Feb 7, 2025 at 10:42 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > >
> > > > On Fri, Feb 7, 2025 at 2:01 AM Song Liu <song@kernel.org> wrote:
> > > > >
> > > > > On Wed, Feb 5, 2025 at 6:55 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > [...]
> > > > > > > I think we should first understand why the trampoline is not
> > > > > > > freed.
> > > > > >
> > > > > > IIUC, the fexit works as follows,
> > > > > >
> > > > > >   bpf_trampoline
> > > > > >     + __bpf_tramp_enter
> > > > > >        + percpu_ref_get(&tr->pcref);
> > > > > >
> > > > > >     + call do_exit()
> > > > > >
> > > > > >     + __bpf_tramp_exit
> > > > > >        + percpu_ref_put(&tr->pcref);
> > > > > >
> > > > > > Since do_exit() never returns, the refcnt of the trampoline image is
> > > > > > never decremented, preventing it from being freed.
> > > > >
> > > > > Thanks for the explanation. In this case, I think it makes sense to
> > > > > disallow attaching fexit programs on __noreturn functions. I am not
> > > > > sure what is the best solution for it though.
> > > >
> > > > There is a tools/objtool/noreturns.h. Perhaps we could create a
> > > > similar noreturns.h under kernel/bpf and add all relevant functions to
> > > > the fexit deny list.
> > >
> > > Pls avoid copy paste if possible.
> > > Something like:
> > >
> > > BTF_SET_START(fexit_deny)
> > > #define NORETURN(fn) BTF_ID(func, fn)
> > > #include "../../tools/objtool/noreturns.h"
> > >
> > > Should work?
> > >
> > > Josh,
> > > maybe we should move noreturns.h to some common location?
> >
> > The tools code is meant to be independent from the kernel, but it could
> > be synced by copying it to both include/linux and tools/include/linux,
> > and then make sure it stays in sync with tools/objtool/sync-check.sh.
> >
> > However, noreturns.h is manually edited, and only for some arches.  And
> > even for those arches it's likely not exhaustive: we only add to it when
> > we notice an objtool warning, and not all calls to noreturns will
> > necessarily trigger a warning.  So I'd be careful about relying on that.
> >
> > Also that file is intended to be temporary, there have been proposals to
> > add compiler support for annotating noreturns.  That hasn't been
> > implemented yet, help wanted!
> >
> > I think the noreturn info is available in DWARF, can that be converted
> > to BTF?
>
> There are 30k+ noreturn funcs in dwarf. So pahole would need to have
> some heuristic to filter out the noise.
> It's doable, but we need to stop the bleeding the simplest way
> and the fix would need to be backported too.
> We can copy paste noreturns.h or #include it from the current location
> for now and think of better ways for -next.

It seems like the simplest way at present.
I will send a patch.


--
Regards
Yafang