Nginx performance testing with Apparmor enabled (with Nginx running in
unconfined profile), on kernel versions 6.1 and 6.5 show significant
drop in throughput scalability when Nginx workers are scaled to use
higher number of CPUs across various L3 cache domains.
Below is one sample data on the throughput scalability loss, based on
results on AMD Zen4 system with 96 CPUs with SMT core count 2:
Config Cache Domains apparmor=off apparmor=on
scaling eff (%) scaling eff (%)
8C16T 1 100% 100%
16C32T 2 95% 94%
24C48T 3 94% 93%
48C96T 6 92% 88%
96C192T 12 85% 68%
There is a significant drop in scaling efficiency for 96 cores/192 SMT
threads.
Perf tool shows most of the contention coming from below places:
6.56% nginx [kernel.vmlinux] [k] apparmor_current_getsecid_subj
6.22% nginx [kernel.vmlinux] [k] apparmor_file_open
The majority of the CPU cycles is found to be due to memory contention
in atomic_fetch_add and atomic_fetch_sub operations from kref_get() and
kref_put() operations on AppArmor labels.
A part of the contention was fixed with commit 2516fde1fa00 ("apparmor:
Optimize retrieving current task secid"). After including this commit, the
scaling efficiency improved to below:
Config Cache Domains apparmor=on apparmor=on (patched)
scaling eff (%) scaling eff (%)
8C16T 1 100% 100%
16C32T 2 97% 93%
24C48T 3 94% 92%
48C96T 6 88% 88%
96C192T 12 65% 79%
However, the scaling efficiency impact is still significant even after
including the commit. Also, the performance impact is even higher for
>192 CPUs. In addition, the memory contention impact would increase
when there is a high frequency of label update operations and labels
are marked stale more frequently.
Use the new percpu managed mode for tracking release of all Apparmor
labels. Using percpu refcount for Apparmor label's refcounting improves
throughput scalability for Nginx:
Config Cache Domains apparmor=on (percpuref)
scaling eff (%)
8C16T 1 100%
16C32T 2 96%
24C48T 3 94%
48C96T 6 93%
96C192T 12 90%
Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
---
The apparmor_file_open() refcount contention has been resolved recently
with commit f4fee216df7d ("apparmor: try to avoid refing the label in
apparmor_file_open"). I have posted this series to get feedback on the
approach to improve refcount scalability within apparmor subsystem.
security/apparmor/label.c | 1 +
security/apparmor/policy_ns.c | 2 ++
2 files changed, 3 insertions(+)
diff --git a/security/apparmor/label.c b/security/apparmor/label.c
index aa9e6eac3ecc..016a45a180b1 100644
--- a/security/apparmor/label.c
+++ b/security/apparmor/label.c
@@ -710,6 +710,7 @@ static struct aa_label *__label_insert(struct aa_labelset *ls,
rb_link_node(&label->node, parent, new);
rb_insert_color(&label->node, &ls->root);
label->flags |= FLAG_IN_TREE;
+ percpu_ref_switch_to_managed(&label->count);
return aa_get_label(label);
}
diff --git a/security/apparmor/policy_ns.c b/security/apparmor/policy_ns.c
index 1f02cfe1d974..18eb58b68a60 100644
--- a/security/apparmor/policy_ns.c
+++ b/security/apparmor/policy_ns.c
@@ -124,6 +124,7 @@ static struct aa_ns *alloc_ns(const char *prefix, const char *name)
goto fail_unconfined;
/* ns and ns->unconfined share ns->unconfined refcount */
ns->unconfined->ns = ns;
+ percpu_ref_switch_to_managed(&ns->unconfined->label.count);
atomic_set(&ns->uniq_null, 0);
@@ -377,6 +378,7 @@ int __init aa_alloc_root_ns(void)
}
kernel_t = &kernel_p->label;
root_ns->unconfined->ns = aa_get_ns(root_ns);
+ percpu_ref_switch_to_managed(&root_ns->unconfined->label.count);
return 0;
}
--
2.34.1
Hello, kernel test robot noticed "WARNING:at_lib/percpu-refcount.c:#__percpu_ref_switch_to_managed" on: commit: 59b177cbdc4908a728329b8eec742969a2285979 ("[RFC 6/6] apparmor: Switch labels to percpu ref managed mode") url: https://github.com/intel-lab-lkp/linux/commits/Neeraj-Upadhyay/percpu-refcount-Add-managed-mode-for-RCU-released-objects/20240916-131210 base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-nonmm-unstable patch link: https://lore.kernel.org/all/20240916050811.473556-7-Neeraj.Upadhyay@amd.com/ patch subject: [RFC 6/6] apparmor: Switch labels to percpu ref managed mode in testcase: boot compiler: gcc-12 test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G (please refer to attached dmesg/kmsg for entire log/backtrace) +------------------------------------------------------------------+------------+------------+ | | 124f137c55 | 59b177cbdc | +------------------------------------------------------------------+------------+------------+ | WARNING:at_lib/percpu-refcount.c:#__percpu_ref_switch_to_managed | 0 | 13 | | RIP:__percpu_ref_switch_to_managed | 0 | 13 | +------------------------------------------------------------------+------------+------------+ If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <oliver.sang@intel.com> | Closes: https://lore.kernel.org/oe-lkp/202409181358.c07681be-oliver.sang@intel.com [ 13.041399][ T0] ------------[ cut here ]------------ [ 13.042302][ T0] Percpu ref is already managed [ 13.042302][ T0] WARNING: CPU: 0 PID: 0 at lib/percpu-refcount.c:151 __percpu_ref_switch_to_managed (lib/percpu-refcount.c:151 (discriminator 3)) [ 13.042302][ T0] Modules linked in: [ 13.042302][ T0] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.11.0-rc6-00143-g59b177cbdc49 #12 [ 13.042302][ T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 13.042302][ T0] RIP: 0010:__percpu_ref_switch_to_managed (lib/percpu-refcount.c:151 (discriminator 3)) [ 13.042302][ T0] Code: 03 75 58 65 48 ff 08 e8 c1 7c ee fe eb b1 80 3d 5d a2 ca 03 00 75 c5 48 c7 c7 c0 61 8f 91 c6 05 4d a2 ca 03 01 e8 03 af ce fe <0f> 0b eb ae 31 f6 48 89 df e8 05 f7 ff ff e9 54 fe ff ff e8 4b aa All code ======== 0: 03 75 58 add 0x58(%rbp),%esi 3: 65 48 ff 08 decq %gs:(%rax) 7: e8 c1 7c ee fe callq 0xfffffffffeee7ccd c: eb b1 jmp 0xffffffffffffffbf e: 80 3d 5d a2 ca 03 00 cmpb $0x0,0x3caa25d(%rip) # 0x3caa272 15: 75 c5 jne 0xffffffffffffffdc 17: 48 c7 c7 c0 61 8f 91 mov $0xffffffff918f61c0,%rdi 1e: c6 05 4d a2 ca 03 01 movb $0x1,0x3caa24d(%rip) # 0x3caa272 25: e8 03 af ce fe callq 0xfffffffffeceaf2d 2a:* 0f 0b ud2 <-- trapping instruction 2c: eb ae jmp 0xffffffffffffffdc 2e: 31 f6 xor %esi,%esi 30: 48 89 df mov %rbx,%rdi 33: e8 05 f7 ff ff callq 0xfffffffffffff73d 38: e9 54 fe ff ff jmpq 0xfffffffffffffe91 3d: e8 .byte 0xe8 3e: 4b aa rex.WXB stos %al,%es:(%rdi) Code starting with the faulting instruction =========================================== 0: 0f 0b ud2 2: eb ae jmp 0xffffffffffffffb2 4: 31 f6 xor %esi,%esi 6: 48 89 df mov %rbx,%rdi 9: e8 05 f7 ff ff callq 0xfffffffffffff713 e: e9 54 fe ff ff jmpq 0xfffffffffffffe67 13: e8 .byte 0xe8 14: 4b aa rex.WXB stos %al,%es:(%rdi) [ 13.042302][ T0] RSP: 0000:ffffffff92407e00 EFLAGS: 00010082 [ 13.042302][ T0] RAX: 0000000000000000 RBX: ffff8881008a6500 RCX: 1ffffffff24a3780 [ 13.042302][ T0] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001 [ 13.042302][ T0] RBP: ffff8881008a6508 R08: 0000000000000000 R09: fffffbfff24a3780 [ 13.042302][ T0] R10: ffffffff9251bc03 R11: 0000000000000001 R12: ffff8881001f3500 [ 13.042302][ T0] R13: ffff8881001f3500 R14: 0000000000000005 R15: 00000000000147b0 [ 13.042302][ T0] FS: 0000000000000000(0000) GS:ffff8883a8400000(0000) knlGS:0000000000000000 [ 13.042302][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 13.042302][ T0] CR2: ffff88843ffff000 CR3: 00000003b6e62000 CR4: 00000000000006f0 [ 13.042302][ T0] Call Trace: [ 13.042302][ T0] <TASK> [ 13.042302][ T0] ? __warn (kernel/panic.c:741) [ 13.042302][ T0] ? __percpu_ref_switch_to_managed (lib/percpu-refcount.c:151 (discriminator 3)) [ 13.042302][ T0] ? report_bug (lib/bug.c:180 lib/bug.c:219) [ 13.042302][ T0] ? handle_bug (arch/x86/kernel/traps.c:239) [ 13.042302][ T0] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1)) [ 13.042302][ T0] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621) [ 13.042302][ T0] ? __percpu_ref_switch_to_managed (lib/percpu-refcount.c:151 (discriminator 3)) [ 13.042302][ T0] percpu_ref_switch_to_managed (include/linux/spinlock.h:406 lib/percpu-refcount.c:182) [ 13.042302][ T0] aa_alloc_root_ns (security/apparmor/policy_ns.c:383) [ 13.042302][ T0] ? aa_setup_dfa_engine (security/apparmor/lsm.c:2194) [ 13.042302][ T0] apparmor_init (security/apparmor/lsm.c:2235) [ 13.042302][ T0] initialize_lsm (security/security.c:263 (discriminator 3)) [ 13.042302][ T0] ordered_lsm_init (security/security.c:422 (discriminator 3)) [ 13.042302][ T0] security_init (security/security.c:475) [ 13.042302][ T0] start_kernel (init/main.c:1085) [ 13.042302][ T0] x86_64_start_reservations (arch/x86/kernel/head64.c:495) [ 13.042302][ T0] x86_64_start_kernel (arch/x86/kernel/head64.c:437 (discriminator 17)) [ 13.042302][ T0] common_startup_64 (arch/x86/kernel/head_64.S:421) [ 13.042302][ T0] </TASK> [ 13.042302][ T0] ---[ end trace 0000000000000000 ]--- [ 13.044823][ T0] AppArmor: AppArmor initialized [ 13.062176][ T0] Mount-cache hash table entries: 32768 (order: 6, 262144 bytes, linear) [ 13.065846][ T0] Mountpoint-cache hash table entries: 32768 (order: 6, 262144 bytes, linear) The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20240918/202409181358.c07681be-oliver.sang@intel.com -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki
© 2016 - 2024 Red Hat, Inc.