On 10/5/23 21:18, Sergey Senozhatsky wrote:
> On (23/06/26 17:31), John Johansen wrote:
>> On 6/26/23 16:33, Anil Altinay wrote:
>>> Hi John,
>>>
>>> I was wondering if you get a chance to work on patch v4. Please let me know if you need help with testing.
>>>
>>
>> yeah, testing help is always much appreciated. I have a v4, and I am
>> working on 3 alternate versions to compare against, to help give a better
>> sense of whether we can get away with simplifying or tweaking the scaling.
>>
>> I should be able to post them out some time tonight.
>
> Hi John,
>
> Did you get a chance to post v4? I may be able to give it some testing
> on our real-life case.
sorry, yes. How about a v5 instead? It is simplified, with 3 follow-on
patches that aren't strictly necessary. Some combination of them might be
better than just the base patch, but splitting them out makes the
individual changes easier to review.
---
df323337e507 ("apparmor: Use a memory pool instead per-CPU caches")
changed buffer allocation to use a memory pool; however, on a heavily
loaded machine there can be lock contention on the global buffers
lock. Add a percpu list on which to cache buffers when lock contention
is encountered.

When allocating a buffer, attempt to use a cached buffer first, before
taking the global buffers lock. When freeing a buffer, try to put it
back on the global list, but if contention is encountered, put the
buffer on the percpu list.
The length of time a buffer is held on the percpu list is dynamically
adjusted based on lock contention.
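
Condensed, the flow added by the base patch looks roughly like this (a
simplified sketch of the patch below; the reserve handling, retry path
and allocation fallback are omitted):

	/* aa_get_buffer(): try the per-CPU cache before the global lock */
	cache = get_cpu_ptr(&aa_local_buffers);
	if (!list_empty(&cache->head)) {
		aa_buf = list_first_entry(&cache->head, union aa_buffer, list);
		list_del(&aa_buf->list);
		cache->hold--;
		cache->count--;
		put_cpu_ptr(&aa_local_buffers);
		return &aa_buf->buffer[0];
	}
	put_cpu_ptr(&aa_local_buffers);
	/* ... otherwise fall back to the global pool under aa_buffers_lock ... */

	/* aa_put_buffer(): if hold is non-zero, or the global lock is
	 * contended, stash the buffer on the per-CPU list instead of
	 * waiting on aa_buffers_lock */
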
v5:
- simplify base patch by removing (improvements that can be added later):
  - MAX_LOCAL_COUNT and the must-lock handling
  - contention scaling
v4:
- fix percpu ->count buffer count which had been spliced across a
debug patch.
- introduce define for MAX_LOCAL_COUNT
- rework count check and locking around it.
- update commit message to reference the commit that introduced the
  memory pool.
v3:
- limit number of buffers that can be pushed onto the percpu
list. This avoids a problem on some kernels where one percpu
list can inherit buffers from another cpu after a reschedule,
causing more kernel memory to be used than is necessary. Under
normal conditions this should eventually return to normal, but under
pathological conditions the extra memory consumption may have been
unbounded.
v2:
- dynamically adjust buffer hold time on percpu list based on
lock contention.
v1:
- cache buffers on percpu list on lock contention
Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: John Johansen <john.johansen@canonical.com>
On (23/10/17 02:21), John Johansen wrote:
> sorry, yes. How about a v5 instead? It is simplified, with 3 follow-on
> patches that aren't strictly necessary. Some combination of them might be
> better than just the base patch, but splitting them out makes the
> individual changes easier to review.

Sorry for the late reply. So I gave it a try but, apparently, our build
environment has changed quite significantly since the last time I looked
into it. I don't see that many aa_get/put_buffer() calls anymore. The
apparmor buffer functions are mostly called from the exec path:

  security_bprm_creds_for_exec()
    apparmor_bprm_creds_for_exec()
      make_vfsuid()
        aa_get_buffer()

As for vfs_statx()->...->apparmor_inode_getattr()->aa_path_perm(), that
path is bpf_lsm_inode_getsecid() now.
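
For readers skimming the thread: the helpers being discussed are used in a
simple get/use/put pattern. A generic sketch (not any specific call site in
the tree, and the error value is only illustrative):

	char *buf = aa_get_buffer(false);	/* false: caller may sleep */
	if (!buf)
		return -ENOMEM;			/* illustrative error path */
	/* ... build the pathname / do the mediation work in buf ... */
	aa_put_buffer(buf);

The contention discussed below comes from many CPUs running this pattern
concurrently against the single global buffer pool.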
df323337e507 ("apparmor: Use a memory pool instead per-CPU caches")
changed buffer allocation to use a memory pool; however, on a heavily
loaded machine there can be lock contention on the global buffers
lock. Add a percpu list on which to cache buffers when lock contention
is encountered.

When allocating a buffer, attempt to use a cached buffer first, before
taking the global buffers lock. When freeing a buffer, try to put it
back on the global list, but if contention is encountered, put the
buffer on the percpu list.
The length of time a buffer is held on the percpu list is dynamically
adjusted based on lock contention. The amount of hold time is
increased and decreased linearly.
Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: John Johansen <john.johansen@canonical.com>
---
security/apparmor/lsm.c | 67 ++++++++++++++++++++++++++++++++++++++---
1 file changed, 62 insertions(+), 5 deletions(-)
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index c80c1bd3024a..ce4f3e7a784d 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -49,12 +49,19 @@ union aa_buffer {
DECLARE_FLEX_ARRAY(char, buffer);
};
+struct aa_local_cache {
+ unsigned int hold;
+ unsigned int count;
+ struct list_head head;
+};
+
#define RESERVE_COUNT 2
static int reserve_count = RESERVE_COUNT;
static int buffer_count;
static LIST_HEAD(aa_global_buffers);
static DEFINE_SPINLOCK(aa_buffers_lock);
+static DEFINE_PER_CPU(struct aa_local_cache, aa_local_buffers);
/*
* LSM hook functions
@@ -1789,11 +1796,32 @@ static int param_set_mode(const char *val, const struct kernel_param *kp)
char *aa_get_buffer(bool in_atomic)
{
union aa_buffer *aa_buf;
+ struct aa_local_cache *cache;
bool try_again = true;
gfp_t flags = (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
+ /* use per cpu cached buffers first */
+ cache = get_cpu_ptr(&aa_local_buffers);
+ if (!list_empty(&cache->head)) {
+ aa_buf = list_first_entry(&cache->head, union aa_buffer, list);
+ list_del(&aa_buf->list);
+ cache->hold--;
+ cache->count--;
+ put_cpu_ptr(&aa_local_buffers);
+ return &aa_buf->buffer[0];
+ }
+ put_cpu_ptr(&aa_local_buffers);
+
+ if (!spin_trylock(&aa_buffers_lock)) {
+ cache = get_cpu_ptr(&aa_local_buffers);
+ cache->hold += 1;
+ put_cpu_ptr(&aa_local_buffers);
+ spin_lock(&aa_buffers_lock);
+ } else {
+ cache = get_cpu_ptr(&aa_local_buffers);
+ put_cpu_ptr(&aa_local_buffers);
+ }
retry:
- spin_lock(&aa_buffers_lock);
if (buffer_count > reserve_count ||
(in_atomic && !list_empty(&aa_global_buffers))) {
aa_buf = list_first_entry(&aa_global_buffers, union aa_buffer,
@@ -1819,6 +1847,7 @@ char *aa_get_buffer(bool in_atomic)
if (!aa_buf) {
if (try_again) {
try_again = false;
+ spin_lock(&aa_buffers_lock);
goto retry;
}
pr_warn_once("AppArmor: Failed to allocate a memory buffer.\n");
@@ -1830,15 +1859,34 @@ char *aa_get_buffer(bool in_atomic)
void aa_put_buffer(char *buf)
{
union aa_buffer *aa_buf;
+ struct aa_local_cache *cache;
if (!buf)
return;
aa_buf = container_of(buf, union aa_buffer, buffer[0]);
- spin_lock(&aa_buffers_lock);
- list_add(&aa_buf->list, &aa_global_buffers);
- buffer_count++;
- spin_unlock(&aa_buffers_lock);
+ cache = get_cpu_ptr(&aa_local_buffers);
+ if (!cache->hold) {
+ put_cpu_ptr(&aa_local_buffers);
+
+ if (spin_trylock(&aa_buffers_lock)) {
+ /* put back on global list */
+ list_add(&aa_buf->list, &aa_global_buffers);
+ buffer_count++;
+ spin_unlock(&aa_buffers_lock);
+ cache = get_cpu_ptr(&aa_local_buffers);
+ put_cpu_ptr(&aa_local_buffers);
+ return;
+ }
+ /* contention on global list, fallback to percpu */
+ cache = get_cpu_ptr(&aa_local_buffers);
+ cache->hold += 1;
+ }
+
+ /* cache in percpu list */
+ list_add(&aa_buf->list, &cache->head);
+ cache->count++;
+ put_cpu_ptr(&aa_local_buffers);
}
/*
@@ -1880,6 +1928,15 @@ static int __init alloc_buffers(void)
union aa_buffer *aa_buf;
int i, num;
+ /*
+ * per cpu set of cached allocated buffers used to help reduce
+ * lock contention
+ */
+ for_each_possible_cpu(i) {
+ per_cpu(aa_local_buffers, i).hold = 0;
+ per_cpu(aa_local_buffers, i).count = 0;
+ INIT_LIST_HEAD(&per_cpu(aa_local_buffers, i).head);
+ }
/*
* A function may require two buffers at once. Usually the buffers are
* used for a short period of time and are shared. On UP kernel buffers
--
2.34.1
Reduce contention on the global buffers lock by using an exponential
back-off strategy, where the hold increment is doubled each time
contention is encountered and backed off linearly when there is no
contention.
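
To get a feel for the growth/decay behaviour, here is a small userspace
model of the counters (an illustrative sketch only; it mirrors the
arithmetic in update_contention() and the decay path below, but none of
the locking or the hold decrement on the per-CPU fast path):

	#include <stdio.h>

	static unsigned int contention, hold;

	/* same arithmetic as update_contention() in the patch below */
	static void contended(void)
	{
		contention += 1;
		if (contention > 9)
			contention = 9;
		hold += 1 << contention;	/* adds 2, 4, 8, ... capped at 512 */
	}

	/* the uncontended path decays the counter linearly */
	static void uncontended(void)
	{
		if (contention)
			contention--;
	}

	int main(void)
	{
		int i;

		for (i = 0; i < 12; i++) {
			contended();
			printf("after %2d contended lock attempts: hold=%u\n",
			       i + 1, hold);
		}
		uncontended();
		printf("after one uncontended attempt: contention=%u\n",
		       contention);
		return 0;
	}
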
Signed-off-by: John Johansen <john.johansen@canonical.com>
---
security/apparmor/lsm.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index ce4f3e7a784d..fd6779ff0da4 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -50,6 +50,7 @@ union aa_buffer {
};
struct aa_local_cache {
+ unsigned int contention;
unsigned int hold;
unsigned int count;
struct list_head head;
@@ -1793,6 +1794,14 @@ static int param_set_mode(const char *val, const struct kernel_param *kp)
return 0;
}
+static void update_contention(struct aa_local_cache *cache)
+{
+ cache->contention += 1;
+ if (cache->contention > 9)
+ cache->contention = 9;
+ cache->hold += 1 << cache->contention; /* 2, 4, 8, ... */
+}
+
char *aa_get_buffer(bool in_atomic)
{
union aa_buffer *aa_buf;
@@ -1814,11 +1823,13 @@ char *aa_get_buffer(bool in_atomic)
if (!spin_trylock(&aa_buffers_lock)) {
cache = get_cpu_ptr(&aa_local_buffers);
- cache->hold += 1;
+ update_contention(cache);
put_cpu_ptr(&aa_local_buffers);
spin_lock(&aa_buffers_lock);
} else {
cache = get_cpu_ptr(&aa_local_buffers);
+ if (cache->contention)
+ cache->contention--;
put_cpu_ptr(&aa_local_buffers);
}
retry:
@@ -1875,12 +1886,14 @@ void aa_put_buffer(char *buf)
buffer_count++;
spin_unlock(&aa_buffers_lock);
cache = get_cpu_ptr(&aa_local_buffers);
+ if (cache->contention)
+ cache->contention--;
put_cpu_ptr(&aa_local_buffers);
return;
}
/* contention on global list, fallback to percpu */
cache = get_cpu_ptr(&aa_local_buffers);
- cache->hold += 1;
+ update_contention(cache);
}
/* cache in percpu list */
@@ -1933,6 +1946,7 @@ static int __init alloc_buffers(void)
* lock contention
*/
for_each_possible_cpu(i) {
+ per_cpu(aa_local_buffers, i).contention = 0;
per_cpu(aa_local_buffers, i).hold = 0;
per_cpu(aa_local_buffers, i).count = 0;
INIT_LIST_HEAD(&per_cpu(aa_local_buffers, i).head);
--
2.34.1
Instead of doubling the hold increment when contention is encountered,
increase it by 8x. This makes for a faster back-off, but results in
buffers being held longer.
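
For comparison, the per-event additions to ->hold under the two
update_contention() variants (starting from contention == 0) are:

	contended event:        1st   2nd   3rd   4th   5th  ...
	+1 per event (prev):      2     4     8    16    32  ... up to 512
	+3 per event (this):      8    64   512   512   512  ...
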
Signed-off-by: John Johansen <john.johansen@canonical.com>
---
security/apparmor/lsm.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index fd6779ff0da4..52423d88854a 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -1796,10 +1796,10 @@ static int param_set_mode(const char *val, const struct kernel_param *kp)
static void update_contention(struct aa_local_cache *cache)
{
- cache->contention += 1;
+ cache->contention += 3;
if (cache->contention > 9)
cache->contention = 9;
- cache->hold += 1 << cache->contention; /* 2, 4, 8, ... */
+ cache->hold += 1 << cache->contention; /* 8, 64, 512, ... */
}
char *aa_get_buffer(bool in_atomic)
--
2.34.1
Force buffers to be returned to the global pool, regardless of
contention, when the percpu cache is full. This ensures that the percpu
buffer list never grows longer than needed.
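
A rough consequence of the cap (derived from the patch, not measured):
each CPU now parks at most MAX_LOCAL_COUNT buffers on its local list, so
the memory held in the per-CPU caches is bounded by

	MAX_LOCAL_COUNT * num_possible_cpus()

buffers, independent of how much contention was seen.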
Signed-off-by: John Johansen <john.johansen@canonical.com>
---
security/apparmor/lsm.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 52423d88854a..e6765f64f6bf 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -56,6 +56,7 @@ struct aa_local_cache {
struct list_head head;
};
+#define MAX_LOCAL_COUNT 2
#define RESERVE_COUNT 2
static int reserve_count = RESERVE_COUNT;
static int buffer_count;
@@ -1878,9 +1879,15 @@ void aa_put_buffer(char *buf)
cache = get_cpu_ptr(&aa_local_buffers);
if (!cache->hold) {
+ bool must_lock = cache->count >= MAX_LOCAL_COUNT;
+
put_cpu_ptr(&aa_local_buffers);
- if (spin_trylock(&aa_buffers_lock)) {
+ if (must_lock) {
+ spin_lock(&aa_buffers_lock);
+ goto locked;
+ } else if (spin_trylock(&aa_buffers_lock)) {
+ locked:
/* put back on global list */
list_add(&aa_buf->list, &aa_global_buffers);
buffer_count++;
--
2.34.1