ACPI: APEI: check return value of task_work_add to prevent memory leaks

[PATCH] ACPI: APEI: check return value of task_work_add to prevent memory leaks

Posted by Wupeng Ma 2 months ago

task_work_add() can fail with -ESRCH if the target task is exiting.
When it fails, the caller must handle the error and free any allocated
resources.

ghes_do_memory_failure() allocates a twcb structure from ghes_estatus_pool
before calling task_work_add(). If task_work_add() fails, twcb is leaked.

This can happen due to a race during task exit:

  do_exit()
    exit_mm()           # current->mm cleared
    exit_task_work()    # task->task_works = &work_exited

ghes_do_memory_failure() checks current->mm before allocating twcb,
but exit_task_work() may run before task_work_add() completes.  At that
point task->task_works == &work_exited, causing task_work_add() to fail.

Fix the leak by checking the return value of task_work_add() and freeing
twcb on failure.

Fixes: c1f1fda14137 ("ACPI: APEI: handle synchronous exceptions in task work")
Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
---
 drivers/acpi/apei/ghes.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 8acd2742bb27d..4ffe65ecf4a87 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -520,8 +520,11 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
 		twcb->pfn = pfn;
 		twcb->flags = flags;
 		init_task_work(&twcb->twork, memory_failure_cb);
-		task_work_add(current, &twcb->twork, TWA_RESUME);
-		return true;
+		if (!task_work_add(current, &twcb->twork, TWA_RESUME))
+			return true;
+
+		gen_pool_free(ghes_estatus_pool, (unsigned long)twcb, sizeof(*twcb));
+		return false;
 	}
 
 	memory_failure_queue(pfn, flags);
-- 
2.43.0
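
The failure mode described in the commit message can be modelled in a
single-threaded user-space program. The sketch below is illustrative only:
struct task, struct work, model_task_work_add(), model_exit_task_work(),
queue_failure() and the live_allocs counter are hypothetical stand-ins,
malloc()/free() replace the ghes_estatus_pool allocator, and the concurrency
handling of the real task_work_add() is not modelled at all.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel definitions. */
struct work {
	struct work *next;
	void (*func)(struct work *);
};

struct task {
	struct work *works;	/* pending list, or &work_exited after exit */
};

struct twcb {
	struct work twork;
	unsigned long pfn;
	int flags;
};

static struct work work_exited;	/* sentinel: task accepts no more work */
static int live_allocs;		/* number of twcb objects not yet freed */

/* Mirrors the documented contract: 0 on success, -ESRCH if the task exited. */
static int model_task_work_add(struct task *t, struct work *w)
{
	if (t->works == &work_exited)
		return -ESRCH;
	w->next = t->works;
	t->works = w;
	return 0;
}

/* Close the list with the sentinel, then run whatever was queued before. */
static void model_exit_task_work(struct task *t)
{
	struct work *w = t->works;

	if (w == &work_exited)
		return;
	t->works = &work_exited;
	while (w) {
		struct work *next = w->next;

		w->func(w);
		w = next;
	}
}

/* The callback owns the twcb and releases it once it has run. */
static void model_memory_failure_cb(struct work *w)
{
	struct twcb *cb = (struct twcb *)((char *)w - offsetof(struct twcb, twork));

	free(cb);
	live_allocs--;
}

/* Patched pattern: ownership passes to the callback only on success. */
static bool queue_failure(struct task *t, unsigned long pfn, int flags)
{
	struct twcb *cb = malloc(sizeof(*cb));

	if (!cb)
		return false;
	live_allocs++;
	cb->pfn = pfn;
	cb->flags = flags;
	cb->twork.next = NULL;
	cb->twork.func = model_memory_failure_cb;

	if (!model_task_work_add(t, &cb->twork))
		return true;

	/* The callback will never run, so the caller must free cb itself. */
	free(cb);
	live_allocs--;
	return false;
}
```

In this model, calling queue_failure() on a task whose list has already been
closed by model_exit_task_work() returns false and leaves live_allocs at 0.
Removing the free() on the failure path would leave the counter at 1, which
corresponds to the leak the patch addresses.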

Re: [PATCH] ACPI: APEI: check return value of task_work_add to prevent memory leaks

Posted by Hanjun Guo 1 month, 3 weeks ago

Hi Wupeng,

On 2026/4/17 14:50, Wupeng Ma wrote:
> task_work_add() can fail with -ESRCH if the target task is exiting.
> When it fails, the caller must handle the error and free any allocated
> resources.
> 
> ghes_do_memory_failure() allocates a twcb structure from ghes_estatus_pool
> before calling task_work_add(). If task_work_add() fails, twcb is leaked.
> 
> This can happen due to a race during task exit:
> 
>    do_exit()
>      exit_mm()           # current->mm cleared
>      exit_task_work()    # task->task_works = &work_exited
> 
> ghes_do_memory_failure() checks current->mm before allocating twcb,
> but exit_task_work() may run before task_work_add() completes.  At that
> point task->task_works == &work_exited, causing task_work_add() to fail.

There are multiple places in the kernel that call task_work_add() without
checking the return value. Does this race only cause a bug in
ghes_do_memory_failure()?

Thanks
Hanjun

Re: [PATCH] ACPI: APEI: check return value of task_work_add to prevent memory leaks

Posted by mawupeng 1 month, 3 weeks ago


On Tue 2026-4-21 17:02, Hanjun Guo wrote:
> Hi Wupeng,
> 
> On 2026/4/17 14:50, Wupeng Ma wrote:
>> task_work_add() can fail with -ESRCH if the target task is exiting.
>> When it fails, the caller must handle the error and free any allocated
>> resources.
>>
>> ghes_do_memory_failure() allocates a twcb structure from ghes_estatus_pool
>> before calling task_work_add(). If task_work_add() fails, twcb is leaked.
>>
>> This can happen due to a race during task exit:
>>
>>    do_exit()
>>      exit_mm()           # current->mm cleared
>>      exit_task_work()    # task->task_works = &work_exited
>>
>> ghes_do_memory_failure() checks current->mm before allocating twcb,
>> but exit_task_work() may run before task_work_add() completes.  At that
>> point task->task_works == &work_exited, causing task_work_add() to fail.
> 
> There are multiple places in the kernel that call task_work_add() without
> checking the return value. Does this race only cause a bug in
> ghes_do_memory_failure()?

Thanks for the review. 

We have analyzed all the callers of task_work_add(). Apart from this location,
only binder_deferred_fd_close() has a potential resource leak on failure.


> 
> Thanks
> Hanjun

Re: [PATCH] ACPI: APEI: check return value of task_work_add to prevent memory leaks

Posted by Hanjun Guo 1 month, 2 weeks ago

On 2026/4/21 17:18, mawupeng wrote:
> 
> 
> On Tue 2026-4-21 17:02, Hanjun Guo wrote:
>> Hi Wupeng,
>>
>> On 2026/4/17 14:50, Wupeng Ma wrote:
>>> task_work_add() can fail with -ESRCH if the target task is exiting.
>>> When it fails, the caller must handle the error and free any allocated
>>> resources.
>>>
>>> ghes_do_memory_failure() allocates a twcb structure from ghes_estatus_pool
>>> before calling task_work_add(). If task_work_add() fails, twcb is leaked.
>>>
>>> This can happen due to a race during task exit:
>>>
>>>     do_exit()
>>>       exit_mm()           # current->mm cleared
>>>       exit_task_work()    # task->task_works = &work_exited
>>>
>>> ghes_do_memory_failure() checks current->mm before allocating twcb,
>>> but exit_task_work() may run before task_work_add() completes.  At that
>>> point task->task_works == &work_exited, causing task_work_add() to fail.
>>
>> There are multiple places in the kernel that call task_work_add() without
>> checking the return value. Does this race only cause a bug in
>> ghes_do_memory_failure()?
> 
> Thanks for the review.
> 
> We have analyzed all the callers of task_work_add(). Apart from this location,
> only binder_deferred_fd_close() has a potential resource leak on failure.

I think this is a real bugfix.

Would you mind explaining the race in this way in the commit log,

    CPU0                   CPU1

do_exit()               xxx

to explicitly show the problem?

Thanks
Hanjun