task_work_add() can fail with -ESRCH if the target task is exiting.
When it fails, the caller must handle the error and free any allocated
resources.
ghes_do_memory_failure() allocates a twcb structure from ghes_estatus_pool
before calling task_work_add(). If task_work_add() fails, twcb is leaked.
This can happen due to a race during task exit:
  do_exit()
    exit_mm()          # current->mm cleared
    exit_task_work()   # task->task_works = &work_exited
ghes_do_memory_failure() checks current->mm before allocating twcb,
but exit_task_work() may run before task_work_add() completes. At that
point task->task_works == &work_exited, causing task_work_add() to fail.
Fix the leak by checking the return value of task_work_add() and freeing
twcb back to ghes_estatus_pool on failure.
Fixes: c1f1fda14137 ("ACPI: APEI: handle synchronous exceptions in task work")
Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
---
drivers/acpi/apei/ghes.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 8acd2742bb27d..4ffe65ecf4a87 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -520,8 +520,11 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
 		twcb->pfn = pfn;
 		twcb->flags = flags;
 		init_task_work(&twcb->twork, memory_failure_cb);
-		task_work_add(current, &twcb->twork, TWA_RESUME);
-		return true;
+		if (!task_work_add(current, &twcb->twork, TWA_RESUME))
+			return true;
+
+		gen_pool_free(ghes_estatus_pool, (unsigned long)twcb, sizeof(*twcb));
+		return false;
 	}
 
 	memory_failure_queue(pfn, flags);
--
2.43.0
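
For readers who want to see the failure path in isolation, here is a
minimal userspace sketch of the pattern the patch applies. It is not kernel
code: struct model_task, struct model_twcb, model_task_work_add(),
model_do_memory_failure(), model_run_task_works() and live_allocs are all
hypothetical stand-ins, and malloc()/free() stand in for
gen_pool_alloc()/gen_pool_free(). Only the work_exited sentinel and the
-ESRCH return value are taken from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct callback_head {
	struct callback_head *next;
};

struct model_task {
	struct callback_head *task_works;
};

/* Sentinel meaning "this task has already run its exit-time task work". */
static struct callback_head work_exited;

/* Number of twcb allocations that have not been freed yet. */
static int live_allocs;

struct model_twcb {
	struct callback_head twork;	/* must stay the first member */
	unsigned long pfn;
	int flags;
};

/* Model of task_work_add(): refuse new work once the task has exited. */
static int model_task_work_add(struct model_task *t, struct callback_head *work)
{
	if (t->task_works == &work_exited)
		return -ESRCH;
	work->next = t->task_works;
	t->task_works = work;
	return 0;
}

/* Model of the patched path: free the allocation if queueing fails. */
static bool model_do_memory_failure(struct model_task *t, unsigned long pfn,
				    int flags)
{
	struct model_twcb *twcb = malloc(sizeof(*twcb));

	if (!twcb)
		return false;
	live_allocs++;

	twcb->pfn = pfn;
	twcb->flags = flags;
	if (!model_task_work_add(t, &twcb->twork))
		return true;	/* ownership moves to the queued callback */

	free(twcb);		/* without this, the allocation would leak */
	live_allocs--;
	return false;
}

/* Model of exit-time processing: run (here: free) all queued work. */
static void model_run_task_works(struct model_task *t)
{
	struct callback_head *work = t->task_works;

	if (work == &work_exited)
		return;
	t->task_works = &work_exited;
	while (work) {
		struct callback_head *next = work->next;

		free((struct model_twcb *)work);	/* twork is first member */
		assert(live_allocs > 0);
		live_allocs--;
		work = next;
	}
}
```

In this model, a call against a task whose task_works already equals
&work_exited returns false and leaves live_allocs unchanged, which is the
behaviour the patch aims for in ghes_do_memory_failure().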
Hi Wupeng,

On 2026/4/17 14:50, Wupeng Ma wrote:
> task_work_add() can fail with -ESRCH if the target task is exiting.
> When it fails, the caller must handle the error and free any allocated
> resources.
>
> ghes_do_memory_failure() allocates a twcb structure from ghes_estatus_pool
> before calling task_work_add(). If task_work_add() fails, twcb is leaked.
>
> This can happen due to a race during task exit:
>
>   do_exit()
>     exit_mm()          # current->mm cleared
>     exit_task_work()   # task->task_works = &work_exited
>
> ghes_do_memory_failure() checks current->mm before allocating twcb,
> but exit_task_work() may run before task_work_add() completes. At that
> point task->task_works == &work_exited, causing task_work_add() to fail.

There are multiple places in the kernel that call task_work_add() without
checking the return value. Does this race cause a bug only in
ghes_do_memory_failure()?

Thanks
Hanjun

On Tue 2026-4-21 17:02, Hanjun Guo wrote:
> Hi Wupeng,
>
> On 2026/4/17 14:50, Wupeng Ma wrote:
>> task_work_add() can fail with -ESRCH if the target task is exiting.
>> When it fails, the caller must handle the error and free any allocated
>> resources.
>>
>> ghes_do_memory_failure() allocates a twcb structure from ghes_estatus_pool
>> before calling task_work_add(). If task_work_add() fails, twcb is leaked.
>>
>> This can happen due to a race during task exit:
>>
>>   do_exit()
>>     exit_mm()          # current->mm cleared
>>     exit_task_work()   # task->task_works = &work_exited
>>
>> ghes_do_memory_failure() checks current->mm before allocating twcb,
>> but exit_task_work() may run before task_work_add() completes. At that
>> point task->task_works == &work_exited, causing task_work_add() to fail.
>
> There are multiple places in the kernel that call task_work_add() without
> checking the return value. Does this race cause a bug only in
> ghes_do_memory_failure()?

Thanks for the review.

We have analyzed all the call sites, and apart from this location, only
binder_deferred_fd_close has a potential resource leak on failure.

>
> Thanks
> Hanjun

On 2026/4/21 17:18, mawupeng wrote:
>
>
> On Tue 2026-4-21 17:02, Hanjun Guo wrote:
>> Hi Wupeng,
>>
>> On 2026/4/17 14:50, Wupeng Ma wrote:
>>> task_work_add() can fail with -ESRCH if the target task is exiting.
>>> When it fails, the caller must handle the error and free any allocated
>>> resources.
>>>
>>> ghes_do_memory_failure() allocates a twcb structure from ghes_estatus_pool
>>> before calling task_work_add(). If task_work_add() fails, twcb is leaked.
>>>
>>> This can happen due to a race during task exit:
>>>
>>>   do_exit()
>>>     exit_mm()          # current->mm cleared
>>>     exit_task_work()   # task->task_works = &work_exited
>>>
>>> ghes_do_memory_failure() checks current->mm before allocating twcb,
>>> but exit_task_work() may run before task_work_add() completes. At that
>>> point task->task_works == &work_exited, causing task_work_add() to fail.
>>
>> There are multiple places in the kernel that call task_work_add() without
>> checking the return value. Does this race cause a bug only in
>> ghes_do_memory_failure()?
>
> Thanks for the review.
>
> We have analyzed all the call sites, and apart from this location, only
> binder_deferred_fd_close has a potential resource leak on failure.

I think this is a real bugfix.

Would you mind explaining the race in the commit log in this form,

CPU0                    CPU1
do_exit()               xxx

to explicitly show the problem?

Thanks
Hanjun