From nobody Sat Nov 23 12:54:31 2024
From: Prakash Sangappa <prakash.sangappa@oracle.com>
To: linux-kernel@vger.kernel.org
Cc: rostedt@goodmis.org, peterz@infradead.org, tglx@linutronix.de, daniel.m.jordan@oracle.com, prakash.sangappa@oracle.com
Subject: [RFC PATCH 1/4] Introduce per thread user-kernel shared structure
Date: Wed, 13 Nov 2024 00:01:23 +0000
Message-ID: <20241113000126.967713-2-prakash.sangappa@oracle.com>
In-Reply-To: <20241113000126.967713-1-prakash.sangappa@oracle.com>
References: <20241113000126.967713-1-prakash.sangappa@oracle.com>

A per thread structure is allocated from a page that is shared-mapped
between user space and the kernel, to be used as a means of faster
communication. This facilitates sharing thread-specific information
between user space and the kernel, which the application can access
without requiring system calls in latency-sensitive code paths.

This change adds a new system call, which allocates the shared structure
and returns its mapped user address. Multiple such structures are
allocated on a page to accommodate requests from different threads of a
multithreaded process. Available space on a page is managed using a
bitmap. When a thread exits, its shared structure is freed and can get
reused for another thread that requests it. More pages will be allocated
and used as needed based on the number of threads requesting use of
shared structures.
These pages are all freed when the process exits. Each of these per
thread shared structures is rounded up to 128 bytes. Available space in
this structure can be used to add members when implementing new features.

Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com>
---
 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 include/linux/mm_types.h               |   4 +
 include/linux/sched.h                  |   5 +
 include/linux/syscalls.h               |   2 +
 include/linux/task_shared.h            |  63 +++++
 include/uapi/asm-generic/unistd.h      |   4 +-
 include/uapi/linux/task_shared.h       |  18 ++
 init/Kconfig                           |  10 +
 kernel/fork.c                          |  12 +
 kernel/sys_ni.c                        |   2 +
 mm/Makefile                            |   1 +
 mm/mmap.c                              |  13 ++
 mm/task_shared.c                       | 306 +++++++++++++++++++++++++
 14 files changed, 441 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/task_shared.h
 create mode 100644 include/uapi/linux/task_shared.h
 create mode 100644 mm/task_shared.c

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 534c74b14fab..3838fdc3d292 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -468,3 +468,4 @@
 460	i386	lsm_set_self_attr	sys_lsm_set_self_attr
 461	i386	lsm_list_modules	sys_lsm_list_modules
 462	i386	mseal			sys_mseal
+463	i386	task_getshared		sys_task_getshared
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 7093ee21c0d1..5bc4ecd74117 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -386,6 +386,7 @@
 460	common	lsm_set_self_attr	sys_lsm_set_self_attr
 461	common	lsm_list_modules	sys_lsm_list_modules
 462	common	mseal			sys_mseal
+463	common	task_getshared		sys_task_getshared
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6e3bdf8e38bc..d32d92a47c34 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1030,6 +1030,10 @@ struct mm_struct {
 #endif
 	} lru_gen;
 #endif /* CONFIG_LRU_GEN_WALKS_MMU */
+#ifdef CONFIG_TASKSHARED
+	/* user shared pages */
+	void *usharedpg;
+#endif
 } __randomize_layout;
 
 /*
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ee9d8aecc5b5..1ca7d4efa932 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1593,6 +1593,11 @@ struct task_struct {
 	struct user_event_mm *user_event_mm;
 #endif
 
+#ifdef CONFIG_TASKSHARED
+	/* user shared struct */
+	void *task_ushrd;
+#endif
+
 	/*
 	 * New fields for task_struct should be added above here, so that
 	 * they are included in the randomized portion of task_struct.
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 5758104921e6..3ca79244aa0b 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -997,6 +997,8 @@ asmlinkage long sys_spu_create(const char __user *name,
 		unsigned int flags, umode_t mode, int fd);
 
 
+asmlinkage long sys_task_getshared(long opt, long flags, void __user *uaddr);
+
 /*
  * Deprecated system calls which are still defined in
  * include/uapi/asm-generic/unistd.h and wanted by >= 1 arch
diff --git a/include/linux/task_shared.h b/include/linux/task_shared.h
new file mode 100644
index 000000000000..983fdae47308
--- /dev/null
+++ b/include/linux/task_shared.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __TASK_SHARED_H__
+#define __TASK_SHARED_H__
+
+#include
+#include
+
+#ifdef CONFIG_TASKSHARED
+/*
+ * Track user-kernel shared pages referred by mm_struct
+ */
+struct ushared_pages {
+	struct list_head plist;
+	struct list_head frlist;
+	unsigned long pcount;
+};
+
+
+/*
+ * Following is used for cacheline aligned allocations of shared structures
+ * within a page.
+ */
+union task_shared {
+	struct task_sharedinfo ts;
+	char s[128];
+};
+
+/*
+ * Struct to track per page slots
+ */
+struct ushared_pg {
+	struct list_head list;
+	struct list_head fr_list;
+	struct page *pages[2];
+	u64 bitmap;		/* free slots */
+	int slot_count;
+	unsigned long kaddr;
+	unsigned long vaddr;	/* user address */
+	struct vm_special_mapping ushrd_mapping;
+};
+
+/*
+ * Following struct is referred by struct task_struct, contains mapped address
+ * of per thread shared structure allocated.
+ */
+struct task_ushrd_struct {
+	union task_shared *kaddr;	/* kernel address */
+	union task_shared *uaddr;	/* user address */
+	struct ushared_pg *upg;
+};
+
+extern void task_ushared_free(struct task_struct *t);
+extern void mm_ushared_clear(struct mm_struct *mm);
+#else /* !CONFIG_TASKSHARED */
+static inline void task_ushared_free(struct task_struct *t)
+{
+}
+
+static inline void mm_ushared_clear(struct mm_struct *mm)
+{
+}
+#endif /* !CONFIG_TASKSHARED */
+#endif /* __TASK_SHARED_H__ */
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 5bf6148cac2b..7f6367616fb5 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -840,9 +840,11 @@ __SYSCALL(__NR_lsm_list_modules, sys_lsm_list_modules)
 
 #define __NR_mseal 462
 __SYSCALL(__NR_mseal, sys_mseal)
+#define __NR_task_getshared 463
+__SYSCALL(__NR_task_getshared, sys_task_getshared)
 
 #undef __NR_syscalls
-#define __NR_syscalls 463
+#define __NR_syscalls 464
 
 /*
  * 32 bit systems traditionally used different
diff --git a/include/uapi/linux/task_shared.h b/include/uapi/linux/task_shared.h
new file mode 100644
index 000000000000..a07902c57380
--- /dev/null
+++ b/include/uapi/linux/task_shared.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef LINUX_TASK_SHARED_H
+#define LINUX_TASK_SHARED_H
+
+/*
+ * Per task user-kernel mapped structure
+ */
+
+/*
+ * Option to request allocation of struct task_sharedinfo shared structure,
+ * used for sharing per thread information between userspace and kernel.
+ */
+#define TASK_SHAREDINFO 1
+
+struct task_sharedinfo {
+	int version;
+};
+#endif
diff --git a/init/Kconfig b/init/Kconfig
index a7666e186064..1f84851d1b7e 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1803,6 +1803,16 @@ config DEBUG_RSEQ
 	  Enable extra debugging checks for the rseq system call.
 
 	  If unsure, say N.
+config TASKSHARED
+	bool "Enable task getshared syscall" if EXPERT
+	default y
+	help
+	  Enable mechanism to provide per thread shared structure mapped
+	  between userspace<->kernel for faster communication. Used
+	  for sharing per thread information.
+
+	  If unsure, say Y.
+
 
 config CACHESTAT_SYSCALL
 	bool "Enable cachestat() system call" if EXPERT
diff --git a/kernel/fork.c b/kernel/fork.c
index 22f43721d031..b40792c84718 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -112,6 +112,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -1126,6 +1127,11 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	if (err)
 		goto free_stack;
 
+#ifdef CONFIG_TASKSHARED
+	/* task's shared structures are not inherited across fork */
+	tsk->task_ushrd = NULL;
+#endif
+
 #ifdef CONFIG_SECCOMP
 	/*
 	 * We must handle setting up seccomp filters once we're under
@@ -1282,6 +1288,10 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !defined(CONFIG_SPLIT_PMD_PTLOCKS)
 	mm->pmd_huge_pte = NULL;
 #endif
+
+#ifdef CONFIG_TASKSHARED
+	mm->usharedpg = NULL;
+#endif
 	mm_init_uprobes_state(mm);
 	hugetlb_count_init(mm);
 
@@ -1346,6 +1356,7 @@ static inline void __mmput(struct mm_struct *mm)
 	ksm_exit(mm);
 	khugepaged_exit(mm); /* must run before exit_mmap */
 	exit_mmap(mm);
+	mm_ushared_clear(mm);
 	mm_put_huge_zero_folio(mm);
 	set_mm_exe_file(mm, NULL);
 	if (!list_empty(&mm->mmlist)) {
@@ -1605,6 +1616,7 @@ static int wait_for_vfork_done(struct task_struct *child,
 static void mm_release(struct task_struct *tsk, struct mm_struct *mm)
 {
 	uprobe_free_utask(tsk);
+	task_ushared_free(tsk);
 
 	/* Get rid of any cached register state */
 	deactivate_mm(tsk, mm);
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index c00a86931f8c..9039a69e95ac 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -390,5 +390,7 @@ COND_SYSCALL(setuid16);
 
 /* restartable sequence */
 COND_SYSCALL(rseq);
+/* task shared */
+COND_SYSCALL(task_getshared);
 
 COND_SYSCALL(uretprobe);
diff --git a/mm/Makefile b/mm/Makefile
index d5639b036166..007743b40f87 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -145,3 +145,4 @@ obj-$(CONFIG_GENERIC_IOREMAP) += ioremap.o
 obj-$(CONFIG_SHRINKER_DEBUG) += shrinker_debug.o
 obj-$(CONFIG_EXECMEM) += execmem.o
 obj-$(CONFIG_TMPFS_QUOTA) += shmem_quota.o
+obj-$(CONFIG_TASKSHARED) += task_shared.o
diff --git a/mm/mmap.c b/mm/mmap.c
index 79d541f1502b..05b947fac55b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2086,6 +2086,18 @@ static int special_mapping_split(struct vm_area_struct *vma, unsigned long addr)
 	return -EINVAL;
 }
 
+/*
+ * XXX: Having a mkwrite hook prevents attempt to update file time on
+ * page write fault, these mappings do not have a file structure associated.
+ */
+static vm_fault_t special_mapping_page_mkwrite(struct vm_fault *vmf)
+{
+	struct page *page = vmf->page;
+
+	lock_page(page);
+
+	return VM_FAULT_LOCKED;
+}
+
 static const struct vm_operations_struct special_mapping_vmops = {
 	.close = special_mapping_close,
 	.fault = special_mapping_fault,
@@ -2094,6 +2106,7 @@ static const struct vm_operations_struct special_mapping_vmops = {
 	/* vDSO code relies that VVAR can't be accessed remotely */
 	.access = NULL,
 	.may_split = special_mapping_split,
+	.page_mkwrite = special_mapping_page_mkwrite,
 };
 
 static vm_fault_t special_mapping_fault(struct vm_fault *vmf)
diff --git a/mm/task_shared.c b/mm/task_shared.c
new file mode 100644
index 000000000000..cea45d913b91
--- /dev/null
+++ b/mm/task_shared.c
@@ -0,0 +1,306 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+/*
+ * Per thread shared structure mechanism
+ */
+
+#define TASK_USHARED_SLOTS	(PAGE_SIZE/sizeof(union task_shared))
+
+/*
+ * Called once to init struct ushared_pages pointer.
+ */
+static int init_mm_ushared(struct mm_struct *mm)
+{
+	struct ushared_pages *usharedpg;
+
+	usharedpg = kmalloc(sizeof(struct ushared_pages), GFP_KERNEL);
+	if (usharedpg == NULL)
+		return 1;
+
+	INIT_LIST_HEAD(&usharedpg->plist);
+	INIT_LIST_HEAD(&usharedpg->frlist);
+	usharedpg->pcount = 0;
+	mmap_write_lock(mm);
+	if (mm->usharedpg == NULL) {
+		mm->usharedpg = usharedpg;
+		usharedpg = NULL;
+	}
+	mmap_write_unlock(mm);
+	if (usharedpg != NULL)
+		kfree(usharedpg);
+	return 0;
+}
+
+static int init_task_ushrd(struct task_struct *t)
+{
+	struct task_ushrd_struct *ushrd;
+
+	ushrd = kzalloc(sizeof(struct task_ushrd_struct), GFP_KERNEL);
+	if (ushrd == NULL)
+		return 1;
+
+	mmap_write_lock(t->mm);
+	if (t->task_ushrd == NULL) {
+		t->task_ushrd = ushrd;
+		ushrd = NULL;
+	}
+	mmap_write_unlock(t->mm);
+	if (ushrd != NULL)
+		kfree(ushrd);
+	return 0;
+}
+
+/*
+ * Called from __mmput(), mm is going away
+ */
+void mm_ushared_clear(struct mm_struct *mm)
+{
+	struct ushared_pg *upg;
+	struct ushared_pg *tmp;
+	struct ushared_pages *usharedpg;
+
+	if (mm == NULL || mm->usharedpg == NULL)
+		return;
+
+	usharedpg = mm->usharedpg;
+	if (list_empty(&usharedpg->frlist))
+		goto out;
+
+	list_for_each_entry_safe(upg, tmp, &usharedpg->frlist, fr_list) {
+		list_del(&upg->fr_list);
+		put_page(upg->pages[0]);
+		kfree(upg);
+	}
+out:
+	kfree(mm->usharedpg);
+	mm->usharedpg = NULL;
+
+}
+
+void task_ushared_free(struct task_struct *t)
+{
+	struct task_ushrd_struct *ushrd = t->task_ushrd;
+	struct mm_struct *mm = t->mm;
+	struct ushared_pages *usharedpg;
+	int slot;
+
+	if (mm == NULL || mm->usharedpg == NULL || ushrd == NULL)
+		return;
+
+	usharedpg = mm->usharedpg;
+	mmap_write_lock(mm);
+
+	if (ushrd->upg == NULL)
+		goto out;
+
+	slot = (unsigned long)((unsigned long)ushrd->uaddr
+		- ushrd->upg->vaddr) / sizeof(union task_shared);
+	clear_bit(slot, (unsigned long *)(&ushrd->upg->bitmap));
+
+	/* move to head */
+	if (ushrd->upg->slot_count == 0) {
+		list_del(&ushrd->upg->fr_list);
+		list_add(&ushrd->upg->fr_list, &usharedpg->frlist);
+	}
+
+	ushrd->upg->slot_count++;
+
+	ushrd->uaddr = ushrd->kaddr = NULL;
+	ushrd->upg = NULL;
+
+out:
+	t->task_ushrd = NULL;
+	mmap_write_unlock(mm);
+	kfree(ushrd);
+}
+
+/* map shared page */
+static int task_shared_add_vma(struct ushared_pg *pg)
+{
+	struct vm_area_struct *vma;
+	struct mm_struct *mm = current->mm;
+	unsigned long ret = 1;
+
+
+	if (!pg->vaddr) {
+		/* Try to map as high as possible, this is only a hint. */
+		pg->vaddr = get_unmapped_area(NULL, TASK_SIZE - PAGE_SIZE,
+						PAGE_SIZE, 0, 0);
+		if (pg->vaddr & ~PAGE_MASK) {
+			ret = 0;
+			goto fail;
+		}
+	}
+
+	vma = _install_special_mapping(mm, pg->vaddr, PAGE_SIZE,
+			VM_SHARED|VM_READ|VM_MAYREAD|
+			VM_WRITE|VM_MAYWRITE|VM_DONTCOPY|VM_IO,
+			&pg->ushrd_mapping);
+	if (IS_ERR(vma)) {
+		ret = 0;
+		pg->vaddr = 0;
+		goto fail;
+	}
+
+	pg->kaddr = (unsigned long)page_address(pg->pages[0]);
+fail:
+	return ret;
+}
+
+/*
+ * Allocate a page, map user address and add to freelist
+ */
+static struct ushared_pg *ushared_allocpg(void)
+{
+
+	struct ushared_pg *pg;
+	struct mm_struct *mm = current->mm;
+	struct ushared_pages *usharedpg = mm->usharedpg;
+
+	if (usharedpg == NULL)
+		return NULL;
+	pg = kzalloc(sizeof(*pg), GFP_KERNEL);
+
+	if (unlikely(!pg))
+		return NULL;
+	pg->ushrd_mapping.name = "[task_shared]";
+	pg->ushrd_mapping.fault = NULL;
+	pg->ushrd_mapping.pages = pg->pages;
+	pg->pages[0] = alloc_page(GFP_KERNEL);
+	if (!pg->pages[0])
+		goto out;
+	pg->pages[1] = NULL;
+	pg->bitmap = 0;
+
+	/*
+	 * page size should be 4096 or 8192
+	 */
+	pg->slot_count = TASK_USHARED_SLOTS;
+
+	mmap_write_lock(mm);
+	if (task_shared_add_vma(pg)) {
+		list_add(&pg->fr_list, &usharedpg->frlist);
+		usharedpg->pcount++;
+		mmap_write_unlock(mm);
+		return pg;
+	}
+	mmap_write_unlock(mm);
+
+	__free_page(pg->pages[0]);
+out:
+	kfree(pg);
+	return NULL;
+}
+
+
+/*
+ * Allocate task_shared struct for calling thread.
+ */
+static int task_ushared_alloc(void)
+{
+	struct mm_struct *mm = current->mm;
+	struct ushared_pg *ent = NULL;
+	struct task_ushrd_struct *ushrd;
+	struct ushared_pages *usharedpg;
+	int tryalloc = 0;
+	int slot = -1;
+	int ret = -ENOMEM;
+
+	if (mm->usharedpg == NULL && init_mm_ushared(mm))
+		return ret;
+
+	if (current->task_ushrd == NULL && init_task_ushrd(current))
+		return ret;
+
+	usharedpg = mm->usharedpg;
+	ushrd = current->task_ushrd;
+repeat:
+	if (mmap_write_lock_killable(mm))
+		return -EINTR;
+
+	ent = list_empty(&usharedpg->frlist) ? NULL :
+		list_entry(usharedpg->frlist.next,
+			struct ushared_pg, fr_list);
+
+	if (ent == NULL || ent->slot_count == 0) {
+		if (tryalloc == 0) {
+			mmap_write_unlock(mm);
+			(void)ushared_allocpg();
+			tryalloc = 1;
+			goto repeat;
+		} else {
+			ent = NULL;
+		}
+	}
+
+	if (ent) {
+		slot = find_first_zero_bit((unsigned long *)(&ent->bitmap),
+				TASK_USHARED_SLOTS);
+		BUG_ON(slot >= TASK_USHARED_SLOTS);
+
+		set_bit(slot, (unsigned long *)(&ent->bitmap));
+
+		ushrd->uaddr = (union task_shared *)(ent->vaddr +
+				(slot * sizeof(union task_shared)));
+		ushrd->kaddr = (union task_shared *)(ent->kaddr +
+				(slot * sizeof(union task_shared)));
+		ushrd->upg = ent;
+		ent->slot_count--;
+		/* move it to tail */
+		if (ent->slot_count == 0) {
+			list_del(&ent->fr_list);
+			list_add_tail(&ent->fr_list, &usharedpg->frlist);
+		}
+
+		ret = 0;
+	}
+
+	mmap_write_unlock(mm);
+	return ret;
+}
+
+
+/*
+ * Get Task Shared structure, allocate if needed and return mapped user address.
+ */
+static long task_getshared(u64 opt, u64 flags, void __user *uaddr)
+{
+	struct task_ushrd_struct *ushrd = current->task_ushrd;
+
+	/* currently only TASK_SHAREDINFO supported */
+	if (opt != TASK_SHAREDINFO)
+		return -EINVAL;
+
+	/* if a shared structure is already allocated, return address */
+	if (ushrd != NULL && ushrd->upg != NULL) {
+		if (copy_to_user(uaddr, &ushrd->uaddr,
+				sizeof(struct task_sharedinfo *)))
+			return -EFAULT;
+		return 0;
+	}
+
+	task_ushared_alloc();
+	ushrd = current->task_ushrd;
+	if (ushrd != NULL && ushrd->upg != NULL) {
+		if (copy_to_user(uaddr, &ushrd->uaddr,
+				sizeof(struct task_sharedinfo *)))
+			return -EFAULT;
+		return 0;
+	}
+	return -ENOMEM;
+}
+
+
+SYSCALL_DEFINE3(task_getshared, u64, opt, u64, flags, void __user *, uaddr)
+{
+	return task_getshared(opt, flags, uaddr);
+}
--
2.43.5
From: Prakash Sangappa <prakash.sangappa@oracle.com>
To: linux-kernel@vger.kernel.org
Cc: rostedt@goodmis.org, peterz@infradead.org, tglx@linutronix.de, daniel.m.jordan@oracle.com, prakash.sangappa@oracle.com
Subject: [RFC PATCH 2/4] Scheduler time extension
Date: Wed, 13 Nov 2024 00:01:24 +0000
Message-ID: <20241113000126.967713-3-prakash.sangappa@oracle.com>
In-Reply-To: <20241113000126.967713-1-prakash.sangappa@oracle.com>
References: <20241113000126.967713-1-prakash.sangappa@oracle.com>

Introduce support for a thread to request extending its execution time on the CPU,
when holding locks in user space. This adds a member 'sched_delay' to the
per thread shared mapped structure. A thread requests a CPU execution
time extension by updating the 'sched_delay' member.

Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com>
---
 include/linux/entry-common.h     | 10 +++++--
 include/linux/sched.h            | 17 +++++++++++
 include/uapi/linux/task_shared.h |  2 +-
 kernel/entry/common.c            | 15 ++++++----
 kernel/sched/core.c              | 16 ++++++++++
 kernel/sched/syscalls.c          |  7 +++++
 mm/task_shared.c                 | 50 ++++++++++++++++++++++++++++++++
 7 files changed, 108 insertions(+), 9 deletions(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index 1e50cdb83ae5..904f5cdfe0b7 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -302,7 +302,7 @@ void arch_do_signal_or_restart(struct pt_regs *regs);
  * exit_to_user_mode_loop - do any pending work before leaving to user space
  */
 unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
-				     unsigned long ti_work);
+				     unsigned long ti_work, bool irq);
 
 /**
  * exit_to_user_mode_prepare - call exit_to_user_mode_loop() if required
@@ -314,7 +314,8 @@ unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
  *    EXIT_TO_USER_MODE_WORK are set
  * 4) check that interrupts are still disabled
  */
-static __always_inline void exit_to_user_mode_prepare(struct pt_regs *regs)
+static __always_inline void exit_to_user_mode_prepare(struct pt_regs *regs,
+						bool irq)
 {
 	unsigned long ti_work;
 
@@ -325,7 +326,10 @@ static __always_inline void exit_to_user_mode_prepare(struct pt_regs *regs)
 
 	ti_work = read_thread_flags();
 	if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
-		ti_work = exit_to_user_mode_loop(regs, ti_work);
+		ti_work = exit_to_user_mode_loop(regs, ti_work, irq);
+
+	if (irq)
+		taskshrd_delay_resched_fini();
 
 	arch_exit_to_user_mode_prepare(regs, ti_work);
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1ca7d4efa932..b53e7a878a01 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -326,6 +326,7 @@ extern int __must_check io_schedule_prepare(void);
 extern void io_schedule_finish(int token);
 extern long io_schedule_timeout(long timeout);
 extern void io_schedule(void);
+extern void hrtick_local_start(u64 delay);
 
 /**
  * struct prev_cputime - snapshot of system and user cputime
@@ -957,6 +958,9 @@ struct task_struct {
 	 * ->sched_remote_wakeup gets used, so it can be in this word.
 	 */
 	unsigned			sched_remote_wakeup:1;
+#ifdef CONFIG_TASKSHARED
+	unsigned			taskshrd_sched_delay:1;
+#endif
 #ifdef CONFIG_RT_MUTEXES
 	unsigned			sched_rt_mutex:1;
 #endif
@@ -2186,6 +2190,19 @@ static inline bool owner_on_cpu(struct task_struct *owner)
 unsigned long sched_cpu_util(int cpu);
 #endif /* CONFIG_SMP */
 
+#ifdef CONFIG_TASKSHARED
+
+extern bool taskshrd_delay_resched(void);
+extern void taskshrd_delay_resched_fini(void);
+extern void taskshrd_delay_resched_tick(void);
+#else
+
+static inline bool taskshrd_delay_resched(void) { return false; }
+static inline void taskshrd_delay_resched_fini(void) { }
+static inline void taskshrd_delay_resched_tick(void) { }
+
+#endif
+
 #ifdef CONFIG_SCHED_CORE
 extern void sched_core_free(struct task_struct *tsk);
 extern void sched_core_fork(struct task_struct *p);
diff --git a/include/uapi/linux/task_shared.h b/include/uapi/linux/task_shared.h
index a07902c57380..6e4c664eea60 100644
--- a/include/uapi/linux/task_shared.h
+++ b/include/uapi/linux/task_shared.h
@@ -13,6 +13,6 @@
 #define TASK_SHAREDINFO 1
 
 struct task_sharedinfo {
-	int version;
+	volatile unsigned short sched_delay;
 };
 #endif
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 11ec8320b59d..0e0360e8c127 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -89,7 +89,8 @@ void __weak arch_do_signal_or_restart(struct pt_regs *regs) { }
  * @ti_work: TIF work flags as read by the caller
  */
 __always_inline unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
-						     unsigned long ti_work)
+						     unsigned long ti_work,
+						     bool irq)
 {
 	/*
 	 * Before returning to user space ensure that all pending work
@@ -99,8 +100,12 @@ __always_inline unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 
 		local_irq_enable_exit_to_user(ti_work);
 
-		if (ti_work & _TIF_NEED_RESCHED)
-			schedule();
+		if (ti_work & _TIF_NEED_RESCHED) {
+			if (irq && taskshrd_delay_resched())
+				clear_tsk_need_resched(current);
+			else
+				schedule();
+		}
 
 		if (ti_work & _TIF_UPROBE)
 			uprobe_notify_resume(regs);
@@ -208,7 +213,7 @@ static __always_inline void __syscall_exit_to_user_mode_work(struct pt_regs *regs)
 {
 	syscall_exit_to_user_mode_prepare(regs);
 	local_irq_disable_exit_to_user();
-	exit_to_user_mode_prepare(regs);
+	exit_to_user_mode_prepare(regs, false);
 }
 
 void syscall_exit_to_user_mode_work(struct pt_regs *regs)
@@ -232,7 +237,7 @@ noinstr void irqentry_enter_from_user_mode(struct pt_regs *regs)
 noinstr void irqentry_exit_to_user_mode(struct pt_regs *regs)
 {
 	instrumentation_begin();
-	exit_to_user_mode_prepare(regs);
+	exit_to_user_mode_prepare(regs, true);
 	instrumentation_end();
 	exit_to_user_mode();
 }
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 71b6396db118..713c43491403 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -815,6 +815,7 @@ void update_rq_clock(struct rq *rq)
 
 static void hrtick_clear(struct rq *rq)
 {
+	taskshrd_delay_resched_tick();
 	if (hrtimer_active(&rq->hrtick_timer))
 		hrtimer_cancel(&rq->hrtick_timer);
 }
@@ -830,6 +831,8 @@ static enum hrtimer_restart hrtick(struct hrtimer *timer)
 
 	WARN_ON_ONCE(cpu_of(rq) != smp_processor_id());
 
+	taskshrd_delay_resched_tick();
+
 	rq_lock(rq, &rf);
 	update_rq_clock(rq);
 	rq->curr->sched_class->task_tick(rq, rq->curr, 1);
@@ -903,6 +906,16 @@ void hrtick_start(struct rq *rq, u64 delay)
 
 #endif /* CONFIG_SMP */
 
+void hrtick_local_start(u64 delay)
+{
+	struct rq *rq = this_rq();
+	struct rq_flags rf;
+
+	rq_lock(rq, &rf);
+	hrtick_start(rq, delay);
+	rq_unlock(rq, &rf);
+}
+
 static void hrtick_rq_init(struct rq *rq)
 {
 #ifdef CONFIG_SMP
@@ -6645,6 +6658,9 @@ static void __sched notrace __schedule(int sched_mode)
 picked:
 	clear_tsk_need_resched(prev);
 	clear_preempt_need_resched();
+#ifdef CONFIG_TASKSHARED
+	prev->taskshrd_sched_delay = 0;
+#endif
 #ifdef CONFIG_SCHED_DEBUG
 	rq->last_seen_need_resched_ns = 0;
 #endif
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index d23c34b8b3eb..0904667924d8 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -1419,6 +1419,13 @@ static void do_sched_yield(void)
  */
 SYSCALL_DEFINE0(sched_yield)
 {
+
+#ifdef CONFIG_TASKSHARED
+	if (current->taskshrd_sched_delay) {
+		schedule();
+		return 0;
+	}
+#endif
 	do_sched_yield();
 	return 0;
 }
diff --git a/mm/task_shared.c b/mm/task_shared.c
index cea45d913b91..575b335d6879 100644
--- a/mm/task_shared.c
+++ b/mm/task_shared.c
@@ -268,6 +268,56 @@ static int task_ushared_alloc(void)
 	return ret;
 }
 
+bool taskshrd_delay_resched(void)
+{
+	struct task_struct *t = current;
+	struct task_ushrd_struct *shrdp = t->task_ushrd;
+
+	if (!IS_ENABLED(CONFIG_SCHED_HRTICK))
+		return false;
+
+	if (shrdp == NULL || shrdp->kaddr == NULL)
+		return false;
+
+	if (t->taskshrd_sched_delay)
+		return false;
+
+	if (!(shrdp->kaddr->ts.sched_delay))
+		return false;
+
+	shrdp->kaddr->ts.sched_delay = 0;
+	t->taskshrd_sched_delay = 1;
+
+	return true;
+}
+
+void taskshrd_delay_resched_fini(void)
+{
+#ifdef CONFIG_SCHED_HRTICK
+	struct task_struct *t = current;
+
+	/*
+	 * IRQs off, guaranteed to return to userspace, start timer on this CPU
+	 * to limit the resched-overdraft.
+	 *
+	 * If your critical section is longer than 50 us you get to keep the
+	 * pieces.
+ */ + if (t->taskshrd_sched_delay) + hrtick_local_start(50 * NSEC_PER_USEC); +#endif +} + +void taskshrd_delay_resched_tick(void) +{ +#ifdef CONFIG_SCHED_HRTICK + struct task_struct *t =3D current; + + if (t->taskshrd_sched_delay) { + set_tsk_need_resched(t); + } +#endif +} + =20 /* * Get Task Shared structure, allocate if needed and return mapped user ad= dress. --=20 2.43.5 From nobody Sat Nov 23 12:54:31 2024 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 921EF20ED for ; Wed, 13 Nov 2024 00:01:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.165.32 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731456120; cv=none; b=sJ8I/vl2smJLEbTdwy/wN+N6okgV/tmrZ8gS+Kx3TSh/YHg4EjKbejRMR57+8BAKaVl+XNYBTX9QecukyH85KdIoA1Mv0da/NVyatLbkL0j8SXbgrR5YyQnwPblxj91TLwWLSEW9pONblZXD1lL+cCSchV1/mIcV2pkzMOADXDo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731456120; c=relaxed/simple; bh=IY4GAH++/EKRcd8NAEKfW3X6nfJ3TmoSG5uf/OmuG3k=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=nyyuukB/EYn2SJiSXWgWKFXYJF9pOPHF3X4ysmS+VNKF9bAy8f2InrCYagAjFCoKgSCpzY5WkH65YQqJVJ+p7B4+dPoIkrtm0i3OG6hCAWjeFB2ewH8tzvcDAbrynUkdolDWPzNUTXCgrsJzRvOUIzPPk7VSIJEEgsISugHZOMI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=XFBZXGmi; arc=none smtp.client-ip=205.220.165.32 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; 
From: Prakash Sangappa
To: linux-kernel@vger.kernel.org
Cc: rostedt@goodmis.org, peterz@infradead.org, tglx@linutronix.de, daniel.m.jordan@oracle.com, prakash.sangappa@oracle.com
Subject: [RFC PATCH 3/4] Indicate if scheduler preemption delay request is granted
Date: Wed, 13 Nov 2024 00:01:25 +0000
Message-ID: <20241113000126.967713-4-prakash.sangappa@oracle.com>
In-Reply-To: <20241113000126.967713-1-prakash.sangappa@oracle.com>
References: <20241113000126.967713-1-prakash.sangappa@oracle.com>

Indicate to user space whether the preemption delay request was granted
or denied.
Signed-off-by: Prakash Sangappa
---
 include/uapi/linux/task_shared.h | 11 +++++++++++
 mm/task_shared.c                 | 14 +++++++++++---
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/task_shared.h b/include/uapi/linux/task_shared.h
index 6e4c664eea60..a0f7ef0c69d0 100644
--- a/include/uapi/linux/task_shared.h
+++ b/include/uapi/linux/task_shared.h
@@ -15,4 +15,15 @@
 struct task_sharedinfo {
 	volatile unsigned short sched_delay;
 };
+
+/*
+ * 'sched_delay' values:
+ * TASK_PREEMPT_DELAY_REQ - application sets to request preemption delay.
+ * TASK_PREEMPT_DELAY_GRANTED - set by kernel if granted extended time on cpu.
+ * TASK_PREEMPT_DELAY_DENIED - set by kernel if not granted because the
+ *     application requested preemption delay again within the extended time.
+ */
+#define TASK_PREEMPT_DELAY_REQ		1
+#define TASK_PREEMPT_DELAY_GRANTED	2
+#define TASK_PREEMPT_DELAY_DENIED	3
 #endif
diff --git a/mm/task_shared.c b/mm/task_shared.c
index 575b335d6879..5b8a068a6b44 100644
--- a/mm/task_shared.c
+++ b/mm/task_shared.c
@@ -279,13 +279,21 @@ bool taskshrd_delay_resched(void)
 	if (shrdp == NULL || shrdp->kaddr == NULL)
 		return false;
 
-	if (t->taskshrd_sched_delay)
+	if (t->taskshrd_sched_delay) {
+		if (shrdp->kaddr->ts.sched_delay == TASK_PREEMPT_DELAY_REQ) {
+			/* not granted */
+			shrdp->kaddr->ts.sched_delay = TASK_PREEMPT_DELAY_DENIED;
+		}
 		return false;
+	}
 
-	if (!(shrdp->kaddr->ts.sched_delay))
+	if (shrdp->kaddr->ts.sched_delay != TASK_PREEMPT_DELAY_REQ)
 		return false;
 
-	shrdp->kaddr->ts.sched_delay = 0;
+	/* granted */
+	shrdp->kaddr->ts.sched_delay = TASK_PREEMPT_DELAY_GRANTED;
 	t->taskshrd_sched_delay = 1;
 
 	return true;
-- 
2.43.5
From: Prakash Sangappa
To: linux-kernel@vger.kernel.org
Cc: rostedt@goodmis.org, peterz@infradead.org, tglx@linutronix.de, daniel.m.jordan@oracle.com, prakash.sangappa@oracle.com
Subject: [RFC PATCH 4/4] Add scheduler preemption delay granted stats
Date: Wed, 13 Nov 2024 00:01:26 +0000
Message-ID: <20241113000126.967713-5-prakash.sangappa@oracle.com>
In-Reply-To: <20241113000126.967713-1-prakash.sangappa@oracle.com>
References: <20241113000126.967713-1-prakash.sangappa@oracle.com>

Add scheduler stats to record the number of times preemption delay was
granted or denied.

Signed-off-by: Prakash Sangappa
---
 include/linux/sched.h |  8 ++++++++
 kernel/sched/core.c   | 12 ++++++++++++
 kernel/sched/debug.c  |  4 ++++
 mm/task_shared.c      |  2 ++
 4 files changed, 26 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index b53e7a878a01..e3f5760632f4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -327,6 +327,10 @@ extern void io_schedule_finish(int token);
 extern long io_schedule_timeout(long timeout);
 extern void io_schedule(void);
 extern void hrtick_local_start(u64 delay);
+#ifdef CONFIG_TASKSHARED
+extern void update_stat_preempt_delayed(struct task_struct *t);
+extern void update_stat_preempt_denied(struct task_struct *t);
+#endif
 
 /**
  * struct prev_cputime - snapshot of system and user cputime
@@ -532,6 +536,10 @@ struct sched_statistics {
 	u64				nr_wakeups_affine_attempts;
 	u64				nr_wakeups_passive;
 	u64				nr_wakeups_idle;
+#ifdef CONFIG_TASKSHARED
+	u64				nr_preempt_delay_granted;
+	u64				nr_preempt_delay_denied;
+#endif
 
 #ifdef CONFIG_SCHED_CORE
 	u64				core_forceidle_sum;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 713c43491403..54fa4b68adaf 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -916,6 +916,18 @@ void hrtick_local_start(u64 delay)
 	rq_unlock(rq, &rf);
 }
 
+#ifdef CONFIG_TASKSHARED
+void update_stat_preempt_delayed(struct task_struct *t)
+{
+	schedstat_inc(t->stats.nr_preempt_delay_granted);
+}
+
+void update_stat_preempt_denied(struct task_struct *t)
+{
+	schedstat_inc(t->stats.nr_preempt_delay_denied);
+}
+#endif
+
 static void hrtick_rq_init(struct rq *rq)
 {
 #ifdef CONFIG_SMP
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 4a9fbbe843c0..ace7856f13c3 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -1215,6 +1215,10 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
 		P_SCHEDSTAT(nr_wakeups_affine_attempts);
 		P_SCHEDSTAT(nr_wakeups_passive);
 		P_SCHEDSTAT(nr_wakeups_idle);
+#ifdef CONFIG_TASKSHARED
+		P_SCHEDSTAT(nr_preempt_delay_granted);
+		P_SCHEDSTAT(nr_preempt_delay_denied);
+#endif
 
 		avg_atom = p->se.sum_exec_runtime;
 		if (nr_switches)
diff --git a/mm/task_shared.c b/mm/task_shared.c
index 5b8a068a6b44..35aecc718c8e 100644
--- a/mm/task_shared.c
+++ b/mm/task_shared.c
@@ -285,6 +285,7 @@ bool taskshrd_delay_resched(void)
 			/* not granted */
 			shrdp->kaddr->ts.sched_delay = TASK_PREEMPT_DELAY_DENIED;
+			update_stat_preempt_denied(t);
 		}
 		return false;
 	}
@@ -295,6 +296,7 @@ bool taskshrd_delay_resched(void)
 	/* granted */
 	shrdp->kaddr->ts.sched_delay = TASK_PREEMPT_DELAY_GRANTED;
 	t->taskshrd_sched_delay = 1;
+	update_stat_preempt_delayed(t);
 
 	return true;
 }
-- 
2.43.5