From nobody Mon Jun 29 21:05:28 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B007BC433EF for ; Wed, 2 Feb 2022 14:51:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345162AbiBBOu7 (ORCPT ); Wed, 2 Feb 2022 09:50:59 -0500 Received: from mx0b-00069f02.pphosted.com ([205.220.177.32]:27504 "EHLO mx0b-00069f02.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232471AbiBBOuz (ORCPT ); Wed, 2 Feb 2022 09:50:55 -0500 Received: from pps.filterd (m0246631.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 212DCJMA026575; Wed, 2 Feb 2022 14:50:52 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : content-transfer-encoding : content-type : mime-version; s=corp-2021-07-09; bh=lrqpq6IoCWrPdDLDJOV+NUYMYprJ4k5bUqgJNg/bM4g=; b=0/GhyWnsSZDj6JhhrjgGW50A1rikRU5hCQAQyQXLO7VA0AfqlGYjw5YEex0zC5KwBeT6 qRkRiKvchiiMzDMA4eke3GbWbjL3AoYw6SiVx3dShS2eC41AmcR3LNCemCx/E7/uM6PE WYBbzrGMoZzvMqgfcLHss57/uziKZAuiazGM4yHiXF8l19uyWgaoAFWX3l6LQYlmZ5JH LEp944l1PAYUjsiP42worQBQkg2pvX4Y8mNfMG2nKcyzNNoTK0MGp0uC6qmD6RLLAbPt HIIILaadPSWrTkdG20deBxSw/Ag2IXLGBKJt+N++XLj9Xqs05Lwv8A4Gzi+Aixn9QF5P RA== Received: from userp3030.oracle.com (userp3030.oracle.com [156.151.31.80]) by mx0b-00069f02.pphosted.com with ESMTP id 3dxj9veayx-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 02 Feb 2022 14:50:50 +0000 Received: from pps.filterd (userp3030.oracle.com [127.0.0.1]) by userp3030.oracle.com (8.16.1.2/8.16.1.2) with SMTP id 212EfRB7114549; Wed, 2 Feb 2022 14:50:49 GMT Received: from nam12-mw2-obe.outbound.protection.outlook.com (mail-mw2nam12lp2046.outbound.protection.outlook.com [104.47.66.46]) by 
userp3030.oracle.com with ESMTP id 3dvtq2s84m-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 02 Feb 2022 14:50:49 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=VGHz22Neub/VVT+AwvKrwfFbMAwzzyHsCD0cr/BgMN5b2anwrELiuP6BtPAnwWVVvVzFS6SNmZSdfO+ai2kK5S/EQhvVZSifnEq16SnIUYyCK0AH90POnGWDPpYj9/49Tby4elUZkjKvtpe9X2AkcH4HQmLaSh8uS0J7vnOmD2DKNG73Va35HLekeGFjDEn4Yi9hl46sQ2r77mBjAe+ISlUqFQaAtuj/94P8Bh+s+lOEmFixTqm391WXVBoPxK9I2jQPu0erGMHOQwNLABj8HS/XgtWkF9SkFIgBtBq2+xNQbVJdCOfxVHF7kf3UYnOxLR21Il4vc9pVh1jxfX3Fyg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=lrqpq6IoCWrPdDLDJOV+NUYMYprJ4k5bUqgJNg/bM4g=; b=YS7UUAXTLCxumH84FXP2D0ik6KgnAU6kNIEydHSGo4hJ3+EFxAwHPVvDr127NWUblfDh1Cxy/Q+kwj+rebfFot2NLs7d1WBHsOfs01X/q1M7Ab5LJwckOiQWabcQieKjnVN+Su5hEjR0PBBATMgKn/tP0IT1h/KH1tr555zmz8407PS6T4VbTGdGbRLVFzX5djQbWF1VZ0omeiH6aqJKYQz7zyqw5885ZefNHB7d65QFsXEZqy9ijC6mo2PE05sDIS+hk5DoLr6Lp4VRzmiVEGQLdfwFW/0200CrHLbp6nZHp9JxlbQyv6kUA1zOZ37nZKBoqlAFvuCnk4yclEYQzg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=none; dmarc=none; dkim=none; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.onmicrosoft.com; s=selector2-oracle-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=lrqpq6IoCWrPdDLDJOV+NUYMYprJ4k5bUqgJNg/bM4g=; b=G+jxFjnsmqpgWelO7rvdBsQvEf81EwsgIkjM4wrLQXpGdKv9KhS01VF+qSdbpV7M4yNxnfpViJw89zcNFXRr1cR6ar/4X1L9I6YetQgw3QJO1pkPj5+IRPQMi7pLCYP7J6Bid62HIey4ThGJq1/ryfTPjgpVqR93ihabd+30oIU= Received: from CO1PR10MB4468.namprd10.prod.outlook.com (2603:10b6:303:6c::24) by MN2PR10MB3597.namprd10.prod.outlook.com (2603:10b6:208:117::10) with Microsoft SMTP Server (version=TLS1_2, 
cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4930.18; Wed, 2 Feb 2022 14:50:46 +0000 Received: from CO1PR10MB4468.namprd10.prod.outlook.com ([fe80::a5d1:ed4:5ab6:e9b1]) by CO1PR10MB4468.namprd10.prod.outlook.com ([fe80::a5d1:ed4:5ab6:e9b1%3]) with mapi id 15.20.4951.012; Wed, 2 Feb 2022 14:50:46 +0000 From: Imran Khan To: tj@kernel.org, gregkh@linuxfoundation.org Cc: linux-kernel@vger.kernel.org Subject: [PATCH v4 1/2] kernfs: use hashed mutex and spinlock in place of global ones. Date: Thu, 3 Feb 2022 01:50:26 +1100 Message-Id: <20220202145027.723733-2-imran.f.khan@oracle.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220202145027.723733-1-imran.f.khan@oracle.com> References: <20220202145027.723733-1-imran.f.khan@oracle.com> Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: BYAPR05CA0017.namprd05.prod.outlook.com (2603:10b6:a03:c0::30) To CO1PR10MB4468.namprd10.prod.outlook.com (2603:10b6:303:6c::24) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 4d939299-38a8-40d1-6e1c-08d9e65b64ca X-MS-TrafficTypeDiagnostic: MN2PR10MB3597:EE_ X-Microsoft-Antispam-PRVS: X-MS-Oob-TLC-OOBClassifiers: OLM:8882; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: ug3lQmA4qo68JWtMi54XqHgVi6zflJHKDR13mq0US0RL7zlBV4Awyax55lAOwGzKwRvg81h1rQfkJBr/xOn6SD6n9vFEZMxz47wxgeoEdgIrZzSpAp91b8aTE6xqjiSRe/zKnKFJBwyHqu4MFktMJnBfmj34+hGr1LoZKV8doRXNTedBr0NH98T7ZD5lRJY+EDXdPr0kjlQLfdrvGo7sxj5Jdn1OZQgrlFf9FOaVvWE6QD9yBBI2xJ9HJRDHWEPzpiAa7dqEBxkflK5HzpnL3VSvuINJW2A4Kymbn3lmFVeJodKsJmPM3p88f/2mewPnq0VI/p7KxbY5BprvsHC+YbV7q8vXop/Cem9nuiXXkJ3zT64por2L79adH00cf78lOtR0kiT3hTb/+XWYl0tsXJVHG8IHgRRonkhQ7jsH/jODxE42ZSX2OeA+kIusfa+n0hKVPmAKix/ngc2k+T3adoBAPqGGH2uJ06ID4Dr9ilUSRDadpQlVgjFRZjWFt5QK45M/NbA/S2au4+935Szh++TB8xKsxAXPOzy0F63cgaql6wQEUIFhS+L4g+16jH+q1WmXbDM/9OC5y1AJh89GD1rhR0RQ9OjdAete+gSeGQ10UrrhEkdzjzoNijQTuEOli7FQQTnvCIGo8qZvtTyKKA== 
X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:CO1PR10MB4468.namprd10.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230001)(366004)(66476007)(1076003)(66556008)(66946007)(2616005)(36756003)(6506007)(52116002)(6666004)(38100700002)(30864003)(6512007)(5660300002)(86362001)(508600001)(6486002)(186003)(2906002)(103116003)(4326008)(8936002)(8676002)(83380400001)(316002);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?us-ascii?Q?K7R7J4W+BGWSxuPqVbnIqjZ4hq4KKB9bZvz7yHjnnb8xA+KJIEDTot17X+rn?= =?us-ascii?Q?m81laRroTBVJeL+ulFON//0hOKS5jY5z1lLZX+E4XUSGGhG8cPEt1IpxSsNF?= =?us-ascii?Q?9gL2KCUFVylMDm8dZatX62XkBHyOFqJWYoGHj+EC78POTKL3R93tuHFkJJf1?= =?us-ascii?Q?NI8QXLTgq9P96E/gmk08ceNbrv3J+ZjOHTnDRo/nDWhDnY5I1xW7JKSCGTfe?= =?us-ascii?Q?KJ2YPpbXisnhrS8vm5UVy+G2nfxiqjGtWSGOLnupve2CZHYGSPKpG+t3+t39?= =?us-ascii?Q?itX9KowSP0nVXJNyPGx5PrxMf5Rk7U8C6piz+p7xCXbkPhJc7xOV59Vkevk7?= =?us-ascii?Q?zTrW0314XdZAbK/5Mr5a0v3OD+rVwo9uBKsQW0q7Ie7e1YZ/d2/oleHjOd9j?= =?us-ascii?Q?H3sRmeUrc3i5bskjewpua6SYNpvX0B13VIDFq9eB9J0SSG0LenXnEey2GKSZ?= =?us-ascii?Q?Tr86wiJDZxfJ+tBalYk2lswfFvYkusT2aWN+ZDjhzjQUxmUGf1WdsJmioahT?= =?us-ascii?Q?77AvwBv4Ylk4ezKpW05OcF+jlM9yQI3OCDn2vAdjp4QykawnyIu28yyTSVdr?= =?us-ascii?Q?OHHRU4q+A49U//8TpO77pTLBr7aBPl49mzrpYEOhC7qaBDO9zypbJVrPq3z9?= =?us-ascii?Q?y4uA8Kn8qLVY4yEtJshiIDi9DifTVnkJYuUl4p7XWR9vsmiQvivdxl8iqLou?= =?us-ascii?Q?b4N6L+emWP6LtcnMwHZKHZg5eSwYSCYiXJt+Uzpy+dYWQ/Xxx/EnqRa768d/?= =?us-ascii?Q?witZWfFR1cjGhm842tA0h5c3BgBc282i6J7YiQ4sJwAEOHh7R5ZzRbM1JSAg?= =?us-ascii?Q?wRcSb4Qf7WbgJu5EDjKfZzM8Tlx8HZ58XPYcpR+bmfdHqjEzlmcGiw8QZycu?= =?us-ascii?Q?9E+k8N6qzf63CUA54fYTcqOM1uPHAwsjqFqfwO20ahCZGRYtVyioTjvuNFpf?= =?us-ascii?Q?ClLsPRzo/Q0IkNLG1cK1zn+rxsWH8sg2m2UsEr1kgtkJbL+5hUL0b3jhYFpJ?= =?us-ascii?Q?ysyAXhqLcg9aoxw0lQ/jQ/kfHB1s4mSKRwQHQa8EEjzN+4f5d/CrvEFD0vjy?= =?us-ascii?Q?9IC6/R+s2AW5Yu5aQIharr5701OKsFQ/5GLe/dyt6SRyPzxQS1n6bX6ZhcRY?= 
=?us-ascii?Q?hi7dT7nPh/geLuakBAXNFnz0a9rZqzV7y5EJajtG6nlYdvFYHmVOaCsjl0tB?= =?us-ascii?Q?2Zd2PtDRTB+kVO8Y8WzqYbkCBwUItxzNCmMPwlgilpLVNsOoESoJFDg4VsdR?= =?us-ascii?Q?v6bIZmlLdNcbqZLSLFancJ5cidj+DHiZsKuWyVK2GVWKuTN4nRU9X0uUufqu?= =?us-ascii?Q?ULPiJ7EZSoTiQ3YcBO1wQVlGSK/7pSbCVjohNZmrTZK/VgunfcLlgzyj/VLX?= =?us-ascii?Q?zKVLEdF4Oo80dX1xkm7Q5L/I5t/7UBAUdj/UvfFUU+4N9U1/blMntPTZZrtH?= =?us-ascii?Q?lmPpsZyDfb8/qXwY/F5hvbr1ptu52JIdMTK2/Drtz05qbZr2fCIStatxjn80?= =?us-ascii?Q?ZEVc7S69SaYqw9iOAHdR1pXDa8ntMzq5DokeBOh+QIez/WdKQNuha4ddG4kL?= =?us-ascii?Q?A1pyt1aWuHQu3z5aeml9QhQf0XsH7jF/Y0km7otfR8d+gQwjbSs+Hj6g9c7C?= =?us-ascii?Q?+WmPEMBK83stnUPIazpL/nYodjgnLNPXAIAK5e4Gcf8B?= X-OriginatorOrg: oracle.com X-MS-Exchange-CrossTenant-Network-Message-Id: 4d939299-38a8-40d1-6e1c-08d9e65b64ca X-MS-Exchange-CrossTenant-AuthSource: CO1PR10MB4468.namprd10.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 02 Feb 2022 14:50:46.1410 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 4e2c6054-71cb-48f1-bd6c-3a9705aca71b X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: TwhvgkubX7Va7OddoOTnT0Whvr0Yuvkt/JfHbiJjyen+DdDcsscc/eL2XOaRviXBGvjhMTFuI4guqsqxomJH5A== X-MS-Exchange-Transport-CrossTenantHeadersStamped: MN2PR10MB3597 X-Proofpoint-Virus-Version: vendor=nai engine=6300 definitions=10246 signatures=673430 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 mlxlogscore=999 adultscore=0 malwarescore=0 bulkscore=0 suspectscore=0 mlxscore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2201110000 definitions=main-2202020081 X-Proofpoint-ORIG-GUID: niviasyNmvWfsmVYGUWMz2HnvlXFKGcV X-Proofpoint-GUID: niviasyNmvWfsmVYGUWMz2HnvlXFKGcV Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Right now a global mutex (kernfs_open_file_mutex) protects list of 
kernfs_open_file instances corresponding to a sysfs attribute. So even if different tasks are opening or closing different sysfs files, they can contend on the osq_lock of this mutex. The contention is more apparent on large-scale systems with a few hundred CPUs, where most of the CPUs have running tasks that are opening, accessing or closing sysfs files at any point in time. Since each list of kernfs_open_file belongs to a kernfs_open_node instance, which in turn corresponds to one kernfs_node, moving the global kernfs_open_file_mutex into kernfs_node would appear to fix this contention, but it has the unwanted side effect of bloating kernfs_node and hence increasing kobject memory usage. Also, since kernfs_node->attr.open points to the kernfs_open_node instance corresponding to the kernfs_node, we could use a kernfs_node-specific spinlock in place of the current global spinlock, i.e. kernfs_open_node_lock. But this approach would increase kobject memory usage as well. Use per-fs hashed locks in place of the above-mentioned global locks to reduce kernfs access contention without increasing kobject memory usage. 
Signed-off-by: Imran Khan --- fs/kernfs/dir.c | 5 +++ fs/kernfs/file.c | 61 ++++++++++++++++--------------------- fs/kernfs/kernfs-internal.h | 51 +++++++++++++++++++++++++++++++ include/linux/kernfs.h | 39 ++++++++++++++++++++++++ 4 files changed, 122 insertions(+), 34 deletions(-) diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c index e6d9772ddb4ca..d26fb3bffda92 100644 --- a/fs/kernfs/dir.c +++ b/fs/kernfs/dir.c @@ -909,6 +909,7 @@ struct kernfs_root *kernfs_create_root(struct kernfs_sy= scall_ops *scops, { struct kernfs_root *root; struct kernfs_node *kn; + int lock_count; =20 root =3D kzalloc(sizeof(*root), GFP_KERNEL); if (!root) @@ -916,6 +917,10 @@ struct kernfs_root *kernfs_create_root(struct kernfs_s= yscall_ops *scops, =20 idr_init(&root->ino_idr); init_rwsem(&root->kernfs_rwsem); + for (lock_count =3D 0; lock_count < NR_KERNFS_LOCKS; lock_count++) { + spin_lock_init(&root->open_node_locks[lock_count].lock); + mutex_init(&root->open_file_mutex[lock_count].lock); + } INIT_LIST_HEAD(&root->supers); =20 /* diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c index 9414a7a60a9f4..018d038b72fdd 100644 --- a/fs/kernfs/file.c +++ b/fs/kernfs/file.c @@ -18,20 +18,6 @@ =20 #include "kernfs-internal.h" =20 -/* - * There's one kernfs_open_file for each open file and one kernfs_open_node - * for each kernfs_node with one or more open files. - * - * kernfs_node->attr.open points to kernfs_open_node. attr.open is - * protected by kernfs_open_node_lock. - * - * filp->private_data points to seq_file whose ->private points to - * kernfs_open_file. kernfs_open_files are chained at - * kernfs_open_node->files, which is protected by kernfs_open_file_mutex. 
- */ -static DEFINE_SPINLOCK(kernfs_open_node_lock); -static DEFINE_MUTEX(kernfs_open_file_mutex); - struct kernfs_open_node { atomic_t refcnt; atomic_t event; @@ -524,10 +510,11 @@ static int kernfs_get_open_node(struct kernfs_node *k= n, struct kernfs_open_file *of) { struct kernfs_open_node *on, *new_on =3D NULL; - + struct mutex *mutex =3D NULL; + spinlock_t *lock =3D NULL; retry: - mutex_lock(&kernfs_open_file_mutex); - spin_lock_irq(&kernfs_open_node_lock); + mutex =3D kernfs_open_file_mutex_lock(kn); + lock =3D kernfs_open_node_lock(kn); =20 if (!kn->attr.open && new_on) { kn->attr.open =3D new_on; @@ -540,8 +527,8 @@ static int kernfs_get_open_node(struct kernfs_node *kn, list_add_tail(&of->list, &on->files); } =20 - spin_unlock_irq(&kernfs_open_node_lock); - mutex_unlock(&kernfs_open_file_mutex); + spin_unlock_irq(lock); + mutex_unlock(mutex); =20 if (on) { kfree(new_on); @@ -575,10 +562,14 @@ static void kernfs_put_open_node(struct kernfs_node *= kn, struct kernfs_open_file *of) { struct kernfs_open_node *on =3D kn->attr.open; + struct mutex *mutex =3D NULL; + spinlock_t *lock =3D NULL; unsigned long flags; =20 - mutex_lock(&kernfs_open_file_mutex); - spin_lock_irqsave(&kernfs_open_node_lock, flags); + mutex =3D kernfs_open_file_mutex_lock(kn); + lock =3D kernfs_open_node_lock_ptr(kn); + + spin_lock_irqsave(lock, flags); =20 if (of) list_del(&of->list); @@ -588,8 +579,8 @@ static void kernfs_put_open_node(struct kernfs_node *kn, else on =3D NULL; =20 - spin_unlock_irqrestore(&kernfs_open_node_lock, flags); - mutex_unlock(&kernfs_open_file_mutex); + spin_unlock_irqrestore(lock, flags); + mutex_unlock(mutex); =20 kfree(on); } @@ -729,11 +720,11 @@ static void kernfs_release_file(struct kernfs_node *k= n, /* * @of is guaranteed to have no other file operations in flight and * we just want to synchronize release and drain paths. - * @kernfs_open_file_mutex is enough. @of->mutex can't be used + * @open_file_mutex is enough. 
@of->mutex can't be used * here because drain path may be called from places which can * cause circular dependency. */ - lockdep_assert_held(&kernfs_open_file_mutex); + lockdep_assert_held(kernfs_open_file_mutex_ptr(kn)); =20 if (!of->released) { /* @@ -750,11 +741,12 @@ static int kernfs_fop_release(struct inode *inode, st= ruct file *filp) { struct kernfs_node *kn =3D inode->i_private; struct kernfs_open_file *of =3D kernfs_of(filp); + struct mutex *lock =3D NULL; =20 if (kn->flags & KERNFS_HAS_RELEASE) { - mutex_lock(&kernfs_open_file_mutex); + lock =3D kernfs_open_file_mutex_lock(kn); kernfs_release_file(kn, of); - mutex_unlock(&kernfs_open_file_mutex); + mutex_unlock(lock); } =20 kernfs_put_open_node(kn, of); @@ -769,19 +761,21 @@ void kernfs_drain_open_files(struct kernfs_node *kn) { struct kernfs_open_node *on; struct kernfs_open_file *of; + struct mutex *mutex =3D NULL; + spinlock_t *lock =3D NULL; =20 if (!(kn->flags & (KERNFS_HAS_MMAP | KERNFS_HAS_RELEASE))) return; =20 - spin_lock_irq(&kernfs_open_node_lock); + lock =3D kernfs_open_node_lock(kn); on =3D kn->attr.open; if (on) atomic_inc(&on->refcnt); - spin_unlock_irq(&kernfs_open_node_lock); + spin_unlock_irq(lock); if (!on) return; =20 - mutex_lock(&kernfs_open_file_mutex); + mutex =3D kernfs_open_file_mutex_lock(kn); =20 list_for_each_entry(of, &on->files, list) { struct inode *inode =3D file_inode(of->file); @@ -793,8 +787,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn) kernfs_release_file(kn, of); } =20 - mutex_unlock(&kernfs_open_file_mutex); - + mutex_unlock(mutex); kernfs_put_open_node(kn, NULL); } =20 @@ -922,13 +915,13 @@ void kernfs_notify(struct kernfs_node *kn) return; =20 /* kick poll immediately */ - spin_lock_irqsave(&kernfs_open_node_lock, flags); + spin_lock_irqsave(kernfs_open_node_lock_ptr(kn), flags); on =3D kn->attr.open; if (on) { atomic_inc(&on->event); wake_up_interruptible(&on->poll); } - spin_unlock_irqrestore(&kernfs_open_node_lock, flags); + 
spin_unlock_irqrestore(kernfs_open_node_lock_ptr(kn), flags); =20 /* schedule work to kick fsnotify */ spin_lock_irqsave(&kernfs_notify_lock, flags); diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h index f9cc912c31e1b..cc49a6cd94154 100644 --- a/fs/kernfs/kernfs-internal.h +++ b/fs/kernfs/kernfs-internal.h @@ -31,6 +31,7 @@ struct kernfs_iattrs { atomic_t user_xattr_size; }; =20 + /* +1 to avoid triggering overflow warning when negating it */ #define KN_DEACTIVATED_BIAS (INT_MIN + 1) =20 @@ -147,4 +148,54 @@ void kernfs_drain_open_files(struct kernfs_node *kn); */ extern const struct inode_operations kernfs_symlink_iops; =20 +static inline spinlock_t *kernfs_open_node_lock_ptr(struct kernfs_node *kn) +{ + struct kernfs_root *root; + int idx =3D hash_ptr(kn, NR_KERNFS_LOCK_BITS); + + root =3D kernfs_root(kn); + + return &root->open_node_locks[idx].lock; +} + +static inline spinlock_t *kernfs_open_node_lock(struct kernfs_node *kn) +{ + struct kernfs_root *root; + spinlock_t *lock; + int idx =3D hash_ptr(kn, NR_KERNFS_LOCK_BITS); + + root =3D kernfs_root(kn); + + lock =3D &root->open_node_locks[idx].lock; + + spin_lock_irq(lock); + + return lock; +} + +static inline struct mutex *kernfs_open_file_mutex_ptr(struct kernfs_node = *kn) +{ + struct kernfs_root *root; + int idx =3D hash_ptr(kn, NR_KERNFS_LOCK_BITS); + + root =3D kernfs_root(kn); + + return &root->open_file_mutex[idx].lock; +} + +static inline struct mutex *kernfs_open_file_mutex_lock(struct kernfs_node= *kn) +{ + struct kernfs_root *root; + struct mutex *lock; + int idx =3D hash_ptr(kn, NR_KERNFS_LOCK_BITS); + + root =3D kernfs_root(kn); + + lock =3D &root->open_file_mutex[idx].lock; + + mutex_lock(lock); + + return lock; +} + #endif /* __KERNFS_INTERNAL_H */ diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index 861c4f0f8a29f..5bf9f02ce9dce 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -18,6 +18,8 @@ #include #include #include +#include +#include =20 
struct file; struct dentry; @@ -34,6 +36,40 @@ struct kernfs_fs_context; struct kernfs_open_node; struct kernfs_iattrs; =20 +/* + * NR_KERNFS_LOCK_BITS determines size (NR_KERNFS_LOCKS) of hash + * table of locks. + * Having a small hash table would impact scalability, since + * more and more kernfs_node objects will end up using same lock + * and having a very large hash table would waste memory. + * + * At the moment size of hash table of locks is being set based on + * the number of CPUs as follows: + * + * NR_CPU NR_KERNFS_LOCK_BITS NR_KERNFS_LOCKS + * 1 1 2 + * 2-3 2 4 + * 4-7 4 16 + * 8-15 6 64 + * 16-31 8 256 + * 32 and more 10 1024 + */ +#ifdef CONFIG_SMP +#define NR_KERNFS_LOCK_BITS (2 * (ilog2(NR_CPUS < 32 ? NR_CPUS : 32))) +#else +#define NR_KERNFS_LOCK_BITS 1 +#endif + +#define NR_KERNFS_LOCKS (1 << NR_KERNFS_LOCK_BITS) + +struct kernfs_open_node_lock { + spinlock_t lock; +} ____cacheline_aligned_in_smp; + +struct kernfs_open_file_mutex { + struct mutex lock; +} ____cacheline_aligned_in_smp; + enum kernfs_node_type { KERNFS_DIR =3D 0x0001, KERNFS_FILE =3D 0x0002, @@ -90,6 +126,7 @@ enum kernfs_root_flag { KERNFS_ROOT_SUPPORT_USER_XATTR =3D 0x0008, }; =20 + /* type-specific structures for kernfs_node union members */ struct kernfs_elem_dir { unsigned long subdirs; @@ -201,6 +238,8 @@ struct kernfs_root { =20 wait_queue_head_t deactivate_waitq; struct rw_semaphore kernfs_rwsem; + struct kernfs_open_node_lock open_node_locks[NR_KERNFS_LOCKS]; + struct kernfs_open_file_mutex open_file_mutex[NR_KERNFS_LOCKS]; }; =20 struct kernfs_open_file { --=20 2.30.2 From nobody Mon Jun 29 21:05:28 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C26E1C4332F for ; Wed, 2 Feb 2022 14:51:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id 
S1345149AbiBBOvD (ORCPT ); Wed, 2 Feb 2022 09:51:03 -0500 Received: from mx0b-00069f02.pphosted.com ([205.220.177.32]:29640 "EHLO mx0b-00069f02.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345141AbiBBOu4 (ORCPT ); Wed, 2 Feb 2022 09:50:56 -0500 Received: from pps.filterd (m0246632.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 212DBw9L011396; Wed, 2 Feb 2022 14:50:53 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : content-transfer-encoding : content-type : mime-version; s=corp-2021-07-09; bh=KYwYi7co+fBflqIz4SKHnNnrvU51dzzg3JFQjVsXtNA=; b=q8NbgtU/GXAiHmI1EEHs3U3U9iAJbFGnWgRO4PbL9IICLfs1qw+6alwDdOuM5G5vfa5l cZ2tEGPjvYsqCwmBBEJSYSvH2/EFL1wWKozpVqgbpDp8M45YpP0wI/G8g15N8XHzd8/W qi8lsfkdDp/Ms6Er5el6lNBbrDkwApRUqCCw3OY9ST9BZZFUZY0iWezZP+i+3YGKd66W ISQShJ/b7YTIwdO+8/O+cOBxFAKRxP9SZoMib113X/kesI7VAvkKRJgEAeYjGC008s2e HnECI/+iHMz3UcTKu7/v73TOgkAMYGD29aMJbRoLloqPKKFXf1lRTmr/hhiQryDfQpAN RQ== Received: from aserp3030.oracle.com (aserp3030.oracle.com [141.146.126.71]) by mx0b-00069f02.pphosted.com with ESMTP id 3dxj9fx9g7-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 02 Feb 2022 14:50:52 +0000 Received: from pps.filterd (aserp3030.oracle.com [127.0.0.1]) by aserp3030.oracle.com (8.16.1.2/8.16.1.2) with SMTP id 212EfjNS148873; Wed, 2 Feb 2022 14:50:51 GMT Received: from nam12-bn8-obe.outbound.protection.outlook.com (mail-bn8nam12lp2175.outbound.protection.outlook.com [104.47.55.175]) by aserp3030.oracle.com with ESMTP id 3dvumhhpsg-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 02 Feb 2022 14:50:51 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; 
b=ewEW40s753k7DU3mtMxEd0caeozja8ogONeSdSn6IY74RLlqbC0LLcMv+OxF/2ChN64JN2yqjtdKcjBpXdxMvDCqPRpD4/mXEj6UDhQvwkQN9T3prkY5Vx4x9bKLw5JaPlsKUQQYwMhMrFwBSHItl0f1Lqi/b6QCaiaP8dKkoyxDj4cBKysZwevuMoGqkxf++zbKWsW7LSramvIuzNp3aZtSPqXAx6lAj68w9oFhrXsetpz2iWdwOQdAHglNpVOUhnvAOVSCxua79ujJC16/O+7wVVzmSvhS8pH/3lvX2aTRvZSyCfV3bI4ygoZRgrc71iL63eEcFt9FyJX4uwBA3Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=KYwYi7co+fBflqIz4SKHnNnrvU51dzzg3JFQjVsXtNA=; b=Gaow00fW9Gnmyh/PNZ8aAOzNYDemdZM6sjDNbfVyInKPV+FvhDBmYKOx3Kj2UMoHwb1R8I+A/b2G02ZCWWovILkDYwT+Z1mRGq4uRO3+WDtuHRhMLVCbf5ZPDxT2IlqKrxng9j2RRl89/AZ/zhHB4hdiHgvdTsF8lDJEZ4cGVOovxLXR+p7a/O9ANLpOoUxCUXx0e8eCnbQmxS1oPF6BtUbgkNQG2n1EPRIN0taVc3PT0yUu/18VIlIsB36atIllPd23zVACRbC9Ojn/8acHr1pmXoJ7nXOXxa12+8gURxYTB5uiMonQuQDH+98e2NLjQA2LyCKxWebssw0HVm4auQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=none; dmarc=none; dkim=none; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.onmicrosoft.com; s=selector2-oracle-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=KYwYi7co+fBflqIz4SKHnNnrvU51dzzg3JFQjVsXtNA=; b=a/TVfHi9I9ilkE3zRNc190C+Td9XhVsiANV0EKbyUraUexbmflIz+2ZL5kBt12CVhT6ryJD7RnwEgksDS5JUqUhCKi7vhWD0lKJPReNQ941JpB9O8+vaaB2k/o2kdMM8TzwCj6iqsmM4WW2FfrcqdsBVIja/lbBbStzfd74GqW8= Received: from CO1PR10MB4468.namprd10.prod.outlook.com (2603:10b6:303:6c::24) by MN2PR10MB3597.namprd10.prod.outlook.com (2603:10b6:208:117::10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4930.18; Wed, 2 Feb 2022 14:50:49 +0000 Received: from CO1PR10MB4468.namprd10.prod.outlook.com ([fe80::a5d1:ed4:5ab6:e9b1]) by CO1PR10MB4468.namprd10.prod.outlook.com ([fe80::a5d1:ed4:5ab6:e9b1%3]) with mapi id 
15.20.4951.012; Wed, 2 Feb 2022 14:50:49 +0000 From: Imran Khan To: tj@kernel.org, gregkh@linuxfoundation.org Cc: linux-kernel@vger.kernel.org Subject: [PATCH v4 2/2] kernfs: Replace per-fs global rwsem with per-fs hashed rwsem. Date: Thu, 3 Feb 2022 01:50:27 +1100 Message-Id: <20220202145027.723733-3-imran.f.khan@oracle.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220202145027.723733-1-imran.f.khan@oracle.com> References: <20220202145027.723733-1-imran.f.khan@oracle.com> Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: BYAPR05CA0017.namprd05.prod.outlook.com (2603:10b6:a03:c0::30) To CO1PR10MB4468.namprd10.prod.outlook.com (2603:10b6:303:6c::24) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 2d6ecebb-c530-440b-e0fc-08d9e65b6640 X-MS-TrafficTypeDiagnostic: MN2PR10MB3597:EE_ X-Microsoft-Antispam-PRVS: X-MS-Oob-TLC-OOBClassifiers: OLM:6430; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: qdz5REPFyzx5u90gd0arQ/AufZzFkrtZ4CD7/hItLbpQLXvN+yqZqsfRD34AgD1VBI4Uo/ko6C4V47d+DkmVkD0+LIXxNuAE0L4JzPFF9cJrRktew2ZyxH20Zh65TxnFTdjmZ+C9xUPns8x/KI60tLTYBtAJmtD4/3RmwixeV4DlEK8S5b34VIj8qXltxD4yVy979eWPA36NMsWEhvCkSgT79eu3S/+Wum0BHWSw2j0J1G7o+o51FRbz77sFjKU5ZEe/0sBwEOvqyFSzrloSG0J8aqeHWAa/HygxE6PQPdqof60Jp8DPAsc3YjftKru7rZh6zl6ObRqcTnBHFARKcDs58iw7+zsbaYnUwUi9tl/kutXLFN3nsyY3KnhLGjZK0V38Km52/tGNaVE31JdFN5SJZbG0Tu61/ZHMioKPokMmUxGeyQBr7U7km8U0ct6iKq/iJEvTJ6aNVU0ksn4fBMvXNen3mcJqISmvHJoLw9Reg0MRaZt1nbSU0x3zRmNNCqRz6/97A3kvi1tu9Ubx+kQvAF6iML7daGkPRlP59+uqqafAVGd+SrKxQnjBnkY26PTOv7oCZh0akaW92A5r8tXcHHtK1V35Oyh35CxmTrBZkSz7oXXskCRDyO8ikZx3ITblK3Ihu0AzV5aG6cwg5/3nLxnuErPwh4o8srTOpFNj/CUCvSrvC/PzHxRl7ds6 X-Forefront-Antispam-Report: 
CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:CO1PR10MB4468.namprd10.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230001)(366004)(66476007)(1076003)(66556008)(66946007)(2616005)(36756003)(6506007)(52116002)(6666004)(38100700002)(30864003)(6512007)(5660300002)(86362001)(508600001)(6486002)(186003)(2906002)(103116003)(4326008)(8936002)(8676002)(83380400001)(316002)(21314003);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?us-ascii?Q?B8W3v2QP8dFivXLzP7FoYNlSryxOey5QEVz3Y2ZSDpUFxCvaysbSKW7y4GlP?= =?us-ascii?Q?w2DOTNHmVOGTwFkLi43VV1ZGsyLAon3yR8UCzA8TeIZURH+vbyKEwdzru9t6?= =?us-ascii?Q?xdiZvzRVNO+Y5OlOPKrqQnMPogj3aF24H5pBIm5VJ9///HsuhMPIWj3dH0ka?= =?us-ascii?Q?Sm2Y2mPpZVgroj4LIDNF9bQ6Q0vbRpYtZrZAOPKN4j2y8AduXPoF4wm3JBw1?= =?us-ascii?Q?UoSvxy13mZMuGmf+HSm+ljmCOw44bF9RObqvk0+4KEDcAbtoZXwve6HUQCxq?= =?us-ascii?Q?I47TID6icZbSIXJnQ4/pA+WU+7PZRChEGJoCNDEq8sjRVtoNiWmDT9q9kty1?= =?us-ascii?Q?cL3HUE2niHVzNfdOKPSq6D1sJaNuy1IJEP6jkzMSpDnm+SWVoPtWfwOU3zbQ?= =?us-ascii?Q?BephxeXqDIyQiMo4/r1fCM5vPfUIZEGmNxbsdlaY/Go5l/XEDl1h2OIc4Sjq?= =?us-ascii?Q?rfsf74mUmaC1As9XE8pyoCOAVNhzGyxw73HMMYkyU8FebFJ402ljluoRRCAi?= =?us-ascii?Q?VjK1XdswqSlvcphOZB7Uwsh99VviAfNFKt8xKHCLINwv0Ymm2KFAhsNNfpQQ?= =?us-ascii?Q?I+7oL4/oMWG3NiP5dAcOfTPgIw2jSwrOVUEj1Pvj6uYW1v702SQN0K3po76y?= =?us-ascii?Q?9Rik5oJRwwG9zD5625j4pYuQ6VmM5Ez27buzcPBzaBKCRgYRKel75IMV3O/m?= =?us-ascii?Q?wTTRlbNuFT4KCtDHczF4rpi7qRD5oLa9HA+ExzlFBS2pK+qV2Di50dDFk3EX?= =?us-ascii?Q?YJgPMWKjbO13/iLrrVk2fhXa3cOhTg18NzcHXfifY9ARib1JmhoHpt5q9+nZ?= =?us-ascii?Q?EHt2g6SW3l6UwyxrDYv1HTviC7QzfC86jWlP3jNIsxlikubg8RFRyWRVGi5Q?= =?us-ascii?Q?UXXlSOpsLxumxqxUd08tEoDgFzo7ICzlMdgQf1TiAMKEABCZMja+pvUr/Npl?= =?us-ascii?Q?foHeJmMHsTJdjxv28Z5v8L9Ymti9pIRu8bvUgaAwVRKnS1vBgIkjIvgLlvAo?= =?us-ascii?Q?HXNxfMsdLkxzH0XwUgn39TXXPUDR+onEhS9D9Q3eipht+omonNuZ2t8NBWgU?= =?us-ascii?Q?JJqusU9ECdMle2yBnyqoEKagYMTq85Vty6HD+6hAf4yfLMbRahd75W82rc+/?= 
=?us-ascii?Q?XdnlcspnkwxK4GLY31YGjDClfvQ3Yyhh/8XADD9xwGB0Kh23RyPkCfCFU61V?= =?us-ascii?Q?pVD1zr8aSQr1U55p9Sbd4n1fLR/+BZog7JXl0v32JLTQqgqES5y3scpKgcJ+?= =?us-ascii?Q?3PARyJUvdJFU8FmtMEvhWoDos2I4CHFYNmQGxu9xgazLt11Q4BuczdGTc/QX?= =?us-ascii?Q?sFRxACsjwI5E6RWYmdTHTDSkXXr0QHu0Hnh2bHh8Q/CpgLvFggvtfa6Szo6E?= =?us-ascii?Q?xrMuJb/mkfCLmmw+542k7EqvMfzm1USs2B+rOixthcXew+IoxbYhqFjVEvD9?= =?us-ascii?Q?1Apq2mC2suw3lbPmtfxXxvlgTYc3DtWOFCIUKev0enwHR+kff0KiYFIx+2OV?= =?us-ascii?Q?ho06v2punt9WvbdImRMkYYSrK+nY5MQW5zJGm9cwTAb860s5XKY2VRPH+MIN?= =?us-ascii?Q?GUfcdwcuTayNvv7SYQf7LXn6ytFXWDzfV+/a8sNnpDc703D0VlFN8aiLU1LH?= =?us-ascii?Q?9MmrVe6Jdo3r7xhdREsimL6nvXUVS1ehw22j/+60eUed?= X-OriginatorOrg: oracle.com X-MS-Exchange-CrossTenant-Network-Message-Id: 2d6ecebb-c530-440b-e0fc-08d9e65b6640 X-MS-Exchange-CrossTenant-AuthSource: CO1PR10MB4468.namprd10.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 02 Feb 2022 14:50:48.9231 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 4e2c6054-71cb-48f1-bd6c-3a9705aca71b X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: 7GrFfY6Qg7hcXTxWdMla/6tkDyVrDe7P/eDKOcJxNwwkTbc+0RFYdScocs2pEr3PW1idvmpAhV9AoIwnbDDP+Q== X-MS-Exchange-Transport-CrossTenantHeadersStamped: MN2PR10MB3597 X-Proofpoint-Virus-Version: vendor=nai engine=6300 definitions=10246 signatures=673430 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 malwarescore=0 phishscore=0 mlxscore=0 adultscore=0 suspectscore=0 mlxlogscore=999 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2201110000 definitions=main-2202020081 X-Proofpoint-GUID: rbEUv1Y-ZJyODPx6Gb-IhE84fXekPuCx X-Proofpoint-ORIG-GUID: rbEUv1Y-ZJyODPx6Gb-IhE84fXekPuCx Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Having a single rwsem to synchronize all operations across a kernfs based 
file system (cgroup, sysfs etc.) does not scale well. Replace it with a hashed rwsem to reduce contention around the single per-fs rwsem. Also introduce a per-fs rwsem to protect the per-fs list of kernfs_super_info. Signed-off-by: Imran Khan Reported-by: kernel test robot --- fs/kernfs/dir.c | 268 +++++++++++++++++++++++++----------- fs/kernfs/file.c | 6 +- fs/kernfs/inode.c | 18 ++- fs/kernfs/kernfs-internal.h | 112 +++++++++++++++ fs/kernfs/mount.c | 13 +- fs/kernfs/symlink.c | 5 +- include/linux/kernfs.h | 5 +- 7 files changed, 325 insertions(+), 102 deletions(-) diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c index d26fb3bffda92..89645ba453ab8 100644 --- a/fs/kernfs/dir.c +++ b/fs/kernfs/dir.c @@ -25,7 +25,9 @@ static DEFINE_SPINLOCK(kernfs_idr_lock); /* root->ino_idr= */ =20 static bool kernfs_active(struct kernfs_node *kn) { - lockdep_assert_held(&kernfs_root(kn)->kernfs_rwsem); + int idx =3D hash_ptr(kn, NR_KERNFS_LOCK_BITS); + + lockdep_assert_held(&kernfs_root(kn)->kernfs_rwsem[idx]); return atomic_read(&kn->active) >=3D 0; } =20 @@ -455,35 +457,36 @@ void kernfs_put_active(struct kernfs_node *kn) * removers may invoke this function concurrently on @kn and all will * return after draining is complete. 
*/ -static void kernfs_drain(struct kernfs_node *kn) - __releases(&kernfs_root(kn)->kernfs_rwsem) - __acquires(&kernfs_root(kn)->kernfs_rwsem) +static void kernfs_drain(struct kernfs_node *kn, struct kernfs_node *anc) + __releases(&kernfs_root(anc)->kernfs_rwsem[a_idx]) + __acquires(&kernfs_root(anc)->kernfs_rwsem[a_idx]) { struct kernfs_root *root =3D kernfs_root(kn); + int a_idx =3D hash_ptr(anc, NR_KERNFS_LOCK_BITS); =20 - lockdep_assert_held_write(&root->kernfs_rwsem); - WARN_ON_ONCE(kernfs_active(kn)); + lockdep_assert_held_write(&root->kernfs_rwsem[a_idx]); + WARN_ON_ONCE(atomic_read(&kn->active) >=3D 0); =20 - up_write(&root->kernfs_rwsem); + up_write_kernfs_rwsem(anc); =20 - if (kernfs_lockdep(kn)) { - rwsem_acquire(&kn->dep_map, 0, 0, _RET_IP_); - if (atomic_read(&kn->active) !=3D KN_DEACTIVATED_BIAS) - lock_contended(&kn->dep_map, _RET_IP_); + if (kernfs_lockdep(anc)) { + rwsem_acquire(&anc->dep_map, 0, 0, _RET_IP_); + if (atomic_read(&anc->active) !=3D KN_DEACTIVATED_BIAS) + lock_contended(&anc->dep_map, _RET_IP_); } =20 /* but everyone should wait for draining */ wait_event(root->deactivate_waitq, atomic_read(&kn->active) =3D=3D KN_DEACTIVATED_BIAS); =20 - if (kernfs_lockdep(kn)) { - lock_acquired(&kn->dep_map, _RET_IP_); - rwsem_release(&kn->dep_map, _RET_IP_); + if (kernfs_lockdep(anc)) { + lock_acquired(&anc->dep_map, _RET_IP_); + rwsem_release(&anc->dep_map, _RET_IP_); } =20 kernfs_drain_open_files(kn); =20 - down_write(&root->kernfs_rwsem); + down_write_kernfs_rwsem(anc, LOCK_SELF, 0); } =20 /** @@ -718,12 +721,11 @@ struct kernfs_node *kernfs_find_and_get_node_by_id(st= ruct kernfs_root *root, int kernfs_add_one(struct kernfs_node *kn) { struct kernfs_node *parent =3D kn->parent; - struct kernfs_root *root =3D kernfs_root(parent); struct kernfs_iattrs *ps_iattr; bool has_ns; int ret; =20 - down_write(&root->kernfs_rwsem); + down_write_kernfs_rwsem(parent, LOCK_SELF, 0); =20 ret =3D -EINVAL; has_ns =3D kernfs_ns_enabled(parent); @@ -754,7 +756,7 @@ 
int kernfs_add_one(struct kernfs_node *kn) ps_iattr->ia_mtime =3D ps_iattr->ia_ctime; } =20 - up_write(&root->kernfs_rwsem); + up_write_kernfs_rwsem(parent); =20 /* * Activate the new node unless CREATE_DEACTIVATED is requested. @@ -768,7 +770,7 @@ int kernfs_add_one(struct kernfs_node *kn) return 0; =20 out_unlock: - up_write(&root->kernfs_rwsem); + up_write_kernfs_rwsem(parent); return ret; } =20 @@ -788,8 +790,9 @@ static struct kernfs_node *kernfs_find_ns(struct kernfs= _node *parent, struct rb_node *node =3D parent->dir.children.rb_node; bool has_ns =3D kernfs_ns_enabled(parent); unsigned int hash; + int idx =3D hash_ptr(parent, NR_KERNFS_LOCK_BITS); =20 - lockdep_assert_held(&kernfs_root(parent)->kernfs_rwsem); + lockdep_assert_held(&kernfs_root(parent)->kernfs_rwsem[idx]); =20 if (has_ns !=3D (bool)ns) { WARN(1, KERN_WARNING "kernfs: ns %s in '%s' for '%s'\n", @@ -820,8 +823,9 @@ static struct kernfs_node *kernfs_walk_ns(struct kernfs= _node *parent, { size_t len; char *p, *name; + int idx =3D hash_ptr(parent, NR_KERNFS_LOCK_BITS); =20 - lockdep_assert_held_read(&kernfs_root(parent)->kernfs_rwsem); + lockdep_assert_held_read(&kernfs_root(parent)->kernfs_rwsem[idx]); =20 /* grab kernfs_rename_lock to piggy back on kernfs_pr_cont_buf */ spin_lock_irq(&kernfs_rename_lock); @@ -860,12 +864,11 @@ struct kernfs_node *kernfs_find_and_get_ns(struct ker= nfs_node *parent, const char *name, const void *ns) { struct kernfs_node *kn; - struct kernfs_root *root =3D kernfs_root(parent); =20 - down_read(&root->kernfs_rwsem); + down_read_kernfs_rwsem(parent, LOCK_SELF, 0); kn =3D kernfs_find_ns(parent, name, ns); kernfs_get(kn); - up_read(&root->kernfs_rwsem); + up_read_kernfs_rwsem(parent); =20 return kn; } @@ -885,12 +888,11 @@ struct kernfs_node *kernfs_walk_and_get_ns(struct ker= nfs_node *parent, const char *path, const void *ns) { struct kernfs_node *kn; - struct kernfs_root *root =3D kernfs_root(parent); =20 - down_read(&root->kernfs_rwsem); + 
down_read_kernfs_rwsem(parent, LOCK_SELF, 0); kn =3D kernfs_walk_ns(parent, path, ns); kernfs_get(kn); - up_read(&root->kernfs_rwsem); + up_read_kernfs_rwsem(parent); =20 return kn; } @@ -916,11 +918,12 @@ struct kernfs_root *kernfs_create_root(struct kernfs_= syscall_ops *scops, return ERR_PTR(-ENOMEM); =20 idr_init(&root->ino_idr); - init_rwsem(&root->kernfs_rwsem); for (lock_count =3D 0; lock_count < NR_KERNFS_LOCKS; lock_count++) { spin_lock_init(&root->open_node_locks[lock_count].lock); mutex_init(&root->open_file_mutex[lock_count].lock); + init_rwsem(&root->kernfs_rwsem[lock_count]); } + init_rwsem(&root->supers_rwsem); INIT_LIST_HEAD(&root->supers); =20 /* @@ -1067,12 +1070,12 @@ static int kernfs_dop_revalidate(struct dentry *den= try, unsigned int flags) if (parent) { spin_unlock(&dentry->d_lock); root =3D kernfs_root(parent); - down_read(&root->kernfs_rwsem); + down_read_kernfs_rwsem(parent, LOCK_SELF, 0); if (kernfs_dir_changed(parent, dentry)) { - up_read(&root->kernfs_rwsem); + up_read_kernfs_rwsem(parent); return 0; } - up_read(&root->kernfs_rwsem); + up_read_kernfs_rwsem(parent); } else spin_unlock(&dentry->d_lock); =20 @@ -1084,7 +1087,7 @@ static int kernfs_dop_revalidate(struct dentry *dentr= y, unsigned int flags) =20 kn =3D kernfs_dentry_node(dentry); root =3D kernfs_root(kn); - down_read(&root->kernfs_rwsem); + down_read_kernfs_rwsem(kn, LOCK_SELF, 0); =20 /* The kernfs node has been deactivated */ if (!kernfs_active(kn)) @@ -1103,10 +1106,10 @@ static int kernfs_dop_revalidate(struct dentry *den= try, unsigned int flags) kernfs_info(dentry->d_sb)->ns !=3D kn->ns) goto out_bad; =20 - up_read(&root->kernfs_rwsem); + up_read_kernfs_rwsem(kn); return 1; out_bad: - up_read(&root->kernfs_rwsem); + up_read_kernfs_rwsem(kn); return 0; } =20 @@ -1125,23 +1128,28 @@ static struct dentry *kernfs_iop_lookup(struct inod= e *dir, const void *ns =3D NULL; =20 root =3D kernfs_root(parent); - down_read(&root->kernfs_rwsem); + + down_read_kernfs_rwsem(parent, 
LOCK_SELF, 0); if (kernfs_ns_enabled(parent)) ns =3D kernfs_info(dir->i_sb)->ns; =20 kn =3D kernfs_find_ns(parent, dentry->d_name.name, ns); + up_read_kernfs_rwsem(parent); /* attach dentry and inode */ if (kn) { /* Inactive nodes are invisible to the VFS so don't * create a negative. */ + down_read_kernfs_rwsem(kn, LOCK_SELF, 0); if (!kernfs_active(kn)) { - up_read(&root->kernfs_rwsem); + /* Parent lock was already dropped; release only the node's lock */ + up_read_kernfs_rwsem(kn); return NULL; } inode =3D kernfs_get_inode(dir->i_sb, kn); if (!inode) inode =3D ERR_PTR(-ENOMEM); + up_read_kernfs_rwsem(kn); } /* * Needed for negative dentry validation. @@ -1149,9 +1157,10 @@ static struct dentry *kernfs_iop_lookup(struct inode= *dir, * or transforms from positive dentry in dentry_unlink_inode() * called from vfs_rmdir(). */ + down_read_kernfs_rwsem(parent, LOCK_SELF, 0); if (!IS_ERR(inode)) kernfs_set_rev(parent, dentry); - up_read(&root->kernfs_rwsem); + up_read_kernfs_rwsem(parent); =20 /* instantiate and hash (possibly negative) dentry */ return d_splice_alias(inode, dentry); @@ -1273,8 +1282,9 @@ static struct kernfs_node *kernfs_next_descendant_pos= t(struct kernfs_node *pos, struct kernfs_node *root) { struct rb_node *rbn; + int idx =3D hash_ptr(root, NR_KERNFS_LOCK_BITS); =20 - lockdep_assert_held_write(&kernfs_root(root)->kernfs_rwsem); + lockdep_assert_held_write(&kernfs_root(root)->kernfs_rwsem[idx]); =20 /* if first iteration, visit leftmost descendant which may be root */ if (!pos) @@ -1309,9 +1319,8 @@ static struct kernfs_node *kernfs_next_descendant_pos= t(struct kernfs_node *pos, void kernfs_activate(struct kernfs_node *kn) { struct kernfs_node *pos; - struct kernfs_root *root =3D kernfs_root(kn); =20 - down_write(&root->kernfs_rwsem); + down_write_kernfs_rwsem(kn, LOCK_SELF, 0); =20 pos =3D NULL; while ((pos =3D kernfs_next_descendant_post(pos, kn))) { @@ -1325,14 +1334,15 @@ void kernfs_activate(struct kernfs_node *kn) pos->flags |=3D KERNFS_ACTIVATED; } =20 - 
up_write(&root->kernfs_rwsem); + up_write_kernfs_rwsem(kn); } =20 static void __kernfs_remove(struct kernfs_node *kn) { struct kernfs_node *pos; + int idx =3D hash_ptr(kn, NR_KERNFS_LOCK_BITS); =20 - lockdep_assert_held_write(&kernfs_root(kn)->kernfs_rwsem); + lockdep_assert_held_write(&kernfs_root(kn)->kernfs_rwsem[idx]); =20 /* * Short-circuit if non-root @kn has already finished removal. @@ -1346,9 +1356,16 @@ static void __kernfs_remove(struct kernfs_node *kn) =20 /* prevent any new usage under @kn by deactivating all nodes */ pos =3D NULL; - while ((pos =3D kernfs_next_descendant_post(pos, kn))) + while ((pos =3D kernfs_next_descendant_post(pos, kn))) { + int n_idx =3D hash_ptr(pos, NR_KERNFS_LOCK_BITS); + + if (n_idx !=3D idx) + down_write_kernfs_rwsem(pos, LOCK_SELF, 1); if (kernfs_active(pos)) atomic_add(KN_DEACTIVATED_BIAS, &pos->active); + if (n_idx !=3D idx) + up_write_kernfs_rwsem(pos); + } =20 /* deactivate and unlink the subtree node-by-node */ do { @@ -1369,7 +1386,7 @@ static void __kernfs_remove(struct kernfs_node *kn) * error paths without worrying about draining. 
*/ if (kn->flags & KERNFS_ACTIVATED) - kernfs_drain(pos); + kernfs_drain(pos, kn); else WARN_ON_ONCE(atomic_read(&kn->active) !=3D KN_DEACTIVATED_BIAS); =20 @@ -1402,11 +1419,9 @@ static void __kernfs_remove(struct kernfs_node *kn) */ void kernfs_remove(struct kernfs_node *kn) { - struct kernfs_root *root =3D kernfs_root(kn); - - down_write(&root->kernfs_rwsem); + down_write_kernfs_rwsem(kn, LOCK_SELF, 0); __kernfs_remove(kn); - up_write(&root->kernfs_rwsem); + up_write_kernfs_rwsem(kn); } =20 /** @@ -1492,9 +1507,8 @@ void kernfs_unbreak_active_protection(struct kernfs_n= ode *kn) bool kernfs_remove_self(struct kernfs_node *kn) { bool ret; - struct kernfs_root *root =3D kernfs_root(kn); =20 - down_write(&root->kernfs_rwsem); + down_write_kernfs_rwsem(kn, LOCK_SELF, 0); kernfs_break_active_protection(kn); =20 /* @@ -1522,9 +1536,9 @@ bool kernfs_remove_self(struct kernfs_node *kn) atomic_read(&kn->active) =3D=3D KN_DEACTIVATED_BIAS) break; =20 - up_write(&root->kernfs_rwsem); + up_write_kernfs_rwsem(kn); schedule(); - down_write(&root->kernfs_rwsem); + down_write_kernfs_rwsem(kn, LOCK_SELF, 0); } finish_wait(waitq, &wait); WARN_ON_ONCE(!RB_EMPTY_NODE(&kn->rb)); @@ -1537,7 +1551,7 @@ bool kernfs_remove_self(struct kernfs_node *kn) */ kernfs_unbreak_active_protection(kn); =20 - up_write(&root->kernfs_rwsem); + up_write_kernfs_rwsem(kn); return ret; } =20 @@ -1555,7 +1569,7 @@ int kernfs_remove_by_name_ns(struct kernfs_node *pare= nt, const char *name, { struct kernfs_node *kn; struct kernfs_root *root; - + int idx, p_idx; if (!parent) { WARN(1, KERN_WARNING "kernfs: can not remove '%s', no directory\n", name); @@ -1563,13 +1577,15 @@ int kernfs_remove_by_name_ns(struct kernfs_node *pa= rent, const char *name, } =20 root =3D kernfs_root(parent); - down_write(&root->kernfs_rwsem); + down_write_kernfs_rwsem(parent, LOCK_SELF, 0); =20 kn =3D kernfs_find_ns(parent, name, ns); - if (kn) + up_write_kernfs_rwsem(parent); + if (kn) { + down_write_kernfs_rwsem(kn, LOCK_SELF, 
0); __kernfs_remove(kn); - - up_write(&root->kernfs_rwsem); + up_write_kernfs_rwsem(kn); + } =20 if (kn) return 0; @@ -1590,35 +1606,66 @@ int kernfs_rename_ns(struct kernfs_node *kn, struct= kernfs_node *new_parent, struct kernfs_node *old_parent; struct kernfs_root *root; const char *old_name =3D NULL; - int error; + int error, idx, np_idx, p_idx; =20 /* can't move or rename root */ if (!kn->parent) return -EINVAL; =20 root =3D kernfs_root(kn); - down_write(&root->kernfs_rwsem); + + /* + * Take lock of node's old (current) parent. + * If new parent has a different lock, then take that + * lock as well. + */ + idx =3D hash_ptr(kn, NR_KERNFS_LOCK_BITS); + p_idx =3D hash_ptr(kn->parent, NR_KERNFS_LOCK_BITS); + np_idx =3D hash_ptr(new_parent, NR_KERNFS_LOCK_BITS); + + /* + * Take only kn's lock. The subsequent kernfs_put + * may free up old_parent so if old_parent has a + * different lock, we will explicitly release that. + */ + down_write_kernfs_rwsem(kn, LOCK_SELF, 0); + + if (idx !=3D np_idx) /* new parent hashes to different lock */ + down_write_kernfs_rwsem(new_parent, LOCK_SELF, 1); + + /* old_parent hashes to a different lock */ + if (idx !=3D p_idx && p_idx !=3D np_idx) + down_write_kernfs_rwsem(kn->parent, LOCK_SELF, 2); =20 error =3D -ENOENT; if (!kernfs_active(kn) || !kernfs_active(new_parent) || - (new_parent->flags & KERNFS_EMPTY_DIR)) + (new_parent->flags & KERNFS_EMPTY_DIR)) { + if (idx !=3D p_idx && p_idx !=3D np_idx) + up_write_kernfs_rwsem(kn->parent); goto out; - + } error =3D 0; if ((kn->parent =3D=3D new_parent) && (kn->ns =3D=3D new_ns) && - (strcmp(kn->name, new_name) =3D=3D 0)) + (strcmp(kn->name, new_name) =3D=3D 0)) { + if (idx !=3D p_idx && p_idx !=3D np_idx) + up_write_kernfs_rwsem(kn->parent); goto out; /* nothing to rename */ - + } error =3D -EEXIST; - if (kernfs_find_ns(new_parent, new_name, new_ns)) + if (kernfs_find_ns(new_parent, new_name, new_ns)) { + if (idx !=3D p_idx && p_idx !=3D np_idx) + up_write_kernfs_rwsem(kn->parent); goto 
out; - + } /* rename kernfs_node */ if (strcmp(kn->name, new_name) !=3D 0) { error =3D -ENOMEM; new_name =3D kstrdup_const(new_name, GFP_KERNEL); - if (!new_name) + if (!new_name) { + if (idx !=3D p_idx && p_idx !=3D np_idx) + up_write_kernfs_rwsem(kn->parent); goto out; + } } else { new_name =3D NULL; } @@ -1646,12 +1693,22 @@ int kernfs_rename_ns(struct kernfs_node *kn, struct= kernfs_node *new_parent, kn->hash =3D kernfs_name_hash(kn->name, kn->ns); kernfs_link_sibling(kn); =20 + /* Release old_parent's lock, if it is different */ + if (idx !=3D p_idx && p_idx !=3D np_idx) + up_write_kernfs_rwsem(old_parent); kernfs_put(old_parent); kfree_const(old_name); =20 error =3D 0; out: - up_write(&root->kernfs_rwsem); + /* + * If new parent lock has been taken release it. + * Lastly release node's lock. + */ + if (idx !=3D np_idx) /* new parent hashes to different lock */ + up_write_kernfs_rwsem(new_parent); + + up_write_kernfs_rwsem(kn); return error; } =20 @@ -1670,9 +1727,20 @@ static int kernfs_dir_fop_release(struct inode *inod= e, struct file *filp) static struct kernfs_node *kernfs_dir_pos(const void *ns, struct kernfs_node *parent, loff_t hash, struct kernfs_node *pos) { + int idx, p_idx; + + p_idx =3D hash_ptr(parent, NR_KERNFS_LOCK_BITS); + lockdep_assert_held(&kernfs_root(parent)->kernfs_rwsem[p_idx]); if (pos) { - int valid =3D kernfs_active(pos) && + int valid =3D 0; + + idx =3D hash_ptr(pos, NR_KERNFS_LOCK_BITS); + if (idx !=3D p_idx) + down_read_kernfs_rwsem(pos, LOCK_SELF, 1); + valid =3D kernfs_active(pos) && pos->parent =3D=3D parent && hash =3D=3D pos->hash; + if (idx !=3D p_idx) + up_read_kernfs_rwsem(pos); kernfs_put(pos); if (!valid) pos =3D NULL; @@ -1681,18 +1749,37 @@ static struct kernfs_node *kernfs_dir_pos(const voi= d *ns, struct rb_node *node =3D parent->dir.children.rb_node; while (node) { pos =3D rb_to_kn(node); - + idx =3D hash_ptr(pos, NR_KERNFS_LOCK_BITS); + if (idx !=3D p_idx) + down_read_kernfs_rwsem(pos, LOCK_SELF, 1); if (hash < 
pos->hash) node =3D node->rb_left; else if (hash > pos->hash) node =3D node->rb_right; - else + else { + if (idx !=3D p_idx) + up_read_kernfs_rwsem(pos); break; + } + if (idx !=3D p_idx) + up_read_kernfs_rwsem(pos); } } /* Skip over entries which are dying/dead or in the wrong namespace */ - while (pos && (!kernfs_active(pos) || pos->ns !=3D ns)) { - struct rb_node *node =3D rb_next(&pos->rb); + while (pos) { + struct rb_node *node; + + idx =3D hash_ptr(pos, NR_KERNFS_LOCK_BITS); + if (idx !=3D p_idx) + down_read_kernfs_rwsem(pos, LOCK_SELF, 1); + if (kernfs_active(pos) && pos->ns =3D=3D ns) { + if (idx !=3D p_idx) + up_read_kernfs_rwsem(pos); + break; + } + node =3D rb_next(&pos->rb); + if (idx !=3D p_idx) + up_read_kernfs_rwsem(pos); if (!node) pos =3D NULL; else @@ -1704,16 +1791,41 @@ static struct kernfs_node *kernfs_dir_pos(const voi= d *ns, static struct kernfs_node *kernfs_dir_next_pos(const void *ns, struct kernfs_node *parent, ino_t ino, struct kernfs_node *pos) { + int idx, p_idx; + int unlock_node =3D 0; + + p_idx =3D hash_ptr(parent, NR_KERNFS_LOCK_BITS); + lockdep_assert_held(&kernfs_root(parent)->kernfs_rwsem[p_idx]); pos =3D kernfs_dir_pos(ns, parent, ino, pos); if (pos) { + idx =3D hash_ptr(pos, NR_KERNFS_LOCK_BITS); + if (idx !=3D p_idx) + down_read_kernfs_rwsem(pos, LOCK_SELF, 1); do { struct rb_node *node =3D rb_next(&pos->rb); + if (idx !=3D p_idx) { + up_read_kernfs_rwsem(pos); + unlock_node =3D 0; + } if (!node) pos =3D NULL; - else + else { pos =3D rb_to_kn(node); + if (pos !=3D NULL) { + idx =3D hash_ptr(pos, + NR_KERNFS_LOCK_BITS); + if (idx !=3D p_idx) { + down_read_kernfs_rwsem(pos, + LOCK_SELF, + 1); + unlock_node =3D 1; + } + } + } } while (pos && (!kernfs_active(pos) || pos->ns !=3D ns)); } + if (unlock_node) + up_read_kernfs_rwsem(pos); return pos; } =20 @@ -1729,7 +1841,7 @@ static int kernfs_fop_readdir(struct file *file, stru= ct dir_context *ctx) return 0; =20 root =3D kernfs_root(parent); - down_read(&root->kernfs_rwsem); + 
down_read_kernfs_rwsem(parent, LOCK_SELF, 0); =20 if (kernfs_ns_enabled(parent)) ns =3D kernfs_info(dentry->d_sb)->ns; @@ -1746,12 +1858,12 @@ static int kernfs_fop_readdir(struct file *file, st= ruct dir_context *ctx) file->private_data =3D pos; kernfs_get(pos); =20 - up_read(&root->kernfs_rwsem); + up_read_kernfs_rwsem(parent); if (!dir_emit(ctx, name, len, ino, type)) return 0; - down_read(&root->kernfs_rwsem); + down_read_kernfs_rwsem(parent, LOCK_SELF, 0); } - up_read(&root->kernfs_rwsem); + up_read_kernfs_rwsem(parent); file->private_data =3D NULL; ctx->pos =3D INT_MAX; return 0; diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c index 018d038b72fdd..5124add292582 100644 --- a/fs/kernfs/file.c +++ b/fs/kernfs/file.c @@ -855,8 +855,9 @@ static void kernfs_notify_workfn(struct work_struct *wo= rk) =20 root =3D kernfs_root(kn); /* kick fsnotify */ - down_write(&root->kernfs_rwsem); + down_write_kernfs_rwsem(kn, LOCK_SELF, 0); =20 + down_write(&root->supers_rwsem); list_for_each_entry(info, &kernfs_root(kn)->supers, node) { struct kernfs_node *parent; struct inode *p_inode =3D NULL; @@ -892,8 +893,9 @@ static void kernfs_notify_workfn(struct work_struct *wo= rk) =20 iput(inode); } + up_write(&root->supers_rwsem); =20 - up_write(&root->kernfs_rwsem); + up_write_kernfs_rwsem(kn); kernfs_put(kn); goto repeat; } diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c index 3d783d80f5daa..a8b16f08a667a 100644 --- a/fs/kernfs/inode.c +++ b/fs/kernfs/inode.c @@ -99,11 +99,10 @@ int __kernfs_setattr(struct kernfs_node *kn, const stru= ct iattr *iattr) int kernfs_setattr(struct kernfs_node *kn, const struct iattr *iattr) { int ret; - struct kernfs_root *root =3D kernfs_root(kn); =20 - down_write(&root->kernfs_rwsem); + down_write_kernfs_rwsem(kn, LOCK_SELF, 0); ret =3D __kernfs_setattr(kn, iattr); - up_write(&root->kernfs_rwsem); + up_write_kernfs_rwsem(kn); return ret; } =20 @@ -119,7 +118,7 @@ int kernfs_iop_setattr(struct user_namespace *mnt_usern= s, struct dentry *dentry, 
return -EINVAL; =20 root =3D kernfs_root(kn); - down_write(&root->kernfs_rwsem); + down_write_kernfs_rwsem(kn, LOCK_SELF, 0); error =3D setattr_prepare(&init_user_ns, dentry, iattr); if (error) goto out; @@ -132,7 +131,7 @@ int kernfs_iop_setattr(struct user_namespace *mnt_usern= s, struct dentry *dentry, setattr_copy(&init_user_ns, inode, iattr); =20 out: - up_write(&root->kernfs_rwsem); + up_write_kernfs_rwsem(kn); return error; } =20 @@ -187,14 +186,13 @@ int kernfs_iop_getattr(struct user_namespace *mnt_use= rns, { struct inode *inode =3D d_inode(path->dentry); struct kernfs_node *kn =3D inode->i_private; - struct kernfs_root *root =3D kernfs_root(kn); =20 - down_read(&root->kernfs_rwsem); + down_read_kernfs_rwsem(kn, LOCK_SELF, 0); spin_lock(&inode->i_lock); kernfs_refresh_inode(kn, inode); generic_fillattr(&init_user_ns, inode, stat); spin_unlock(&inode->i_lock); - up_read(&root->kernfs_rwsem); + up_read_kernfs_rwsem(kn); =20 return 0; } @@ -287,12 +285,12 @@ int kernfs_iop_permission(struct user_namespace *mnt_= userns, kn =3D inode->i_private; root =3D kernfs_root(kn); =20 - down_read(&root->kernfs_rwsem); + down_read_kernfs_rwsem(kn, LOCK_SELF, 0); spin_lock(&inode->i_lock); kernfs_refresh_inode(kn, inode); ret =3D generic_permission(&init_user_ns, inode, mask); spin_unlock(&inode->i_lock); - up_read(&root->kernfs_rwsem); + up_read_kernfs_rwsem(kn); =20 return ret; } diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h index cc49a6cd94154..3f011b323173c 100644 --- a/fs/kernfs/kernfs-internal.h +++ b/fs/kernfs/kernfs-internal.h @@ -19,6 +19,9 @@ #include #include =20 +#define LOCK_SELF 0 +#define LOCK_SELF_AND_PARENT 1 + struct kernfs_iattrs { kuid_t ia_uid; kgid_t ia_gid; @@ -102,6 +105,115 @@ static inline bool kernfs_dir_changed(struct kernfs_n= ode *parent, return false; } =20 +/* + * If both node and its parent need locking, + * lock child first so that kernfs_rename_ns + * does not change the parent, leaving us + * with old parent 
here. + */ +static inline void down_write_kernfs_rwsem(struct kernfs_node *kn, + u8 lock_parent, + u8 nesting) +{ + int idx, p_idx; + struct kernfs_root *root; + + idx =3D hash_ptr(kn, NR_KERNFS_LOCK_BITS); + root =3D kernfs_root(kn); + + down_write_nested(&root->kernfs_rwsem[idx], nesting); + + kernfs_get(kn); + + if (kn->parent) + p_idx =3D hash_ptr(kn->parent, NR_KERNFS_LOCK_BITS); + + if (kn->parent && lock_parent && p_idx !=3D idx) { + /* + * Node and parent hash to different locks. + * node's lock has already been taken. + * Take parent's lock and update token. + */ + down_write_nested(&root->kernfs_rwsem[p_idx], + nesting + 1); + + kernfs_get(kn->parent); + kn->unlock_parent =3D 1; + } +} + +static inline void up_write_kernfs_rwsem(struct kernfs_node *kn) +{ + int p_idx, idx; + struct kernfs_root *root; + + /* node lock is already taken in down_xxx so kn->parent is safe */ + p_idx =3D hash_ptr(kn->parent, NR_KERNFS_LOCK_BITS); + idx =3D hash_ptr(kn, NR_KERNFS_LOCK_BITS); + root =3D kernfs_root(kn); + + if (kn->unlock_parent) { + kn->unlock_parent =3D 0; + up_write(&root->kernfs_rwsem[p_idx]); + kernfs_put(kn->parent); + } + + up_write(&root->kernfs_rwsem[idx]); + kernfs_put(kn); +} + +static inline void down_read_kernfs_rwsem(struct kernfs_node *kn, + u8 lock_parent, + u8 nesting) +{ + int idx, p_idx; + struct kernfs_root *root; + + idx =3D hash_ptr(kn, NR_KERNFS_LOCK_BITS); + root =3D kernfs_root(kn); + + down_read_nested(&root->kernfs_rwsem[idx], nesting); + + kernfs_get(kn); + + if (kn->parent) + p_idx =3D hash_ptr(kn->parent, NR_KERNFS_LOCK_BITS); + + if (kn->parent && lock_parent && p_idx !=3D idx) { + /* + * Node and parent hash to different locks. + * node's lock has already been taken. + * Take parent's lock and update token. 
+ */ + down_read_nested(&root->kernfs_rwsem[p_idx], + nesting + 1); + + kernfs_get(kn->parent); + + kn->unlock_parent =3D 1; + } +} + +static inline void up_read_kernfs_rwsem(struct kernfs_node *kn) +{ + int p_idx, idx; + struct kernfs_root *root; + + /* node lock is already taken in down_xxx so kn->parent is safe */ + p_idx =3D hash_ptr(kn->parent, NR_KERNFS_LOCK_BITS); + idx =3D hash_ptr(kn, NR_KERNFS_LOCK_BITS); + root =3D kernfs_root(kn); + + if (kn->unlock_parent) { + kn->unlock_parent =3D 0; + up_read(&root->kernfs_rwsem[p_idx]); + kernfs_put(kn->parent); + } + + up_read(&root->kernfs_rwsem[idx]); + kernfs_put(kn); +} + extern const struct super_operations kernfs_sops; extern struct kmem_cache *kernfs_node_cache, *kernfs_iattrs_cache; =20 diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c index cfa79715fc1a7..ebb7d9a10f47e 100644 --- a/fs/kernfs/mount.c +++ b/fs/kernfs/mount.c @@ -236,7 +236,6 @@ struct dentry *kernfs_node_dentry(struct kernfs_node *k= n, static int kernfs_fill_super(struct super_block *sb, struct kernfs_fs_cont= ext *kfc) { struct kernfs_super_info *info =3D kernfs_info(sb); - struct kernfs_root *kf_root =3D kfc->root; struct inode *inode; struct dentry *root; =20 @@ -256,9 +255,9 @@ static int kernfs_fill_super(struct super_block *sb, st= ruct kernfs_fs_context *k sb->s_shrink.seeks =3D 0; =20 /* get root inode, initialize and unlock it */ - down_read(&kf_root->kernfs_rwsem); + down_read_kernfs_rwsem(info->root->kn, 0, 0); inode =3D kernfs_get_inode(sb, info->root->kn); - up_read(&kf_root->kernfs_rwsem); + up_read_kernfs_rwsem(info->root->kn); if (!inode) { pr_debug("kernfs: could not get root inode\n"); return -ENOMEM; @@ -346,9 +345,9 @@ int kernfs_get_tree(struct fs_context *fc) } sb->s_flags |=3D SB_ACTIVE; =20 - down_write(&root->kernfs_rwsem); + down_write(&root->supers_rwsem); list_add(&info->node, &info->root->supers); - up_write(&root->kernfs_rwsem); + up_write(&root->supers_rwsem); } =20 fc->root =3D dget(sb->s_root); @@ -375,9 
+374,9 @@ void kernfs_kill_sb(struct super_block *sb) struct kernfs_super_info *info =3D kernfs_info(sb); struct kernfs_root *root =3D info->root; =20 - down_write(&root->kernfs_rwsem); + down_write(&root->supers_rwsem); list_del(&info->node); - up_write(&root->kernfs_rwsem); + up_write(&root->supers_rwsem); =20 /* * Remove the superblock from fs_supers/s_instances diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c index 0ab13824822f7..5d4a769e2ab1e 100644 --- a/fs/kernfs/symlink.c +++ b/fs/kernfs/symlink.c @@ -113,12 +113,11 @@ static int kernfs_getlink(struct inode *inode, char *= path) struct kernfs_node *kn =3D inode->i_private; struct kernfs_node *parent =3D kn->parent; struct kernfs_node *target =3D kn->symlink.target_kn; - struct kernfs_root *root =3D kernfs_root(parent); int error; =20 - down_read(&root->kernfs_rwsem); + down_read_kernfs_rwsem(parent, LOCK_SELF, 0); error =3D kernfs_get_target_path(parent, target, path); - up_read(&root->kernfs_rwsem); + up_read_kernfs_rwsem(parent); =20 return error; } diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index 5bf9f02ce9dce..3b3c3e0b44083 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -179,6 +179,7 @@ struct kernfs_node { */ struct kernfs_node *parent; const char *name; + u8 unlock_parent; /* release parent's rwsem */ =20 struct rb_node rb; =20 @@ -237,9 +238,10 @@ struct kernfs_root { struct list_head supers; =20 wait_queue_head_t deactivate_waitq; - struct rw_semaphore kernfs_rwsem; struct kernfs_open_node_lock open_node_locks[NR_KERNFS_LOCKS]; struct kernfs_open_file_mutex open_file_mutex[NR_KERNFS_LOCKS]; + struct rw_semaphore supers_rwsem; + struct rw_semaphore kernfs_rwsem[NR_KERNFS_LOCKS]; }; =20 struct kernfs_open_file { @@ -619,5 +621,4 @@ static inline int kernfs_rename(struct kernfs_node *kn, { return kernfs_rename_ns(kn, new_parent, new_name, NULL); } - #endif /* __LINUX_KERNFS_H */ --=20 2.30.2