From: ira.weiny@intel.com
Date: Mon, 07 Oct 2024 18:16:31 -0500
Subject: [PATCH v4 25/28] cxl/region: Read existing extents on region creation
Message-Id: <20241007-dcd-type2-upstream-v4-25-c261ee6eeded@intel.com>
References: <20241007-dcd-type2-upstream-v4-0-c261ee6eeded@intel.com>
In-Reply-To: <20241007-dcd-type2-upstream-v4-0-c261ee6eeded@intel.com>
To: Dave Jiang, Fan Ni, Jonathan Cameron, Navneet Singh, Jonathan Corbet, Andrew Morton
Cc: Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma, Ira Weiny, linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-doc@vger.kernel.org, nvdimm@lists.linux.dev, linux-kernel@vger.kernel.org

From: Navneet Singh

Dynamic capacity device (DCD) extents may be left in an accepted state on
a device after an unexpected host crash. In this case, creating a new
region on top of a DC partition is expected to read those extents and
surface them for continued use.

Once all endpoint decoders are part of a region and the region is being
realized, reading the device's extent list can reveal these previously
accepted extents. CXL r3.1 specifies the Get Dynamic Capacity Extent List
mailbox command for this purpose. The command returns all the extents for
all dynamic capacity partitions.

If the fabric manager is adding extents to any DCD partition, the extent
list for the recovered region may change while it is being read. Such a
change shows up as a different generation number or total extent count
between responses, and in that case the query must be retried. On retry,
the query may encounter extents which were accepted by a previous list
query. Adding such an extent is ignored without error because it lies
entirely within a previously accepted extent.

The scan for existing extents races with the dax_cxl driver. This is
synchronized through the region device lock. Extents which are found
after the driver has loaded surface through the normal notification
path, while extents seen before the driver loads are read during driver
load.
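
For illustration only, the read-and-retry scheme described above can be
modeled as a standalone userspace C program. This is a sketch under stated
assumptions, not code from this patch: struct extent, struct fake_dev,
struct host and the functions fake_get_extent_list(), host_add_extent()
and read_extent_list() are hypothetical stand-ins for struct cxl_extent,
the Get Dynamic Capacity Extent List mailbox command and
validate_add_extent(). Each pass pages through the list and returns
-EAGAIN if the generation number or total extent count differs from the
first response; the caller retries up to 10 times, as
CXL_READ_EXTENT_LIST_RETRY does. Extents lying entirely within an already
accepted extent are ignored without error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_EXTENTS 16
#define READ_RETRY 10	/* mirrors CXL_READ_EXTENT_LIST_RETRY */

/* Hypothetical stand-in for struct cxl_extent: [start, start + len). */
struct extent { uint64_t start, len; };

/* Hypothetical in-memory device: extent list plus generation number. */
struct fake_dev {
	struct extent list[MAX_EXTENTS];
	uint32_t count, gen;
	int mutate_after;	/* reads before one extent is added; -1 = never */
};

/* Host-side record of accepted extents. */
struct host {
	struct extent accepted[MAX_EXTENTS];
	uint32_t nr;
};

/* Models Get DC Extent List; n <= count <= MAX_EXTENTS bounds @out. */
static uint32_t fake_get_extent_list(struct fake_dev *d, uint32_t index,
				     uint32_t max, struct extent *out,
				     uint32_t *total, uint32_t *gen)
{
	uint32_t n = 0;
	/* Simulate a fabric manager adding one extent mid-scan. */
	if (d->mutate_after >= 0 && d->mutate_after-- == 0) {
		uint32_t c = d->count;
		assert(c < MAX_EXTENTS);
		d->list[c] = (struct extent){ 0x100000 + c * 0x10000, 0x1000 };
		d->count = c + 1;
		d->gen++;
	}
	while (n < max && index + n < d->count) {
		out[n] = d->list[index + n];
		n++;
	}
	*total = d->count;
	*gen = d->gen;
	return n;
}

/* Accept @e unless it lies entirely within an already accepted extent. */
static bool host_add_extent(struct host *h, const struct extent *e)
{
	for (uint32_t i = 0; i < h->nr; i++) {
		const struct extent *a = &h->accepted[i];
		if (e->start >= a->start && e->start + e->len <= a->start + a->len)
			return false;	/* duplicate: ignored, not an error */
	}
	assert(h->nr < MAX_EXTENTS);
	h->accepted[h->nr++] = *e;
	return true;
}

/* One pass over the list; -EAGAIN if it changed while being read. */
static int read_extent_list_once(struct fake_dev *d, struct host *h,
				 uint32_t page)
{
	struct extent buf[MAX_EXTENTS];
	uint32_t index = 0, total_expected = 0, initial_gen = 0;
	bool first = true;

	do {
		uint32_t total, gen;
		uint32_t n = fake_get_extent_list(d, index, page, buf,
						  &total, &gen);

		if (first) {	/* save the first response, as the patch does */
			total_expected = total;
			initial_gen = gen;
			first = false;
		}
		if (gen != initial_gen || total != total_expected)
			return -EAGAIN;
		if (n == 0 && index < total_expected)
			return -EIO;	/* model-only guard against spinning */
		for (uint32_t i = 0; i < n; i++)
			host_add_extent(h, &buf[i]);
		index += n;
	} while (index < total_expected);
	return 0;
}

/* Retry the whole pass, as cxl_process_extent_list() does. */
static int read_extent_list(struct fake_dev *d, struct host *h, uint32_t page)
{
	int retry = READ_RETRY, rc;
	do {
		rc = read_extent_list_once(d, h, page);
	} while (rc == -EAGAIN && retry--);
	return rc;
}
```

With five extents, a page size of 2 and a device that gains an extent
during the second page read, the first pass accepts two extents and then
returns -EAGAIN; the retry rereads the whole list, skips those two as
duplicates and accepts the remaining four. The -EIO guard for a device
that returns no extents while more are expected exists only in this
model; the patch has no such check.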

Signed-off-by: Navneet Singh
Co-developed-by: Ira Weiny
Signed-off-by: Ira Weiny
Reviewed-by: Jonathan Cameron
---
Changes:
[iweiny: adjust for mailbox split]
[djiang: Update commit messages]
[djiang: s/cxl_read_extent_list/cxl_process_extent_list/]
[djiang: #define CXL_READ_EXTENT_LIST_RETRY]
---
 drivers/cxl/core/core.h   |   2 +
 drivers/cxl/core/mbox.c   | 105 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/cxl/core/region.c |  12 ++++++
 drivers/cxl/cxlmem.h      |  21 ++++++++++
 4 files changed, 140 insertions(+)

diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 0eccdd0b9261..80d61f75161d 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -21,6 +21,8 @@ cxled_to_mds(struct cxl_endpoint_decoder *cxled)
 	return container_of(cxlds, struct cxl_memdev_state, cxlds);
 }
 
+void cxl_process_extent_list(struct cxl_endpoint_decoder *cxled);
+
 #ifdef CONFIG_CXL_REGION
 extern struct device_attribute dev_attr_create_pmem_region;
 extern struct device_attribute dev_attr_create_ram_region;
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index d66beec687a0..6b25d15403a3 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1697,6 +1697,111 @@ int cxl_dev_dynamic_capacity_identify(struct cxl_memdev_state *mds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_dev_dynamic_capacity_identify, CXL);
 
+/* Return -EAGAIN if the extent list changes while reading */
+static int __cxl_process_extent_list(struct cxl_endpoint_decoder *cxled)
+{
+	u32 current_index, total_read, total_expected, initial_gen_num;
+	struct cxl_memdev_state *mds = cxled_to_mds(cxled);
+	struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
+	struct device *dev = mds->cxlds.dev;
+	struct cxl_mbox_cmd mbox_cmd;
+	u32 max_extent_count;
+	bool first = true;
+
+	struct cxl_mbox_get_extent_out *extents __free(kvfree) =
+		kvmalloc(cxl_mbox->payload_size, GFP_KERNEL);
+	if (!extents)
+		return -ENOMEM;
+
+	total_read = 0;
+	current_index = 0;
+	total_expected = 0;
+	max_extent_count = (cxl_mbox->payload_size - sizeof(*extents)) /
+				sizeof(struct cxl_extent);
+	do {
+		struct cxl_mbox_get_extent_in get_extent;
+		u32 nr_returned, current_total, current_gen_num;
+		int rc;
+
+		get_extent = (struct cxl_mbox_get_extent_in) {
+			.extent_cnt = cpu_to_le32(max(max_extent_count,
+						      total_expected - current_index)),
+			.start_extent_index = cpu_to_le32(current_index),
+		};
+
+		mbox_cmd = (struct cxl_mbox_cmd) {
+			.opcode = CXL_MBOX_OP_GET_DC_EXTENT_LIST,
+			.payload_in = &get_extent,
+			.size_in = sizeof(get_extent),
+			.size_out = cxl_mbox->payload_size,
+			.payload_out = extents,
+			.min_out = 1,
+		};
+
+		rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
+		if (rc < 0)
+			return rc;
+
+		/* Save initial data */
+		if (first) {
+			total_expected = le32_to_cpu(extents->total_extent_count);
+			initial_gen_num = le32_to_cpu(extents->generation_num);
+			first = false;
+		}
+
+		nr_returned = le32_to_cpu(extents->returned_extent_count);
+		total_read += nr_returned;
+		current_total = le32_to_cpu(extents->total_extent_count);
+		current_gen_num = le32_to_cpu(extents->generation_num);
+
+		dev_dbg(dev, "Got extent list %d-%d of %d generation Num:%d\n",
+			current_index, total_read - 1, current_total, current_gen_num);
+
+		if (current_gen_num != initial_gen_num || total_expected != current_total) {
+			dev_dbg(dev, "Extent list change detected; gen %u != %u : cnt %u != %u\n",
+				current_gen_num, initial_gen_num,
+				total_expected, current_total);
+			return -EAGAIN;
+		}
+
+		for (int i = 0; i < nr_returned; i++) {
+			struct cxl_extent *extent = &extents->extent[i];
+
+			dev_dbg(dev, "Processing extent %d/%d\n",
+				current_index + i, total_expected);
+
+			rc = validate_add_extent(mds, extent);
+			if (rc)
+				continue;
+		}
+
+		current_index += nr_returned;
+	} while (total_expected > total_read);
+
+	return 0;
+}
+
+/**
+ * cxl_process_extent_list() - Read existing extents
+ * @cxled: Endpoint decoder which is part of a region
+ *
+ * Issue the Get Dynamic Capacity Extent List command to the device
+ * and add existing extents if found.
+ *
+ * A retry count of 10 is somewhat arbitrary; however, extent changes should
+ * be relatively rare while bringing up a region, so 10 should be plenty.
+ */
+#define CXL_READ_EXTENT_LIST_RETRY 10
+void cxl_process_extent_list(struct cxl_endpoint_decoder *cxled)
+{
+	int retry = CXL_READ_EXTENT_LIST_RETRY;
+	int rc;
+
+	do {
+		rc = __cxl_process_extent_list(cxled);
+	} while (rc == -EAGAIN && retry--);
+}
+
 static int add_dpa_res(struct device *dev, struct resource *parent,
 		       struct resource *res, resource_size_t start,
 		       resource_size_t size, const char *type)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 6ae51fc2bdae..5ed4a77491e5 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3190,6 +3190,15 @@ static int devm_cxl_add_pmem_region(struct cxl_region *cxlr)
 	return rc;
 }
 
+static void cxlr_add_existing_extents(struct cxl_region *cxlr)
+{
+	struct cxl_region_params *p = &cxlr->params;
+	int i;
+
+	for (i = 0; i < p->nr_targets; i++)
+		cxl_process_extent_list(p->targets[i]);
+}
+
 static void cxlr_dax_unregister(void *_cxlr_dax)
 {
 	struct cxl_dax_region *cxlr_dax = _cxlr_dax;
@@ -3224,6 +3233,9 @@ static int devm_cxl_add_dax_region(struct cxl_region *cxlr)
 	dev_dbg(&cxlr->dev, "%s: register %s\n", dev_name(dev->parent),
 		dev_name(dev));
 
+	if (cxlr->mode == CXL_REGION_DC)
+		cxlr_add_existing_extents(cxlr);
+
 	return devm_add_action_or_reset(&cxlr->dev, cxlr_dax_unregister,
 					cxlr_dax);
 err:
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index dd7cc0d373af..4272f134da8f 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -626,6 +626,27 @@ struct cxl_mbox_dc_response {
 	} __packed extent_list[];
 } __packed;
 
+/*
+ * Get Dynamic Capacity Extent List; Input Payload
+ * CXL rev 3.1 section 8.2.9.9.9.2; Table 8-166
+ */
+struct cxl_mbox_get_extent_in {
+	__le32 extent_cnt;
+	__le32 start_extent_index;
+} __packed;
+
+/*
+ * Get Dynamic Capacity Extent List; Output Payload
+ * CXL rev 3.1 section 8.2.9.9.9.2; Table 8-167
+ */
+struct cxl_mbox_get_extent_out {
+	__le32 returned_extent_count;
+	__le32 total_extent_count;
+	__le32 generation_num;
+	u8 rsvd[4];
+	struct cxl_extent extent[];
+} __packed;
+
 struct cxl_mbox_get_supported_logs {
 	__le16 entries;
 	u8 rsvd[6];
-- 
2.46.0