[RESEND v13 11/25] cxl/pci: Log message if RAS registers are unmapped

Terry Bowman posted 25 patches 1 month, 1 week ago
Posted by Terry Bowman 1 month, 1 week ago
The CXL RAS handlers do not currently log if the RAS registers are
unmapped. This is needed in order to help debug CXL error handling. Update
the CXL driver to log a warning message if the RAS register block is
unmapped during RAS error handling.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>

---

Changes in v12->v13:
- Added Ben's Reviewed-by
---
 drivers/cxl/core/ras.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 72908f3ced77..0320c391f201 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -165,8 +165,10 @@ void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)
 	void __iomem *addr;
 	u32 status;
 
-	if (!ras_base)
+	if (!ras_base) {
+		dev_warn_once(dev, "CXL RAS register block is not mapped");
 		return;
+	}
 
 	addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
 	status = readl(addr);
@@ -204,8 +206,10 @@ bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)
 	u32 status;
 	u32 fe;
 
-	if (!ras_base)
+	if (!ras_base) {
+		dev_warn_once(dev, "CXL RAS register block is not mapped");
 		return false;
+	}
 
 	addr = ras_base + CXL_RAS_UNCORRECTABLE_STATUS_OFFSET;
 	status = readl(addr);
-- 
2.34.1
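
The guard added in both hunks can be illustrated outside the kernel. The following is a minimal user-space sketch, not the driver code: `warn_once()`, `warn_count`, `handle_ras()` and `FAKE_UNCOR_STATUS_OFFSET` are hypothetical stand-ins invented for this illustration. It assumes only that `dev_warn_once()` suppresses repeats with a flag that is static to the call site, so the message appears once in total rather than once per device.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Counts emitted warnings so the once-only behaviour can be observed. */
static int warn_count;

/*
 * Hypothetical stand-in for dev_warn_once(): the flag is static to the
 * call site, so the message is printed the first time only, regardless
 * of which device name is passed in.
 */
#define warn_once(name, msg)                                    \
	do {                                                    \
		static bool warned;                             \
		if (!warned) {                                  \
			warned = true;                          \
			warn_count++;                           \
			fprintf(stderr, "%s: %s\n", (name), (msg)); \
		}                                               \
	} while (0)

/* Hypothetical offset for illustration, not the CXL specification value. */
#define FAKE_UNCOR_STATUS_OFFSET 0x0

/*
 * Mirrors the shape of the patched handler: log once and bail out when
 * the register block is not mapped, otherwise read the status register.
 * Returns true if any status bit is set.
 */
static bool handle_ras(const char *name, const volatile uint32_t *ras_base)
{
	uint32_t status;

	if (!ras_base) {
		warn_once(name, "RAS register block is not mapped");
		return false;
	}

	status = ras_base[FAKE_UNCOR_STATUS_OFFSET / sizeof(uint32_t)];
	return status != 0;
}
```

Calling `handle_ras("mem0", NULL)` twice prints a single line to stderr and returns false both times. One consequence of the per-call-site flag is that, with several devices, only the first one to hit the unmapped case is named in the log.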
Re: [RESEND v13 11/25] cxl/pci: Log message if RAS registers are unmapped
Posted by dan.j.williams@intel.com 4 weeks ago
Terry Bowman wrote:
> The CXL RAS handlers do not currently log if the RAS registers are
> unmapped. This is needed in order to help debug CXL error handling. Update
> the CXL driver to log a warning message if the RAS register block is
> unmapped during RAS error handling.

That does not tell me anything about why this patch is needed, how this
scenario is entered and why catching this late is ok.

I do not have a problem with the change:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

...but I would steer away from patches that just say "add debug, because
debug helps debug".

What is more interesting is a story like:

"I lost a bunch of time figuring out why error handling was not working
only to find that in $scenario the RAS registers are not mapped. Save
the next person time by logging this condition".

Otherwise, if I NAK this patch I have no sense that Linux is any worse
off, and fewer patches is a virtue worth considering.