On Thu, Nov 02, 2023 at 11:42:22AM +0000, Muralidhara M K wrote:
> From: Muralidhara M K <muralidhara.mk@amd.com>
>
> AMD systems with Scalable MCA, each machine check error of a SMCA bank
> type has an associated bit position in the bank's control (CTL) register.
On top of this. It is long overdue:
---
From: "Borislav Petkov (AMD)" <bp@alien8.de>
Date: Tue, 28 Nov 2023 14:37:56 +0100
Add some initial RAS documentation. The expectation is for this to
collect all the user-visible features for interacting with the RAS
features of the kernel.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
---
Documentation/RAS/ras.rst | 26 ++++++++++++++++++++++++++
Documentation/index.rst | 1 +
2 files changed, 27 insertions(+)
create mode 100644 Documentation/RAS/ras.rst
diff --git a/Documentation/RAS/ras.rst b/Documentation/RAS/ras.rst
new file mode 100644
index 000000000000..2556b397cd27
--- /dev/null
+++ b/Documentation/RAS/ras.rst
@@ -0,0 +1,26 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Reliability, Availability and Serviceability features
+=====================================================
+
+This documents different aspects of the RAS functionality present in the
+kernel.
+
+Error decoding
+---------------
+
+* x86
+
+Error decoding on AMD systems should be done using the rasdaemon tool:
+https://github.com/mchehab/rasdaemon/
+
+While the daemon is running, it would automatically log and decode
+errors. If not, one can still decode such errors by supplying the
+hardware information from the error::
+
+ $ rasdaemon -p --status <STATUS> --ipid <IPID> --smca
+
+Also, the user can pass particular family and model to decode the error
+string::
+
+ $ rasdaemon -p --status <STATUS> --ipid <IPID> --smca --family <CPU Family> --model <CPU Model> --bank <BANK_NUM>
diff --git a/Documentation/index.rst b/Documentation/index.rst
index 9dfdc826618c..36e61783437c 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -113,6 +113,7 @@ to ReStructured Text format, or are simply too old.
:maxdepth: 1
staging/index
+ RAS/ras
Translations
--
2.42.0.rc0.25.ga82fb66fed25
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
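The two rasdaemon invocations documented in the patch above differ only in the optional family, model and bank arguments. A minimal sketch of assembling them from a script follows. It uses only the flags that appear in the patch text. The helper names `rasdaemon_decode_argv` and `render` are hypothetical, and the values are passed through as opaque strings because the patch does not say which number format rasdaemon expects. The sketch only builds and prints the command line; it does not run rasdaemon.

```python
import shlex
from typing import List, Optional


def rasdaemon_decode_argv(
    status: str,
    ipid: str,
    family: Optional[str] = None,
    model: Optional[str] = None,
    bank: Optional[str] = None,
) -> List[str]:
    """Build the argument list for the decode invocation shown in the patch.

    Values are passed through verbatim; the caller supplies them in whatever
    format rasdaemon expects (not specified in the patch, so not assumed here).
    """
    # Base form from the patch: rasdaemon -p --status <STATUS> --ipid <IPID> --smca
    argv = ["rasdaemon", "-p", "--status", status, "--ipid", ipid, "--smca"]

    # Optional arguments from the second documented form. The patch shows them
    # used together; appending each one independently is an assumption.
    for flag, value in (("--family", family), ("--model", model), ("--bank", bank)):
        if value is not None:
            argv += [flag, value]
    return argv


def render(argv: List[str]) -> str:
    """Return a shell-quoted command line suitable for logging or copy/paste."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # Placeholders, exactly as in the patch; substitute values from the error record.
    print(render(rasdaemon_decode_argv("<STATUS>", "<IPID>")))
```

Keeping the arguments as a list rather than a single string means the result can be handed to a process-spawning API without a shell, should a caller decide to execute it.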
Borislav Petkov <bp@alien8.de> writes:
> On Thu, Nov 02, 2023 at 11:42:22AM +0000, Muralidhara M K wrote:
>> From: Muralidhara M K <muralidhara.mk@amd.com>
>>
>> AMD systems with Scalable MCA, each machine check error of a SMCA bank
>> type has an associated bit position in the bank's control (CTL) register.
>
> On top of this. It is long overdue:
>
> ---
> From: "Borislav Petkov (AMD)" <bp@alien8.de>
> Date: Tue, 28 Nov 2023 14:37:56 +0100
>
> Add some initial RAS documentation. The expectation is for this to
> collect all the user-visible features for interacting with the RAS
> features of the kernel.
>
> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
> ---
> Documentation/RAS/ras.rst | 26 ++++++++++++++++++++++++++
> Documentation/index.rst | 1 +
> 2 files changed, 27 insertions(+)
> create mode 100644 Documentation/RAS/ras.rst

I wish I'd been copied on this ... I've been working to get a handle on
the top-level Documentation/ directories for a while, and would rather
not see a new one added for this. Offhand, based on this first
document, it looks like material that belongs under
Documentation/admin-guide; can we move it there, please?

Thanks,

jon
On Tue, Jan 09, 2024 at 10:47:29AM -0700, Jonathan Corbet wrote:
> I wish I'd been copied on this ...
linux-doc was CCed:
https://lore.kernel.org/all/20231128142049.GTZWX3QQTSaQk%2F+u53@fat_crate.local/
Or would you have preferred to be CCed directly?
> I've been working to get a handle on
> the top-level Documentation/ directories for a while, and would rather
> not see a new one added for this. Offhand, based on this first
> document, it looks like material that belongs under
> Documentation/admin-guide; can we move it there, please?
Not really an admin guide thing - yes, based on the current content but
actually, the aim for this is to document all things RAS, so it is more
likely a subsystem thing. And all the subsystems are directories under
Documentation/.
So where do you want me to put it?
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Borislav Petkov <bp@alien8.de> writes:
> On Tue, Jan 09, 2024 at 10:47:29AM -0700, Jonathan Corbet wrote:
>> I wish I'd been copied on this ...
>
> linux-doc was CCed:
>
> https://lore.kernel.org/all/20231128142049.GTZWX3QQTSaQk%2F+u53@fat_crate.local/
>
> Or would you have preferred to be CCed directly?

Lots of stuff goes to linux-doc, I can miss things. Of course, I miss
things in my own email too...you know the drill...

>> I've been working to get a handle on
>> the top-level Documentation/ directories for a while, and would rather
>> not see a new one added for this. Offhand, based on this first
>> document, it looks like material that belongs under
>> Documentation/admin-guide; can we move it there, please?
>
> Not really an admin guide thing - yes, based on the current content but
> actually, the aim for this is to document all things RAS, so it is more
> likely a subsystem thing. And all the subsystems are directories under
> Documentation/.
>
> So where do you want me to put it?

The hope with all of this documentation thrashing has been to organize
our docs with the *reader* in mind. "All things RAS" is convenient for
RAS developers, but not for (say) a sysadmin trying to figure out how to
make use of it. So I would really rather see RAS documentation placed
under admin-guide or userspace-api as appropriate.

Yes, there is a lot of existing documentation that still doesn't live up
to this idea, but we can try to follow it for new stuff while the rest
is (slowly) fixed up.

Make sense?

Thanks,

jon
On Tue, Jan 09, 2024 at 12:44:41PM -0700, Jonathan Corbet wrote:
> Of course, I miss things in my own email too...you know the drill...
Yeah, tell me about it.
My train of thought with CCing maintainers in such cases usually is: I'd
CC the mailing list as I don't want to bother the maintainer - she/he gets
too much email anyway and this is an FYI thing anyway so she/he'll find
it in the archives eventually.
> Yes, there is a lot of existing documentation that still doesn't live up
> to this idea, but we can try to follow it for new stuff while the rest
> is (slowly) fixed up.
The problem I see here is that not all of the RAS stuff will be
"admin-guide" stuff; some of it will describe design decisions we've
made. I mean, for a really curious admin, it'll be right up her/his
alley, but it won't be purely descriptions of administrative tasks.
At the end of the day, I don't really care where it is as long as it is
in one place and we can point people to it and say, here, that's why we
did it the way we did it and what you can do about it.
So I'm fine with admin-guide too - just pointing out a potential issue
I see.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Tue, Jan 09, 2024 at 09:04:34PM +0100, Borislav Petkov wrote:
> So I'm fine with admin-guide too - just pointing out a potential issue
> I see.
Ok, how does this look?
I've merged it with ras.rst which we had there already and with some
more new documentation that is coming from:
https://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git/log/?h=edac-amd-atl
Thx.
---
From: "Borislav Petkov (AMD)" <bp@alien8.de>
Date: Wed, 24 Jan 2024 13:37:52 +0100
Subject: [PATCH] Documentation: Move RAS section to admin-guide
This is where this stuff should be.
Requested-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
---
Documentation/RAS/index.rst | 14 --------------
.../{ => admin-guide}/RAS/address-translation.rst | 0
.../{ => admin-guide}/RAS/error-decoding.rst | 0
Documentation/admin-guide/RAS/index.rst | 7 +++++++
.../admin-guide/{ras.rst => RAS/main.rst} | 10 +++++++---
Documentation/admin-guide/index.rst | 2 +-
Documentation/index.rst | 1 -
7 files changed, 15 insertions(+), 19 deletions(-)
delete mode 100644 Documentation/RAS/index.rst
rename Documentation/{ => admin-guide}/RAS/address-translation.rst (100%)
rename Documentation/{ => admin-guide}/RAS/error-decoding.rst (100%)
create mode 100644 Documentation/admin-guide/RAS/index.rst
rename Documentation/admin-guide/{ras.rst => RAS/main.rst} (99%)
diff --git a/Documentation/RAS/index.rst b/Documentation/RAS/index.rst
deleted file mode 100644
index 2794c1816e90..000000000000
--- a/Documentation/RAS/index.rst
+++ /dev/null
@@ -1,14 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-===========================================================
-Reliability, Availability and Serviceability (RAS) features
-===========================================================
-
-This documents different aspects of the RAS functionality present in the
-kernel.
-
-.. toctree::
- :maxdepth: 2
-
- error-decoding
- address-translation
diff --git a/Documentation/RAS/address-translation.rst b/Documentation/admin-guide/RAS/address-translation.rst
similarity index 100%
rename from Documentation/RAS/address-translation.rst
rename to Documentation/admin-guide/RAS/address-translation.rst
diff --git a/Documentation/RAS/error-decoding.rst b/Documentation/admin-guide/RAS/error-decoding.rst
similarity index 100%
rename from Documentation/RAS/error-decoding.rst
rename to Documentation/admin-guide/RAS/error-decoding.rst
diff --git a/Documentation/admin-guide/RAS/index.rst b/Documentation/admin-guide/RAS/index.rst
new file mode 100644
index 000000000000..f4087040a7c0
--- /dev/null
+++ b/Documentation/admin-guide/RAS/index.rst
@@ -0,0 +1,7 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. toctree::
+ :maxdepth: 2
+
+ main
+ error-decoding
+ address-translation
diff --git a/Documentation/admin-guide/ras.rst b/Documentation/admin-guide/RAS/main.rst
similarity index 99%
rename from Documentation/admin-guide/ras.rst
rename to Documentation/admin-guide/RAS/main.rst
index 8e03751d126d..7ac1d4ccc509 100644
--- a/Documentation/admin-guide/ras.rst
+++ b/Documentation/admin-guide/RAS/main.rst
@@ -1,8 +1,12 @@
+.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
-============================================
-Reliability, Availability and Serviceability
-============================================
+==================================================
+Reliability, Availability and Serviceability (RAS)
+==================================================
+
+This documents different aspects of the RAS functionality present in the
+kernel.
RAS concepts
************
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index fb40a1f6f79e..dfc06fab9432 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -122,7 +122,7 @@ configure specific aspects of kernel behavior to your liking.
pmf
pnp
rapidio
- ras
+ RAS/index
rtc
serial-console
svga
diff --git a/Documentation/index.rst b/Documentation/index.rst
index 07f2aa07f0fa..9dfdc826618c 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -113,7 +113,6 @@ to ReStructured Text format, or are simply too old.
:maxdepth: 1
staging/index
- RAS/index
Translations
--
2.43.0
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
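The rename entries in the patch summary above use git's compact brace notation, where `prefix{old => new}suffix` stands for a move from `prefix` + `old` + `suffix` to `prefix` + `new` + `suffix`. A minimal sketch of expanding that notation into explicit old and new paths follows. The helper name `expand_rename` is hypothetical, and the sketch handles only the single-brace and brace-free forms; it makes no attempt to recover paths that the diffstat has abbreviated with a leading `...`.

```python
import re
from typing import Tuple

# prefix{old => new}suffix, as printed on git's "rename ..." summary lines.
_BRACED = re.compile(r"^(?P<prefix>.*)\{(?P<old>.*) => (?P<new>.*)\}(?P<suffix>.*)$")


def expand_rename(compact: str) -> Tuple[str, str]:
    """Expand git's compact rename notation into (old_path, new_path)."""
    match = _BRACED.match(compact)
    if match is None:
        # Brace-free form: "old/path => new/path".
        old, sep, new = compact.partition(" => ")
        if not sep:
            raise ValueError("not a rename entry: %r" % compact)
        return old, new

    def join(middle: str) -> str:
        # An empty side of the braces leaves a doubled slash; collapse it.
        return (match["prefix"] + middle + match["suffix"]).replace("//", "/")

    return join(match["old"]), join(match["new"])


if __name__ == "__main__":
    # Full paths taken from the "rename" lines of the patch summary.
    for entry in (
        "Documentation/{ => admin-guide}/RAS/address-translation.rst",
        "Documentation/{ => admin-guide}/RAS/error-decoding.rst",
        "Documentation/admin-guide/{ras.rst => RAS/main.rst}",
    ):
        old_path, new_path = expand_rename(entry)
        print(old_path, "->", new_path)
```

Read this way, the patch moves the two files under Documentation/RAS/ into Documentation/admin-guide/RAS/ unchanged and turns the existing admin-guide/ras.rst into RAS/main.rst in the same directory.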
On Wed, Jan 24, 2024 at 01:40:30PM +0100, Borislav Petkov wrote:
> From: "Borislav Petkov (AMD)" <bp@alien8.de>
> Date: Wed, 24 Jan 2024 13:37:52 +0100
> Subject: [PATCH] Documentation: Move RAS section to admin-guide
>
> This is where this stuff should be.
>
> Requested-by: Jonathan Corbet <corbet@lwn.net>
> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
> ---
> Documentation/RAS/index.rst | 14 --------------
> .../{ => admin-guide}/RAS/address-translation.rst | 0
> .../{ => admin-guide}/RAS/error-decoding.rst | 0
> Documentation/admin-guide/RAS/index.rst | 7 +++++++
> .../admin-guide/{ras.rst => RAS/main.rst} | 10 +++++++---
> Documentation/admin-guide/index.rst | 2 +-
> Documentation/index.rst | 1 -
> 7 files changed, 15 insertions(+), 19 deletions(-)
> delete mode 100644 Documentation/RAS/index.rst
> rename Documentation/{ => admin-guide}/RAS/address-translation.rst (100%)
> rename Documentation/{ => admin-guide}/RAS/error-decoding.rst (100%)
> create mode 100644 Documentation/admin-guide/RAS/index.rst
> rename Documentation/admin-guide/{ras.rst => RAS/main.rst} (99%)
Now queued.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Hi,

On 11/28/23 06:20, Borislav Petkov wrote:
> On Thu, Nov 02, 2023 at 11:42:22AM +0000, Muralidhara M K wrote:
>> From: Muralidhara M K <muralidhara.mk@amd.com>
>>
>> AMD systems with Scalable MCA, each machine check error of a SMCA bank
>> type has an associated bit position in the bank's control (CTL) register.
>
> On top of this. It is long overdue:
>
> ---
> From: "Borislav Petkov (AMD)" <bp@alien8.de>
> Date: Tue, 28 Nov 2023 14:37:56 +0100
>
> Add some initial RAS documentation. The expectation is for this to
> collect all the user-visible features for interacting with the RAS
> features of the kernel.
>

In general, does RAS include EDAC and MCE?

Thanks.

> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
> ---
> Documentation/RAS/ras.rst | 26 ++++++++++++++++++++++++++
> Documentation/index.rst | 1 +
> 2 files changed, 27 insertions(+)
> create mode 100644 Documentation/RAS/ras.rst
>

--
~Randy
On Tue, Nov 28, 2023 at 09:04:22AM -0800, Randy Dunlap wrote:
> In general, does RAS include EDAC and MCE?
You can say that.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 11/28/2023 9:20 AM, Borislav Petkov wrote:
> On Thu, Nov 02, 2023 at 11:42:22AM +0000, Muralidhara M K wrote:
>> From: Muralidhara M K <muralidhara.mk@amd.com>
>>
>> AMD systems with Scalable MCA, each machine check error of a SMCA bank
>> type has an associated bit position in the bank's control (CTL) register.
>
> On top of this. It is long overdue:
>
> ---
> From: "Borislav Petkov (AMD)" <bp@alien8.de>
> Date: Tue, 28 Nov 2023 14:37:56 +0100
>
> Add some initial RAS documentation. The expectation is for this to
> collect all the user-visible features for interacting with the RAS
> features of the kernel.
>
> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
> ---
> Documentation/RAS/ras.rst | 26 ++++++++++++++++++++++++++
> Documentation/index.rst | 1 +
> 2 files changed, 27 insertions(+)
> create mode 100644 Documentation/RAS/ras.rst
>
> diff --git a/Documentation/RAS/ras.rst b/Documentation/RAS/ras.rst
> new file mode 100644
> index 000000000000..2556b397cd27
> --- /dev/null
> +++ b/Documentation/RAS/ras.rst
> @@ -0,0 +1,26 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Reliability, Availability and Serviceability features
> +=====================================================
> +
> +This documents different aspects of the RAS functionality present in the
> +kernel.
> +
> +Error decoding
> +---------------
> +
> +* x86
> +
> +Error decoding on AMD systems should be done using the rasdaemon tool:
> +https://github.com/mchehab/rasdaemon/
> +
> +While the daemon is running, it would automatically log and decode
> +errors. If not, one can still decode such errors by supplying the
> +hardware information from the error::
> +
> + $ rasdaemon -p --status <STATUS> --ipid <IPID> --smca
> +
> +Also, the user can pass particular family and model to decode the error
> +string::
> +
> + $ rasdaemon -p --status <STATUS> --ipid <IPID> --smca --family <CPU Family> --model <CPU Model> --bank <BANK_NUM>
> diff --git a/Documentation/index.rst b/Documentation/index.rst
> index 9dfdc826618c..36e61783437c 100644
> --- a/Documentation/index.rst
> +++ b/Documentation/index.rst
> @@ -113,6 +113,7 @@ to ReStructured Text format, or are simply too old.
> :maxdepth: 1
>
> staging/index
> + RAS/ras
>
>
> Translations

Thanks for starting this. I'll add some notes for the AMD Address
Translation Library in the next revision.

Thanks,
Yazen