[RFC 00/10] Introduce In Field Scan driver

Jithu Joseph posted 10 patches 4 years, 3 months ago
There is a newer version of this series
Documentation/ABI/stable/sysfs-driver-ifs |  85 +++++
Documentation/x86/ifs.rst                 | 108 ++++++
Documentation/x86/index.rst               |   1 +
MAINTAINERS                               |   7 +
arch/x86/include/asm/microcode_intel.h    |   6 +
arch/x86/kernel/cpu/microcode/intel.c     |   8 +-
drivers/platform/x86/intel/Kconfig        |   1 +
drivers/platform/x86/intel/Makefile       |   1 +
drivers/platform/x86/intel/ifs/Kconfig    |   9 +
drivers/platform/x86/intel/ifs/Makefile   |   7 +
drivers/platform/x86/intel/ifs/core.c     | 387 +++++++++++++++++++++
drivers/platform/x86/intel/ifs/ifs.h      | 155 +++++++++
drivers/platform/x86/intel/ifs/load.c     | 299 ++++++++++++++++
drivers/platform/x86/intel/ifs/sysfs.c    | 394 ++++++++++++++++++++++
include/trace/events/ifs.h                |  38 +++
15 files changed, 1503 insertions(+), 3 deletions(-)
create mode 100644 Documentation/ABI/stable/sysfs-driver-ifs
create mode 100644 Documentation/x86/ifs.rst
create mode 100644 drivers/platform/x86/intel/ifs/Kconfig
create mode 100644 drivers/platform/x86/intel/ifs/Makefile
create mode 100644 drivers/platform/x86/intel/ifs/core.c
create mode 100644 drivers/platform/x86/intel/ifs/ifs.h
create mode 100644 drivers/platform/x86/intel/ifs/load.c
create mode 100644 drivers/platform/x86/intel/ifs/sysfs.c
create mode 100644 include/trace/events/ifs.h
[RFC 00/10] Introduce In Field Scan driver
Posted by Jithu Joseph 4 years, 3 months ago
Note to Maintainers:
Requesting x86 Maintainers to take a look at patch01 as it
touches arch/x86 portion of the kernel. Also would like to guide them
to patch07 which sets up hotplug notifiers and creates kthreads.

Patch 2/10 - Adds Documentation. Requesting Documentation maintainer to review it.

Requesting Greg KH to review the sysfs changes added by patch08.

Patch10 adds tracing support, requesting Steven Rostedt to review that.

Rest of the patches adds the IFS platform driver, requesting Platform driver maintainers
to review them.


In Field Scan (IFS) is a hardware feature to run circuit level tests on
a CPU core to detect problems that are not caught by parity or ECC checks.

Intel will provide a firmware file containing the scan tests.  Similar to
microcode there is a separate file for each family-model-stepping. The
tests in the file are divided into some number of "chunks" that can be
run individually.

The driver loads the tests into memory reserved BIOS local to each CPU
socket in a two step process using writes to MSRs to first load the
SHA hashes for the test. Then the tests themselves. Status MSRs provide
feedback on the success/failure of these steps.

Tests are run by synchronizing execution of all threads on a core and
then writing to the ACTIVATE_SCAN MSR on all threads. Instruction
execution continues when:

1) all tests have completed
2) execution was interrupted
3) a test detected a problem

In all cases reading the SCAN_STATUS MSR provides details on what
happened. Interrupted tests may be restarted.

The IFS driver provides interfaces from /sys to reload tests and to
control execution:

/sys/devices/system/cpu/ifs/reload
  Writing "1" to this file will reload the tests from
  /lib/firmware/intel/ifs/{ff-mm-ss}.scan

/sys/devices/system/cpu/ifs/run_test
  Writing "1" to this file will trigger a scan on each core
  sequentially by logical CPU number (when HT is enabled this only
  runs the tests once for each core)

/sys/devices/system/cpu/cpu#/ifs/run_test
  Writing "1" to one of these files will trigger a scan on just
  that core.

Results of the tests are also provided in /sys:

/sys/devices/system/cpu/ifs/status
  Global status. Will show the most serious status across
  all cores (fail > untested > pass)

/sys/devices/system/cpu/ifs/cpu_fail_list
/sys/devices/system/cpu/ifs/cpu_pass_list
/sys/devices/system/cpu/ifs/cpu_untested_list
  CPU lists showing which CPUs have which test status

/sys/devices/system/cpu/cpu#/ifs/status
  Status (pass/fail/untested) of each core

/sys/devices/system/cpu/cpu#/ifs/details
  Hex value of the SCAN_STATUS MSR for the most recent test on
  this core. Note that the error_code field may contain driver
  defined software code not defined in the Intel SDM.

Current driver limitations:

1) The ACTIVATE_SCAN MSR allows for running any consecutive subrange or
available tests. But the driver always tries to run all tests and only
uses the subrange feature to restart an interrupted test.

2) Hardware allows for some number of cores to be tested in parallel.
The driver does not make use of this, it only tests one core at a time.


Jithu Joseph (8):
  x86/microcode/intel: expose collect_cpu_info_early() for IFS
  platform/x86/intel/ifs: Add driver for In-Field Scan
  platform/x86/intel/ifs: Load IFS Image
  platform/x86/intel/ifs: Check IFS Image sanity
  platform/x86/intel/ifs: Authenticate and copy to secured memory
  platform/x86/intel/ifs: Create kthreads for online cpus for scan test
  platform/x86/intel/ifs: Add IFS sysfs interface
  platform/x86/intel/ifs: add ABI documentation for IFS

Tony Luck (2):
  Documentation: In-Field Scan
  trace: platform/x86/intel/ifs: Add trace point to track Intel IFS
    operations

 Documentation/ABI/stable/sysfs-driver-ifs |  85 +++++
 Documentation/x86/ifs.rst                 | 108 ++++++
 Documentation/x86/index.rst               |   1 +
 MAINTAINERS                               |   7 +
 arch/x86/include/asm/microcode_intel.h    |   6 +
 arch/x86/kernel/cpu/microcode/intel.c     |   8 +-
 drivers/platform/x86/intel/Kconfig        |   1 +
 drivers/platform/x86/intel/Makefile       |   1 +
 drivers/platform/x86/intel/ifs/Kconfig    |   9 +
 drivers/platform/x86/intel/ifs/Makefile   |   7 +
 drivers/platform/x86/intel/ifs/core.c     | 387 +++++++++++++++++++++
 drivers/platform/x86/intel/ifs/ifs.h      | 155 +++++++++
 drivers/platform/x86/intel/ifs/load.c     | 299 ++++++++++++++++
 drivers/platform/x86/intel/ifs/sysfs.c    | 394 ++++++++++++++++++++++
 include/trace/events/ifs.h                |  38 +++
 15 files changed, 1503 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/ABI/stable/sysfs-driver-ifs
 create mode 100644 Documentation/x86/ifs.rst
 create mode 100644 drivers/platform/x86/intel/ifs/Kconfig
 create mode 100644 drivers/platform/x86/intel/ifs/Makefile
 create mode 100644 drivers/platform/x86/intel/ifs/core.c
 create mode 100644 drivers/platform/x86/intel/ifs/ifs.h
 create mode 100644 drivers/platform/x86/intel/ifs/load.c
 create mode 100644 drivers/platform/x86/intel/ifs/sysfs.c
 create mode 100644 include/trace/events/ifs.h

-- 
2.17.1
Re: [RFC 00/10] Introduce In Field Scan driver
Posted by Greg KH 4 years, 3 months ago
On Tue, Mar 01, 2022 at 11:54:47AM -0800, Jithu Joseph wrote:
> Note to Maintainers:
> Requesting x86 Maintainers to take a look at patch01 as it
> touches arch/x86 portion of the kernel. Also would like to guide them
> to patch07 which sets up hotplug notifiers and creates kthreads.
> 
> Patch 2/10 - Adds Documentation. Requesting Documentation maintainer to review it.
> 
> Requesting Greg KH to review the sysfs changes added by patch08.

"RFC" means you are not comfortable submitting the changes yet, so you
don't need my review at this point in time.  Become confident in your
changes before asking for others to review the code please.

thanks,

greg k-h
Re: [RFC 00/10] Introduce In Field Scan driver
Posted by Steven Rostedt 4 years, 3 months ago
On Tue, 1 Mar 2022 21:10:20 +0100
Greg KH <gregkh@linuxfoundation.org> wrote:

> "RFC" means you are not comfortable submitting the changes yet, so you
> don't need my review at this point in time.  Become confident in your
> changes before asking for others to review the code please.

I guess you and I have a different understanding of RFC (Request for
Comments). As to me, comments are a form of review.

In other words, RFC to me means the review is "does this design look like
it will work", and we should be reviewing the design and overview of the
patches. Not the nitty gritty details (like missed error handling, unless
the design will prevent it). Although, you could add those comments in a
review.

When I post RFCs, it's not that I'm not comfortable submitting the change,
it's because I want to know if what I'm doing makes sense, and I might be
missing something that will make this effort in vain.

What ever happen to the "Submit early, submit often" mantra?

-- Steve
Re: [RFC 00/10] Introduce In Field Scan driver
Posted by Greg KH 4 years, 3 months ago
On Wed, Mar 02, 2022 at 10:33:13AM -0500, Steven Rostedt wrote:
> On Tue, 1 Mar 2022 21:10:20 +0100
> Greg KH <gregkh@linuxfoundation.org> wrote:
> 
> > "RFC" means you are not comfortable submitting the changes yet, so you
> > don't need my review at this point in time.  Become confident in your
> > changes before asking for others to review the code please.
> 
> I guess you and I have a different understanding of RFC (Request for
> Comments). As to me, comments are a form of review.
> 
> In other words, RFC to me means the review is "does this design look like
> it will work", and we should be reviewing the design and overview of the
> patches. Not the nitty gritty details (like missed error handling, unless
> the design will prevent it). Although, you could add those comments in a
> review.
> 
> When I post RFCs, it's not that I'm not comfortable submitting the change,
> it's because I want to know if what I'm doing makes sense, and I might be
> missing something that will make this effort in vain.
> 
> What ever happen to the "Submit early, submit often" mantra?

For patches from "experienced" submitters like this, with reviews from
other experienced reviewers already (look at the s-o-b chain here),
there's no excuse for it to be a RFC unless something is really odd as
the experienced reviewers should have already handled the "comments"
portion already.  Otherwise their review was kind of pointless, right?

I'm all for submitting early, but be confident!

Also, I have way too many non-RFC patches to review in my queue, so that
means any RFC patches fall to the bottom so I like to give people a
reason why I'm not reviewing them, like I did here.

thanks,

greg k-h
Re: [RFC 00/10] Introduce In Field Scan driver
Posted by Andy Lutomirski 4 years, 3 months ago

On Tue, Mar 1, 2022, at 11:54 AM, Jithu Joseph wrote:
> Note to Maintainers:
> Requesting x86 Maintainers to take a look at patch01 as it
> touches arch/x86 portion of the kernel. Also would like to guide them
> to patch07 which sets up hotplug notifiers and creates kthreads.
>
> Patch 2/10 - Adds Documentation. Requesting Documentation maintainer to 
> review it.
>
> Requesting Greg KH to review the sysfs changes added by patch08.
>
> Patch10 adds tracing support, requesting Steven Rostedt to review that.
>
> Rest of the patches adds the IFS platform driver, requesting Platform 
> driver maintainers
> to review them.
>
>
> In Field Scan (IFS) is a hardware feature to run circuit level tests on
> a CPU core to detect problems that are not caught by parity or ECC checks.
>
> Intel will provide a firmware file containing the scan tests.  Similar to
> microcode there is a separate file for each family-model-stepping. The
> tests in the file are divided into some number of "chunks" that can be
> run individually.
>
> The driver loads the tests into memory reserved BIOS local to each CPU
> socket in a two step process using writes to MSRs to first load the
> SHA hashes for the test. Then the tests themselves. Status MSRs provide
> feedback on the success/failure of these steps.
>
> Tests are run by synchronizing execution of all threads on a core and
> then writing to the ACTIVATE_SCAN MSR on all threads. Instruction
> execution continues when:
>
> 1) all tests have completed
> 2) execution was interrupted
> 3) a test detected a problem
>
> In all cases reading the SCAN_STATUS MSR provides details on what
> happened. Interrupted tests may be restarted.
>
> The IFS driver provides interfaces from /sys to reload tests and to
> control execution:
>
> /sys/devices/system/cpu/ifs/reload
>   Writing "1" to this file will reload the tests from
>   /lib/firmware/intel/ifs/{ff-mm-ss}.scan

IMO this interface is wrong.  /lib/firmware is for firmware (or ucode, etc) files that should be provided by a distribution and loaded, as needed, by a driver so the hardware can function.  This is not at all what IFS does. For IFS, an administrator wants to run a specific test, and the test blob is part of the instruction to run the test.  The distribution should not be involved, and this should work even on systems where /lib/firmware is immutable.

So either the blob should be written to a file in sysfs or it should be supplied by write or ioctl to a device node.

>
> /sys/devices/system/cpu/ifs/run_test
>   Writing "1" to this file will trigger a scan on each core
>   sequentially by logical CPU number (when HT is enabled this only
>   runs the tests once for each core)
>
> /sys/devices/system/cpu/cpu#/ifs/run_test
>   Writing "1" to one of these files will trigger a scan on just
>   that core.
>
> Results of the tests are also provided in /sys:
>
> /sys/devices/system/cpu/ifs/status
>   Global status. Will show the most serious status across
>   all cores (fail > untested > pass)
>
> /sys/devices/system/cpu/ifs/cpu_fail_list
> /sys/devices/system/cpu/ifs/cpu_pass_list
> /sys/devices/system/cpu/ifs/cpu_untested_list
>   CPU lists showing which CPUs have which test status
>
> /sys/devices/system/cpu/cpu#/ifs/status
>   Status (pass/fail/untested) of each core
>
> /sys/devices/system/cpu/cpu#/ifs/details
>   Hex value of the SCAN_STATUS MSR for the most recent test on
>   this core. Note that the error_code field may contain driver
>   defined software code not defined in the Intel SDM.
>
> Current driver limitations:
>
> 1) The ACTIVATE_SCAN MSR allows for running any consecutive subrange or
> available tests. But the driver always tries to run all tests and only
> uses the subrange feature to restart an interrupted test.
>
> 2) Hardware allows for some number of cores to be tested in parallel.
> The driver does not make use of this, it only tests one core at a time.
>
>
> Jithu Joseph (8):
>   x86/microcode/intel: expose collect_cpu_info_early() for IFS
>   platform/x86/intel/ifs: Add driver for In-Field Scan
>   platform/x86/intel/ifs: Load IFS Image
>   platform/x86/intel/ifs: Check IFS Image sanity
>   platform/x86/intel/ifs: Authenticate and copy to secured memory
>   platform/x86/intel/ifs: Create kthreads for online cpus for scan test
>   platform/x86/intel/ifs: Add IFS sysfs interface
>   platform/x86/intel/ifs: add ABI documentation for IFS
>
> Tony Luck (2):
>   Documentation: In-Field Scan
>   trace: platform/x86/intel/ifs: Add trace point to track Intel IFS
>     operations
>
>  Documentation/ABI/stable/sysfs-driver-ifs |  85 +++++
>  Documentation/x86/ifs.rst                 | 108 ++++++
>  Documentation/x86/index.rst               |   1 +
>  MAINTAINERS                               |   7 +
>  arch/x86/include/asm/microcode_intel.h    |   6 +
>  arch/x86/kernel/cpu/microcode/intel.c     |   8 +-
>  drivers/platform/x86/intel/Kconfig        |   1 +
>  drivers/platform/x86/intel/Makefile       |   1 +
>  drivers/platform/x86/intel/ifs/Kconfig    |   9 +
>  drivers/platform/x86/intel/ifs/Makefile   |   7 +
>  drivers/platform/x86/intel/ifs/core.c     | 387 +++++++++++++++++++++
>  drivers/platform/x86/intel/ifs/ifs.h      | 155 +++++++++
>  drivers/platform/x86/intel/ifs/load.c     | 299 ++++++++++++++++
>  drivers/platform/x86/intel/ifs/sysfs.c    | 394 ++++++++++++++++++++++
>  include/trace/events/ifs.h                |  38 +++
>  15 files changed, 1503 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/ABI/stable/sysfs-driver-ifs
>  create mode 100644 Documentation/x86/ifs.rst
>  create mode 100644 drivers/platform/x86/intel/ifs/Kconfig
>  create mode 100644 drivers/platform/x86/intel/ifs/Makefile
>  create mode 100644 drivers/platform/x86/intel/ifs/core.c
>  create mode 100644 drivers/platform/x86/intel/ifs/ifs.h
>  create mode 100644 drivers/platform/x86/intel/ifs/load.c
>  create mode 100644 drivers/platform/x86/intel/ifs/sysfs.c
>  create mode 100644 include/trace/events/ifs.h
>
> -- 
> 2.17.1
Re: [RFC 00/10] Introduce In Field Scan driver
Posted by Williams, Dan J 4 years, 3 months ago
On Tue, 2022-03-01 at 11:54 -0800, Jithu Joseph wrote:
> Note to Maintainers:
> Requesting x86 Maintainers to take a look at patch01 as it
> touches arch/x86 portion of the kernel. Also would like to guide them
> to patch07 which sets up hotplug notifiers and creates kthreads.
> 
> Patch 2/10 - Adds Documentation. Requesting Documentation maintainer to review it.
> 
> Requesting Greg KH to review the sysfs changes added by patch08.
> 
> Patch10 adds tracing support, requesting Steven Rostedt to review that.
> 
> Rest of the patches adds the IFS platform driver, requesting Platform driver maintainers
> to review them.
> 
> 
> In Field Scan (IFS) is a hardware feature to run circuit level tests on
> a CPU core to detect problems that are not caught by parity or ECC checks.
> 
> Intel will provide a firmware file containing the scan tests.  Similar to
> microcode there is a separate file for each family-model-stepping. The
> tests in the file are divided into some number of "chunks" that can be
> run individually.
> 
> The driver loads the tests into memory reserved BIOS local to each CPU
> socket in a two step process using writes to MSRs to first load the
> SHA hashes for the test. Then the tests themselves. Status MSRs provide
> feedback on the success/failure of these steps.
> 
> Tests are run by synchronizing execution of all threads on a core and
> then writing to the ACTIVATE_SCAN MSR on all threads. Instruction
> execution continues when:
> 
> 1) all tests have completed
> 2) execution was interrupted
> 3) a test detected a problem
> 
> In all cases reading the SCAN_STATUS MSR provides details on what
> happened. Interrupted tests may be restarted.

Can you say a bit about what motivates upstream to want to carry this
support? For example, if the test content comes from out of tree (i.e.
there is no source for tests other then a location under a URL on
github.com/intel), and nothing in the kernel consumes the results, then
what breaks if that blob from the test URL also has a a few source
files and a Kbuild file to produce a .ko alongside the .scan file? This
is more a comment on the cover letter basics than an actual proposal to
change the distribution model. Just, in general, the cover letter
clarifies why upstream should care about the patches.