[PATCH v3] checkpatch: Check for missing sentinels in ID arrays

Brian Norris posted 1 patch 3 months, 1 week ago
There is a newer version of this series
scripts/checkpatch.pl | 82 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 82 insertions(+)
[PATCH v3] checkpatch: Check for missing sentinels in ID arrays
Posted by Brian Norris 3 months, 1 week ago
All of the ID tables based on <linux/mod_devicetable.h> (of_device_id,
pci_device_id, ...) require their arrays to end in an empty sentinel
value. That's usually spelled with an empty initializer entry (e.g.,
"{}"), but also sometimes with explicit 0 entries, field initializers
(e.g., '.id = ""'), or even a macro entry (like PCMCIA_DEVICE_NULL).

Without a sentinel, device-matching code may read out of bounds.

I've found a number of such bugs in driver reviews, and we even
occasionally commit one to the tree. See commit 5751eee5c620 ("i2c:
nomadik: Add missing sentinel to match table") for example.

Teach checkpatch to find these ID tables, and complain if it looks like
there wasn't a sentinel value.

Test output:

  $ git format-patch -1 a0d15cc47f29be6d --stdout | scripts/checkpatch.pl -
  ERROR: missing sentinel in ID array
  #57: FILE: drivers/i2c/busses/i2c-nomadik.c:1073:
  +static const struct of_device_id nmk_i2c_eyeq_match_table[] = {
   	{
   		.compatible = "XXXXXXXXXXXXXXXXXX",
   		.data = (void *)(NMK_I2C_EYEQ_FLAG_32B_BUS | NMK_I2C_EYEQ_FLAG_IS_EYEQ5),
   	},
   };

  total: 1 errors, 0 warnings, 66 lines checked

  NOTE: For some of the reported defects, checkpatch may be able to
        mechanically convert to the typical style using --fix or --fix-inplace.

  "[PATCH] i2c: nomadik: switch from of_device_is_compatible() to" has style problems, please review.

  NOTE: If any of the errors are false positives, please report
        them to the maintainer, see CHECKPATCH in MAINTAINERS.

When run across the entire tree (scripts/checkpatch.pl -q --types
MISSING_SENTINEL -f ...), false positives exist:

* where macros are used that hide the table from analysis
  (e.g., drivers/gpu/drm/radeon/radeon_drv.c / radeon_PCI_IDS).
  There are fewer than 5 of these.

* where such tables are processed correctly via ARRAY_SIZE() (fewer than
  5 instances). This is by far not the typical usage of *_device_id
  arrays.

* some odd parsing artifacts, where ctx_statement_block() seems to quit
  in the middle of a block due to #if/#else/#endif.

Signed-off-by: Brian Norris <briannorris@chromium.org>
---
 v1: https://lore.kernel.org/all/20241218232716.3624531-1-briannorris@chromium.org/

Changes in v3:
 * actually CC LKML this time (sorry, I accidentally sent v2 directly to
   maintainers)
 * no other change

Changes in v2:
 * add a few ID types I missed (i2c, i3c, ieee1394); this time I parsed:
     grep 'struct .*_device_id {' include/linux/mod_devicetable.h
 * account for some alternative sentinels (e.g.,
   ISAPNP_DEVICE_SINGLE_END)

 scripts/checkpatch.pl | 82 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 664f7b7a622c..cfa7db17fe0b 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -685,6 +685,64 @@ our $tracing_logging_tags = qr{(?xi:
 	[\.\!:\s]*
 )};
 
+# Device ID types from include/linux/mod_devicetable.h.
+our $dev_id_types = qr{(?x:
+	acpi |
+	ap |
+	apr |
+	auxiliary |
+	bcma |
+	ccw |
+	cdx |
+	coreboot |
+	css |
+	dfl |
+	eisa |
+	fsl_mc |
+	hda |
+	hid |
+	hv_vmbus |
+	i2c |
+	i3c |
+	ieee1394 |
+	input |
+	ipack |
+	isapnp |
+	ishtp |
+	mcb |
+	mdio |
+	mei_cl |
+	mhi |
+	mips_cdmm |
+	of |
+	parisc |
+	pci |
+	pci_epf |
+	pcmcia |
+	platform |
+	pnp |
+	pnp_card |
+	rio |
+	rpmsg |
+	sdio |
+	sdw |
+	serio |
+	slim |
+	spi |
+	spmi |
+	ssam |
+	ssb |
+	tee_client |
+	typec |
+	ulpi |
+	usb |
+	vchiq |
+	vio |
+	virtio |
+	wmi |
+	zorro
+)_device_id};
+
 sub edit_distance_min {
 	my (@arr) = @_;
 	my $len = scalar @arr;
@@ -7678,6 +7736,30 @@ sub process {
 			WARN("DUPLICATED_SYSCTL_CONST",
 				"duplicated sysctl range checking value '$1', consider using the shared one in include/linux/sysctl.h\n" . $herecurr);
 		}
+
+# Check that *_device_id tables have sentinel entries.
+		if (defined $stat && $line =~ /struct $dev_id_types .*\[\] = \{/) {
+			my $stripped = $stat;
+
+			# Strip diff line prefixes.
+			$stripped =~ s/(^|\n)./$1/g;
+			# Line continuations.
+			$stripped =~ s/\\\n/\n/g;
+			# Strip whitespace, empty strings, zeroes, and commas.
+			$stripped =~ s/""//g;
+			$stripped =~ s/0x0//g;
+			$stripped =~ s/[\s$;,0]//g;
+			# Strip field assignments.
+			$stripped =~ s/\.$Ident=//g;
+
+			if (!(substr($stripped, -4) eq "{}};" ||
+			      substr($stripped, -6) eq "{{}}};" ||
+			      $stripped =~ /PCMCIA_DEVICE_NULL};$/ ||
+			      $stripped =~ /ISAPNP_DEVICE_SINGLE_END}};$/ ||
+			      $stripped =~ /ISAPNP_CARD_END}};$/)) {
+				ERROR("MISSING_SENTINEL", "missing sentinel in ID array\n" . "$here\n$stat\n");
+			}
+		}
 	}
 
 	# If we have no input at all, then there is nothing to report on
-- 
2.50.0.727.gbf7dc18ff4-goog
Re: [PATCH v3] checkpatch: Check for missing sentinels in ID arrays
Posted by Joe Perches 3 months, 1 week ago
On Tue, 2025-07-01 at 11:34 -0700, Brian Norris wrote:
> All of the ID tables based on <linux/mod_devicetable.h> (of_device_id,
> pci_device_id, ...) require their arrays to end in an empty sentinel
> value. That's usually spelled with an empty initializer entry (e.g.,
> "{}"), but also sometimes with explicit 0 entries, field initializers
> (e.g., '.id = ""'), or even a macro entry (like PCMCIA_DEVICE_NULL).
> 
> Without a sentinel, device-matching code may read out of bounds.
> 
> I've found a number of such bugs in driver reviews, and we even
> occasionally commit one to the tree. See commit 5751eee5c620 ("i2c:
> nomadik: Add missing sentinel to match table") for example.
> 
> Teach checkpatch to find these ID tables, and complain if it looks like
> there wasn't a sentinel value.
[]
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
[]
> @@ -685,6 +685,64 @@ our $tracing_logging_tags = qr{(?xi:
>  	[\.\!:\s]*
>  )};
>  
> +# Device ID types from include/linux/mod_devicetable.h.
> +our $dev_id_types = qr{(?x:

[ a long list ...]

> +)_device_id};

This list seems unmaintainable.

Does something like '\b[a-z]\w*_device_id\b'
match too many other non-sentinal uses?

$ git grep -P -o -h '\b\w+_device_id\b' -- '*.[ch]' | sort | uniq | wc -l
288

> @@ -7678,6 +7736,30 @@ sub process {
[]
> +# Check that *_device_id tables have sentinel entries.
> +		if (defined $stat && $line =~ /struct $dev_id_types .*\[\] = \{/) {

Spacing isn't guaranteed so perhaps

$line =~ /struct\s+$dev_id_types\s+\w+\s*\[\s*\]\s*=\s*\{/
Re: [PATCH v3] checkpatch: Check for missing sentinels in ID arrays
Posted by Brian Norris 3 months, 1 week ago
Hi Joe,

On Wed, Jul 02, 2025 at 02:19:45AM -0700, Joe Perches wrote:
> On Tue, 2025-07-01 at 11:34 -0700, Brian Norris wrote:
> > All of the ID tables based on <linux/mod_devicetable.h> (of_device_id,
> > pci_device_id, ...) require their arrays to end in an empty sentinel
> > value. That's usually spelled with an empty initializer entry (e.g.,
> > "{}"), but also sometimes with explicit 0 entries, field initializers
> > (e.g., '.id = ""'), or even a macro entry (like PCMCIA_DEVICE_NULL).
> > 
> > Without a sentinel, device-matching code may read out of bounds.
> > 
> > I've found a number of such bugs in driver reviews, and we even
> > occasionally commit one to the tree. See commit 5751eee5c620 ("i2c:
> > nomadik: Add missing sentinel to match table") for example.
> > 
> > Teach checkpatch to find these ID tables, and complain if it looks like
> > there wasn't a sentinel value.
> []
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> []
> > @@ -685,6 +685,64 @@ our $tracing_logging_tags = qr{(?xi:
> >  	[\.\!:\s]*
> >  )};
> >  
> > +# Device ID types from include/linux/mod_devicetable.h.
> > +our $dev_id_types = qr{(?x:
> 
> [ a long list ...]
> 
> > +)_device_id};
> 
> This list seems unmaintainable.

That seems debatable to me, as we've only had ~20 additions in the last
decade:

$ git log --oneline --since='10 years' -S '_device_id' ./include/linux/mod_devicetable.h | wc -l
20

It's pretty easy to script too. But anyway, I'm not the maintainer, so
I can try your suggestions.

> Does something like '\b[a-z]\w*_device_id\b'
> match too many other non-sentinal uses?
> 
> $ git grep -P -o -h '\b\w+_device_id\b' -- '*.[ch]' | sort | uniq | wc -l
> 288

Just inspecting the tree for struct types that match *_device_id, I see
that there are 21 of them that are not in mod_devicetable.h, and:

* the large majority of them are of the same sentinel style, and
  probably should be in mod_devicetable.h anyway
* of the relatively few that are not quite the same style, there's:
  - struct gio_device_id, which oddly uses '0xff' as a sentinel
  - struct ddb_device_id, which is confined to 1 file, and uses
    ARRAY_SIZE(). This triggers a false positive.
  - a few types like struct pcan_ufd_device_id that might technically
    match, but don't have the same sorts of tables and so are benign
* running checkpatch over the source tree only shows ~1 additional
  false positive; the aforementioned struct ddb_device_id

So on the whole, it's probably not too much of a win to enumerate a list
of types, and I'll just go with your regex instead.

> > @@ -7678,6 +7736,30 @@ sub process {
> []
> > +# Check that *_device_id tables have sentinel entries.
> > +		if (defined $stat && $line =~ /struct $dev_id_types .*\[\] = \{/) {
> 
> Spacing isn't guaranteed so perhaps
> 
> $line =~ /struct\s+$dev_id_types\s+\w+\s*\[\s*\]\s*=\s*\{/

Sure.

Brian