[PATCH v3 07/22] checkpatch: Warn on page table access without accessors

Samuel Holland posted 22 patches 3 weeks, 5 days ago
[PATCH v3 07/22] checkpatch: Warn on page table access without accessors
Posted by Samuel Holland 3 weeks, 5 days ago
Architectures may have special rules for accessing the hardware page
tables (for example, atomicity/ordering requirements), so the generic MM
code provides the pXXp_get() and set_pXX() hooks for architectures to
implement. These accessor functions are often omitted where a raw
pointer dereference is believed to be safe (i.e. race-free). However,
RISC-V needs to use these hooks to rewrite the page table values at
read/write time on some platforms. A raw pointer dereference will no
longer produce the correct value on those platforms, so the generic code
must always use the accessor functions.

sparse can only report improper pointer dereferences if every page table
pointer (variable, function argument, struct member) is individually
marked with an attribute (similar to __user). So while this is possible,
it would require invasive changes across all architectures. Instead, as
an immediate first solution, add a checkpatch warning that will
generally catch the prohibited pointer dereferences. Architecture code
is ignored, as the raw dereferences may be safe on some architectures.


Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

Changes in v3:
 - New patch for v3

 scripts/checkpatch.pl | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 92669904eecc..55984d7361ea 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -7721,6 +7721,13 @@ sub process {
 				ERROR("MISSING_SENTINEL", "missing sentinel in ID array\n" . "$here\n$stat\n");
 			}
 		}
+
+# check for raw dereferences of hardware page table pointers
+		if ($realfile !~ m@^arch/@ &&
+		    $line =~ /(?<!pte_t |p[mu4g]d_t |izeof\()\*\(?(vmf(\.|->))?(pte|p[mu4g]d)p?\b/) {
+			WARN("PAGE_TABLE_ACCESSORS",
+			     "Use $3p_get()/set_$3() instead of dereferencing page table pointers\n" . $herecurr);
+		}
 	}
 
 	# If we have no input at all, then there is nothing to report on
-- 
2.47.2
Re: [PATCH v3 07/22] checkpatch: Warn on page table access without accessors
Posted by Joe Perches 3 weeks, 5 days ago
On Wed, 2025-11-12 at 17:45 -0800, Samuel Holland wrote:
> Architectures may have special rules for accessing the hardware page
> tables (for example, atomicity/ordering requirements), so the generic MM
> code provides the pXXp_get() and set_pXX() hooks for architectures to
> implement. These accessor functions are often omitted where a raw
> pointer dereference is believed to be safe (i.e. race-free). However,
> RISC-V needs to use these hooks to rewrite the page table values at
> read/write time on some platforms. A raw pointer dereference will no
> longer produce the correct value on those platforms, so the generic code
> must always use the accessor functions.
[]
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
[]
> @@ -7721,6 +7721,13 @@ sub process {
>  				ERROR("MISSING_SENTINEL", "missing sentinel in ID array\n" . "$here\n$stat\n");
>  			}
>  		}
> +
> +# check for raw dereferences of hardware page table pointers
> +		if ($realfile !~ m@^arch/@ &&
> +		    $line =~ /(?<!pte_t |p[mu4g]d_t |izeof\()\*\(?(vmf(\.|->))?(pte|p[mu4g]d)p?\b/) {
> +			WARN("PAGE_TABLE_ACCESSORS",
> +			     "Use $3p_get()/set_$3() instead of dereferencing page table pointers\n" . $herecurr);
> +		}
>  	}

Seems like a lot of matches

$ git grep -P '(?<!pte_t |p[mu4g]d_t |izeof\()\*\(?(vmf(\.|->))?(pte|p[mu4g]d)p?\b' | \
  grep -v '^arch/' | wc -l
766

Is this really appropriate?

trivia:

izeof is really odd looking.  I'd prefer sizeof.
Re: [PATCH v3 07/22] checkpatch: Warn on page table access without accessors
Posted by Samuel Holland 3 weeks, 5 days ago
On 2025-11-12 8:21 PM, Joe Perches wrote:
> On Wed, 2025-11-12 at 17:45 -0800, Samuel Holland wrote:
>> Architectures may have special rules for accessing the hardware page
>> tables (for example, atomicity/ordering requirements), so the generic MM
>> code provides the pXXp_get() and set_pXX() hooks for architectures to
>> implement. These accessor functions are often omitted where a raw
>> pointer dereference is believed to be safe (i.e. race-free). However,
>> RISC-V needs to use these hooks to rewrite the page table values at
>> read/write time on some platforms. A raw pointer dereference will no
>> longer produce the correct value on those platforms, so the generic code
>> must always use the accessor functions.
> []
>> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> []
>> @@ -7721,6 +7721,13 @@ sub process {
>>  				ERROR("MISSING_SENTINEL", "missing sentinel in ID array\n" . "$here\n$stat\n");
>>  			}
>>  		}
>> +
>> +# check for raw dereferences of hardware page table pointers
>> +		if ($realfile !~ m@^arch/@ &&
>> +		    $line =~ /(?<!pte_t |p[mu4g]d_t |izeof\()\*\(?(vmf(\.|->))?(pte|p[mu4g]d)p?\b/) {
>> +			WARN("PAGE_TABLE_ACCESSORS",
>> +			     "Use $3p_get()/set_$3() instead of dereferencing page table pointers\n" . $herecurr);
>> +		}
>>  	}
> 
> Seems like a lot of matches
> 
> $ git grep -P '(?<!pte_t |p[mu4g]d_t |izeof\()\*\(?(vmf(\.|->))?(pte|p[mu4g]d)p?\b' | \
>   grep -v '^arch/' | wc -l
> 766
> 
> Is this really appropriate?

Other patches in this series remove 277 of these matches. But it looks like a
couple of driver systems (iommu, dm) use variables that match this pattern as
well. Limiting the check to include/ and mm/ avoids the false positives.

> trivia:
> 
> izeof is really odd looking.  I'd prefer sizeof.

I agree, but my perl complains "Variable length lookbehind not implemented in
regex m/(?<!pte_t |p[mu4g]d_t |sizeof\()\*\(?(vmf(\.|->))?(pte|p[mu4g]d)p?\b/ at
scripts/checkpatch.pl line 7727." I could split this into two patterns, which
would also solve the false positives with multiple variable declarations.

Regards,
Samuel
Re: [PATCH v3 07/22] checkpatch: Warn on page table access without accessors
Posted by David Hildenbrand (Red Hat) 3 weeks, 4 days ago
On 13.11.25 03:36, Samuel Holland wrote:
> On 2025-11-12 8:21 PM, Joe Perches wrote:
>> On Wed, 2025-11-12 at 17:45 -0800, Samuel Holland wrote:
>>> Architectures may have special rules for accessing the hardware page
>>> tables (for example, atomicity/ordering requirements), so the generic MM
>>> code provides the pXXp_get() and set_pXX() hooks for architectures to
>>> implement. These accessor functions are often omitted where a raw
>>> pointer dereference is believed to be safe (i.e. race-free). However,
>>> RISC-V needs to use these hooks to rewrite the page table values at
>>> read/write time on some platforms. A raw pointer dereference will no
>>> longer produce the correct value on those platforms, so the generic code
>>> must always use the accessor functions.
>> []
>>> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
>> []
>>> @@ -7721,6 +7721,13 @@ sub process {
>>>   				ERROR("MISSING_SENTINEL", "missing sentinel in ID array\n" . "$here\n$stat\n");
>>>   			}
>>>   		}
>>> +
>>> +# check for raw dereferences of hardware page table pointers
>>> +		if ($realfile !~ m@^arch/@ &&
>>> +		    $line =~ /(?<!pte_t |p[mu4g]d_t |izeof\()\*\(?(vmf(\.|->))?(pte|p[mu4g]d)p?\b/) {
>>> +			WARN("PAGE_TABLE_ACCESSORS",
>>> +			     "Use $3p_get()/set_$3() instead of dereferencing page table pointers\n" . $herecurr);
>>> +		}
>>>   	}
>>
>> Seems like a lot of matches
>>
>> $ git grep -P '(?<!pte_t |p[mu4g]d_t |izeof\()\*\(?(vmf(\.|->))?(pte|p[mu4g]d)p?\b' | \
>>    grep -v '^arch/' | wc -l
>> 766

That is indeed concerning.

I recall that we discussed an alternative approach with Ryan in the 
past: I don't remember all the details, but essentially it was about 
using separate types, such that dereferencing would not get you the type 
the other functions would be expecting. Such that the compiler will bark 
when you try to dereference.


-- 
Cheers

David