[PATCH v4 0/4] Don't create Python bytecode when building the kernel

Mauro Carvalho Chehab posted 4 patches 7 months, 3 weeks ago
[PATCH v4 0/4] Don't create Python bytecode when building the kernel
Posted by Mauro Carvalho Chehab 7 months, 3 weeks ago
As reported by Andy, the kernel build system runs the kernel-doc script for
DRM when W=1. By default, Python compiles each imported module to bytecode
and caches it under scripts/lib/*/__pycache__.  As one may be using O=,
possibly with the sources on a read-only mount point, disable bytecode
creation at build time.

This is done by adding PYTHONDONTWRITEBYTECODE=1 in every place
where the script is called within Kbuild or via another script.
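
A minimal sketch of the effect, assuming a standard CPython interpreter: a
child process started with PYTHONDONTWRITEBYTECODE=1 reports
sys.dont_write_bytecode as True, so imports no longer write .pyc files. The
helper name child_dont_write_bytecode is made up for illustration and is not
part of the series.

```python
import os
import subprocess
import sys


def child_dont_write_bytecode(extra_env):
    """Run a child interpreter and report its sys.dont_write_bytecode flag."""
    env = dict(os.environ)
    # Start from a clean slate so an inherited setting does not skew the result.
    env.pop("PYTHONDONTWRITEBYTECODE", None)
    env.update(extra_env)
    out = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.dont_write_bytecode)"],
        env=env,
        check=True,
        capture_output=True,
        text=True,
    )
    return out.stdout.strip() == "True"


if __name__ == "__main__":
    print("default:", child_dont_write_bytecode({}))
    print(
        "PYTHONDONTWRITEBYTECODE=1:",
        child_dont_write_bytecode({"PYTHONDONTWRITEBYTECODE": "1"}),
    )
```

The variable only affects interpreters that inherit it, which is why it has
to be set at every call site.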
 
This only solves half of the issue though, as one may run the script by
hand without asking Python not to store any bytecode. This should be OK,
but afterwards git status will list __pycache__ as untracked. To prevent
that, add *.pyc to .gitignore.

This series contains 4 patches:

- patch 1 adjusts a variable that passes extra data to scripts/kernel-doc.py;
- patch 2 moves the scripts/kernel-doc location to the main Makefile
  and exports it, as scripts/Makefile.build will need it;
- patch 3 disables __pycache__ generation and ensures that the entire Kbuild
  will use the KERNELDOC var for the location of kernel-doc;
- patch 4 adds *.pyc to the list of object files to be ignored.

---

v4:
- placed *.pyc in alphabetical order in the final patch

v3:
- move KERNELDOC to the main Makefile;
- get rid of the badly-named KERNELDOC_CONF var.

v2:
- added a .gitignore file;
- add PYTHONDONTWRITEBYTECODE=1 to the places where kernel-doc
  is called.

Mauro Carvalho Chehab (4):
  docs: Makefile: get rid of KERNELDOC_CONF env variable
  Makefile: move KERNELDOC macro to the main Makefile
  scripts/kernel-doc.py: don't create *.pyc files
  .gitignore: ignore Python compiled bytecode

 .gitignore                    | 1 +
 Documentation/Makefile        | 5 ++---
 Makefile                      | 5 +++++
 drivers/gpu/drm/Makefile      | 2 +-
 drivers/gpu/drm/i915/Makefile | 2 +-
 include/drm/Makefile          | 2 +-
 scripts/Makefile.build        | 2 +-
 scripts/find-unused-docs.sh   | 2 +-
 8 files changed, 13 insertions(+), 8 deletions(-)

-- 
2.49.0
Re: [PATCH v4 0/4] Don't create Python bytecode when building the kernel
Posted by Akira Yokosawa 7 months, 3 weeks ago
Hi Andy,

Responding to Mauro's cover-letter of v4 at:

    https://lore.kernel.org/cover.1745453655.git.mchehab+huawei@kernel.org/

, which was not CC'd to you.

On Thu, 24 Apr 2025 08:16:20 +0800, Mauro Carvalho Chehab wrote:
> As reported by Andy, the Kernel build system runs kernel-doc script for DRM,
> when W=1. Due to Python's normal behavior, its JIT compiler will create
> a bytecode and store it under scripts/lib/*/__pycache__.  As one may be using
> O= and even having the sources on a read-only mount point, disable its
> creation during build time.
> 
> This is done by adding PYTHONDONTWRITEBYTECODE=1 on every place
> where the script is called within Kbuild and when called via another script.
>  
> This only solves half of the issue though, as one may be manually running
> the script by hand, without asking Python to not store any bytecode.
> This should be OK, but afterwards, git status will list the __pycache__ as
> not committed. To prevent that, add *.pyc to .gitignore.
> 
> This series contain 4 patches:
> 
> - patch 1 adjusts a variable that pass extra data to scripts/kerneldoc.py;
> - patch 2moves scripts/kernel-doc location to the main makefile
>   and exports it, as scripts/Makefile.build will need it;
> - patch 3 disables __pycache__ generation and ensure that the entire Kbuild
>   will use KERNELDOC var for the location of kernel-doc;
> - patch 4 adds *.pyc at the list of object files to be ignored.

I see Jon has merged them all, but responding here anyway.

In https://lore.kernel.org/Z_zYXAJcTD-c3xTe@black.fi.intel.com/, you said:

> This started well, until it becomes a scripts/lib/kdoc.
> So, it makes the `make O=...` builds dirty *). Please make sure this doesn't leave
> "disgusting turd" (as said by Linus) in the clean tree.
>
> *) it creates that __pycache__ disaster. And no, .gitignore IS NOT a solution.

Andy, I don't agree with your words "__pycache__ disaster" and 
".gitignore IS NOT a solution".

Running "find . -name ".gitignore" -exec grep -nH --null -F -e ".pyc" \{\} +"
under today's Linus master returns this:

-------------------------------------------------------------
./scripts/gdb/linux/.gitignore:2:*.pyc
./drivers/comedi/drivers/ni_routing/tools/.gitignore:3:*.pyc
./tools/perf/.gitignore:32:*.pyc
./tools/testing/selftests/tc-testing/.gitignore:3:*.pyc
./Documentation/.gitignore:3:*.pyc
-------------------------------------------------------------

, and they have been working perfectly.

Having seen your response at https://lore.kernel.org/aAoERIArkvj497ns@smile.fi.intel.com/ :

> I tried before, but I admit, that I have missed something. It was a mess
> in that case. Now I probably can't repeat as I don't remember what was
> the environment and settings I had that time. I'm really glad to see it
> is working this way!

, I'm guessing you had a traumatic experience caused by python's bytecode
caching in the past.  Do you still believe ".gitignore IS NOT a solution"?

From my viewpoint, applying only 4/4 of this series is the right thing to do.

Bothering with might-become-incompatible-in-the-future python environment
variables in kernel Makefiles looks like over-engineering to me.
Also, as Mauro says in 3/4, it is incomplete in that it does not cover
the cases where those scripts are invoked outside of kernel build.
And it will interfere with existing developers who want the benefit of
bytecode caching.

I'm not ruling out the possibility of an incoherent bytecode cache; for
example, when a kernel source tree is shared among several developers and
only one of them (the owner) has write permission to it.  In that case, said
owner might update the tree without running the relevant python scripts.

I don't know if python can notice outdated cache and disregard it.
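
For reference, and as far as I understand CPython's default behavior (Python
3.7 or newer, timestamp-based .pyc files as described in PEP 552): the cache
header records the source file's mtime and size, and the import system
recompiles when they no longer match the source. Hash-based .pyc files behave
differently. A rough sketch that reads the stamp back; the helper names
pyc_source_stamp and demo are made up for illustration:

```python
import os
import py_compile
import struct
import tempfile


def pyc_source_stamp(pyc_path):
    """Return the (mtime, size) recorded in a timestamp-based .pyc header."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    # Layout: magic (4 bytes), flags (4), source mtime (4), source size (4).
    flags, mtime, size = struct.unpack("<III", header[4:16])
    if flags != 0:
        raise ValueError("not a timestamp-based pyc")
    return mtime, size


def source_stamp(src_path):
    """Compute the stamp the same way the header stores it (32-bit fields)."""
    st = os.stat(src_path)
    return int(st.st_mtime) & 0xFFFFFFFF, st.st_size & 0xFFFFFFFF


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")
        pyc = os.path.join(tmp, "mod.pyc")
        with open(src, "w") as f:
            f.write("x = 1\n")
        # Force timestamp mode; SOURCE_DATE_EPOCH would otherwise switch modes.
        py_compile.compile(
            src,
            cfile=pyc,
            doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        )
        fresh = pyc_source_stamp(pyc) == source_stamp(src)
        # Appending to the source changes its size, so the stamp goes stale.
        with open(src, "a") as f:
            f.write("y = 2\n")
        stale = pyc_source_stamp(pyc) != source_stamp(src)
        return fresh, stale


if __name__ == "__main__":
    print(demo())
```

On a read-only tree a stale cache is, to my knowledge, simply recompiled in
memory on each run, since the failed write is ignored.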

In such a situation, setting PYTHONPYCACHEPREFIX as an environment
variable should help, for sure, but only in such special cases.

Andy, what would you say if I asked for reverts of 1/4, 2/4, and 3/4?

Regards,
Akira
Re: [PATCH v4 0/4] Don't create Python bytecode when building the kernel
Posted by Mauro Carvalho Chehab 7 months, 3 weeks ago
Hi Akira,

Em Sat, 26 Apr 2025 11:39:05 +0900
Akira Yokosawa <akiyks@gmail.com> escreveu:

> Bothering with might-become-incompatilbe-in-the-future python environment
> variables in kernel Makefiles looks over-engineering to me.
> Also, as Mauro says in 3/4, it is incomplete in that it does not cover
> the cases where those scripts are invoked outside of kernel build.
> And it will interfere with existing developers who want the benefit of
> bytecode caching.
> 
> I'm not precluding the possibility of incoherent bytecode cache; for example
> by using a shared kernel source tree among several developers, and only
> one of them (owner) has a write permission of it.  In that case, said
> owner might update the tree without running relevant python scripts.
> 
> I don't know if python can notice outdated cache and disregard it.
> 
> In such a situation, setting PYTHONPYCACHEPREFIX as an environment
> variable should help, for sure, but only in such special cases.
> 
> Andy, what do you say if I ask reverts of 1/4, 2/4/, and 3/4?

Patches 1 and 2 are, IMO, needed anyway, as they fix a problem:
the KERNELDOC environment variable is not used consistently.

Now, patch 3 is the one that may require more thinking.

I agree with Andy that, when O=<dir> is used, nothing shall be
written to source dir.

There are a couple of reasons for that:

1. source dir may be read only;
2. one may want to do cross compilation and use multiple output
   directories, one for each version;
3. the source dir could be mapped via NFS to multiple machines
   with different architectures.

For (3), multiple machines may have different Python versions, so sharing
the Python bytecode from the source dir doesn't sound like a good idea.
Also, I'm not sure if the pyc from different archs would be identical.
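
On the version concern, a hedged note: as far as I know, CPython embeds a
per-version cache tag in each cached file name (PEP 3147), so interpreters
of different versions write separate files inside the same __pycache__
directory. The marshal module documentation also describes its format as
independent of machine architecture. A small sketch that shows where the
running interpreter would cache a module; the path below is a hypothetical
example, not a file named in this thread:

```python
import importlib.util
import sys


def cache_path_for(source_path):
    """Return where this interpreter would cache bytecode for source_path."""
    return importlib.util.cache_from_source(source_path)


if __name__ == "__main__":
    # The cache tag identifies the implementation and version of the interpreter.
    print(sys.implementation.cache_tag)
    # Hypothetical module path, used only to show the resulting file name.
    print(cache_path_for("scripts/lib/kdoc/example.py"))
```

This does not address the read-only or O= cases; it only suggests that mixed
Python versions on a shared tree should not overwrite each other's cache.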

With that, there are two options:

a. disable cache;
b. set PYTHONPYCACHEPREFIX.

We're currently doing (a). I guess everybody agrees that this is
not ideal.

So, ideally, we should move to (b). For Sphinx, the easiest solution
is just to place it under Documentation/output, but this is not
generic enough: ideally, we should revert patch 3 and set
PYTHONPYCACHEPREFIX when O is used. In the meantime, we could apply my
patch for Documentation/output while we craft such logic.
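
A rough sketch of what (b) does, assuming Python 3.8 or newer (where
PYTHONPYCACHEPREFIX was introduced): with the variable set, the interpreter
mirrors the source path under the prefix instead of using __pycache__ next
to the sources. The helper name and the paths are made up for illustration;
how Kbuild would derive the prefix from O= is left open here.

```python
import os
import subprocess
import sys
import tempfile


def cache_location_with_prefix(prefix, source_path):
    """Ask a child interpreter where it would cache bytecode for source_path."""
    env = dict(os.environ, PYTHONPYCACHEPREFIX=prefix)
    code = (
        "import importlib.util, sys; "
        "print(importlib.util.cache_from_source(sys.argv[1]))"
    )
    out = subprocess.run(
        [sys.executable, "-c", code, source_path],
        env=env,
        check=True,
        capture_output=True,
        text=True,
    )
    return out.stdout.strip()


if __name__ == "__main__":
    # Stand-in for a directory under the O= output tree (an assumption).
    prefix = tempfile.mkdtemp()
    # Hypothetical source path; nothing is read from or written to it.
    print(cache_location_with_prefix(prefix, "/src/linux/scripts/lib/kdoc/example.py"))
```

Because the cache then lives under the output directory, the source tree
stays untouched and each O= directory gets its own cache.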

Regards,
Mauro
Re: [PATCH v4 0/4] Don't create Python bytecode when building the kernel
Posted by Andy Shevchenko 7 months, 2 weeks ago
On Sat, Apr 26, 2025 at 08:57:08PM +0800, Mauro Carvalho Chehab wrote:
> Em Sat, 26 Apr 2025 11:39:05 +0900
> Akira Yokosawa <akiyks@gmail.com> escreveu:
> 
> > Bothering with might-become-incompatilbe-in-the-future python environment
> > variables in kernel Makefiles looks over-engineering to me.
> > Also, as Mauro says in 3/4, it is incomplete in that it does not cover
> > the cases where those scripts are invoked outside of kernel build.
> > And it will interfere with existing developers who want the benefit of
> > bytecode caching.
> > 
> > I'm not precluding the possibility of incoherent bytecode cache; for example
> > by using a shared kernel source tree among several developers, and only
> > one of them (owner) has a write permission of it.  In that case, said
> > owner might update the tree without running relevant python scripts.
> > 
> > I don't know if python can notice outdated cache and disregard it.
> > 
> > In such a situation, setting PYTHONPYCACHEPREFIX as an environment
> > variable should help, for sure, but only in such special cases.
> > 
> > Andy, what do you say if I ask reverts of 1/4, 2/4/, and 3/4?
> 
> Patches 1 and 2 are, IMO, needed anyway, as they fix a problem:
> KERNELDOC environment is not used consistently.
> 
> Now, patch 3 is the one that may require more thinking.
> 
> I agree with Andy that, when O=<dir> is used, nothing shall be
> written to source dir.
> 
> There are a couple of reasons for that:
> 
> 1. source dir may be read only;
> 2. one may want to do cross compilation and use multiple output
>    directories, one for each version;
> 3. the source dir could be mapped via NFS to multiple machines
>    with different architectures.
> 
> For (3), it could mean that multiple machines may have different
> Python versions, so, sharing the Python bytecode from source dir doesn't
> sound a good idea. Also, I'm not sure if the pyc from different archs
> would be identical.
> 
> With that, there are two options:
> 
> a. disable cache;
> b. set PYTHONPYCACHEPREFIX.

Thanks, Mauro, for replying. I'm with you on all of it.

> We're currently doing (a). I guess everybody agrees that this is
> not ideal.

Yes, I also prefer to have cache working if it's possible. The only BUT here is
that users should not suffer from it.

> So, ideally, we should move to (b). For Sphinx, the easiest solution
> is just to place it under Documentation/output, but this is not
> generic enough: ideally, we should revert patch 3 and set
> PYTHONPYCACHEPREFIX when O is used. In the meantime, we could apply my
> patch for Documentation/output while we craft such logic.

-- 
With Best Regards,
Andy Shevchenko
Re: [PATCH v4 0/4] Don't create Python bytecode when building the kernel
Posted by Jonathan Corbet 7 months, 3 weeks ago
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:

> As reported by Andy, the Kernel build system runs kernel-doc script for DRM,
> when W=1. Due to Python's normal behavior, its JIT compiler will create
> a bytecode and store it under scripts/lib/*/__pycache__.  As one may be using
> O= and even having the sources on a read-only mount point, disable its
> creation during build time.
>
> This is done by adding PYTHONDONTWRITEBYTECODE=1 on every place
> where the script is called within Kbuild and when called via another script.
>  
> This only solves half of the issue though, as one may be manually running
> the script by hand, without asking Python to not store any bytecode.
> This should be OK, but afterwards, git status will list the __pycache__ as
> not committed. To prevent that, add *.pyc to .gitignore.
>
> This series contain 4 patches:
>
> - patch 1 adjusts a variable that pass extra data to scripts/kerneldoc.py;
> - patch 2moves scripts/kernel-doc location to the main makefile
>   and exports it, as scripts/Makefile.build will need it;
> - patch 3 disables __pycache__ generation and ensure that the entire Kbuild
>   will use KERNELDOC var for the location of kernel-doc;
> - patch 4 adds *.pyc at the list of object files to be ignored.

I've applied the set, thanks.

jon