.../admin-guide/kernel-parameters.txt | 16 ++++++++++++++ drivers/char/mem.c | 21 +++++++++++++++++++ 2 files changed, 37 insertions(+)
When CONFIG_STRICT_DEVMEM is enabled, writing to /dev/mem to override
kernel data for debugging purposes is prohibited. This configuration is
always enabled on our production servers. However, there are times when we
need to use the crash utility to modify kernel data to analyze complex
issues.
As suggested by Ingo, we can add a boot time knob of soft-enabling it.
Therefore, a new parameter "strict_devmem=" is added. The reuslt are as
follows,
- Before this change
crash> wr panic_on_oops 0
wr: cannot write to /proc/kcore <<<< failed
- After this change
- default
crash> wr panic_on_oops 0
wr: cannot write to /proc/kcore <<<< failed
- strict_devmem=off
crash> p panic_on_oops
panic_on_oops = $1 = 1
crash> wr panic_on_oops 0
crash> p panic_on_oops
panic_on_oops = $2 = 0 <<<< succeeded
- strict_devmem=invalid
[ 0.230052] Invalid option string for strict_devmem: 'invalid'
crash> wr panic_on_oops 0
wr: cannot write to /proc/kcore <<<< failed
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
.../admin-guide/kernel-parameters.txt | 16 ++++++++++++++
drivers/char/mem.c | 21 +++++++++++++++++++
2 files changed, 37 insertions(+)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 1518343bbe22..7fe0f66d0dfb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6563,6 +6563,22 @@
them frequently to increase the rate of SLB faults
on kernel addresses.
+ strict_devmem=
+ [KNL] Under CONFIG_STRICT_DEVMEM, whether strict devmem
+ is enabled for this boot. Strict devmem checking is used
+ to protect the userspace (root) access to all of memory,
+ including kernel and userspace memory. Accidental access
+ to this is obviously disastrous, but specific access can
+ be used by people debugging the kernel. Note that with
+ PAT support enabled, even in this case there are
+ restrictions on /dev/mem use due to the cache aliasing
+ requirements.
+ on If IO_STRICT_DEVMEM=n, the /dev/mem file only allows
+ userspace access to PCI space and the BIOS code and data
+ regions. This is sufficient for dosemu and X and all
+ common users of /dev/mem. (default)
+ off Disable strict devmem checks.
+
sunrpc.min_resvport=
sunrpc.max_resvport=
[NFS,SUNRPC]
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 169eed162a7f..bfaeefce4709 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -57,16 +57,24 @@ static inline int valid_mmap_phys_addr_range(unsigned long pfn, size_t size)
#endif
#ifdef CONFIG_STRICT_DEVMEM
+static DEFINE_STATIC_KEY_FALSE_RO(bypass_strict_devmem);
+
static inline int page_is_allowed(unsigned long pfn)
{
+ if (static_branch_unlikely(&bypass_strict_devmem))
+ return 1;
return devmem_is_allowed(pfn);
}
+
static inline int range_is_allowed(unsigned long pfn, unsigned long size)
{
u64 from = ((u64)pfn) << PAGE_SHIFT;
u64 to = from + size;
u64 cursor = from;
+ if (static_branch_unlikely(&bypass_strict_devmem))
+ return 1;
+
while (cursor < to) {
if (!devmem_is_allowed(pfn))
return 0;
@@ -75,6 +83,19 @@ static inline int range_is_allowed(unsigned long pfn, unsigned long size)
}
return 1;
}
+
+static bool enable_strict_devmem __initdata = true;
+static int __init parse_strict_devmem(char *str)
+{
+ if (kstrtobool(str, &enable_strict_devmem))
+ pr_warn("Invalid option string for strict_devmem: '%s'\n",
+ str);
+ if (enable_strict_devmem == false)
+ static_branch_enable(&bypass_strict_devmem);
+ return 1;
+}
+
+__setup("strict_devmem=", parse_strict_devmem);
#else
static inline int page_is_allowed(unsigned long pfn)
{
--
2.43.5
On 20.11.24 13:28, Yafang Shao wrote: > When CONFIG_STRICT_DEVMEM is enabled, writing to /dev/mem to override > kernel data for debugging purposes is prohibited. This configuration is > always enabled on our production servers. However, there are times when we > need to use the crash utility to modify kernel data to analyze complex > issues. > > As suggested by Ingo, we can add a boot time knob of soft-enabling it. > Therefore, a new parameter "strict_devmem=" is added. The reuslt are as > follows, > > - Before this change > crash> wr panic_on_oops 0 > wr: cannot write to /proc/kcore <<<< failed > > - After this change > - default > crash> wr panic_on_oops 0 > wr: cannot write to /proc/kcore <<<< failed > > - strict_devmem=off > crash> p panic_on_oops > panic_on_oops = $1 = 1 > crash> wr panic_on_oops 0 > crash> p panic_on_oops > panic_on_oops = $2 = 0 <<<< succeeded > > - strict_devmem=invalid > [ 0.230052] Invalid option string for strict_devmem: 'invalid' > crash> wr panic_on_oops 0 > wr: cannot write to /proc/kcore <<<< failed > > Suggested-by: Ingo Molnar <mingo@kernel.org> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > --- > .../admin-guide/kernel-parameters.txt | 16 ++++++++++++++ > drivers/char/mem.c | 21 +++++++++++++++++++ > 2 files changed, 37 insertions(+) > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > index 1518343bbe22..7fe0f66d0dfb 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -6563,6 +6563,22 @@ > them frequently to increase the rate of SLB faults > on kernel addresses. > > + strict_devmem= > + [KNL] Under CONFIG_STRICT_DEVMEM, whether strict devmem > + is enabled for this boot. Strict devmem checking is used > + to protect the userspace (root) access to all of memory, > + including kernel and userspace memory. Accidental access > + to this is obviously disastrous, but specific access can > + be used by people debugging the kernel. Note that with > + PAT support enabled, even in this case there are > + restrictions on /dev/mem use due to the cache aliasing > + requirements. > + on If IO_STRICT_DEVMEM=n, the /dev/mem file only allows > + userspace access to PCI space and the BIOS code and data > + regions. This is sufficient for dosemu and X and all > + common users of /dev/mem. (default) > + off Disable strict devmem checks. > + > sunrpc.min_resvport= > sunrpc.max_resvport= > [NFS,SUNRPC] This will allow to violate EXCLUSIVE_SYSTEM_RAM, and I am afraid I don't enjoy seeing devmem handling+config getting more complicated. -- Cheers, David / dhildenb
On Thu, Nov 21, 2024 at 4:51 PM David Hildenbrand <david@redhat.com> wrote: > > On 20.11.24 13:28, Yafang Shao wrote: > > When CONFIG_STRICT_DEVMEM is enabled, writing to /dev/mem to override > > kernel data for debugging purposes is prohibited. This configuration is > > always enabled on our production servers. However, there are times when we > > need to use the crash utility to modify kernel data to analyze complex > > issues. > > > > As suggested by Ingo, we can add a boot time knob of soft-enabling it. > > Therefore, a new parameter "strict_devmem=" is added. The reuslt are as > > follows, > > > > - Before this change > > crash> wr panic_on_oops 0 > > wr: cannot write to /proc/kcore <<<< failed > > > > - After this change > > - default > > crash> wr panic_on_oops 0 > > wr: cannot write to /proc/kcore <<<< failed > > > > - strict_devmem=off > > crash> p panic_on_oops > > panic_on_oops = $1 = 1 > > crash> wr panic_on_oops 0 > > crash> p panic_on_oops > > panic_on_oops = $2 = 0 <<<< succeeded > > > > - strict_devmem=invalid > > [ 0.230052] Invalid option string for strict_devmem: 'invalid' > > crash> wr panic_on_oops 0 > > wr: cannot write to /proc/kcore <<<< failed > > > > Suggested-by: Ingo Molnar <mingo@kernel.org> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > > --- > > .../admin-guide/kernel-parameters.txt | 16 ++++++++++++++ > > drivers/char/mem.c | 21 +++++++++++++++++++ > > 2 files changed, 37 insertions(+) > > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > > index 1518343bbe22..7fe0f66d0dfb 100644 > > --- a/Documentation/admin-guide/kernel-parameters.txt > > +++ b/Documentation/admin-guide/kernel-parameters.txt > > @@ -6563,6 +6563,22 @@ > > them frequently to increase the rate of SLB faults > > on kernel addresses. > > > > + strict_devmem= > > + [KNL] Under CONFIG_STRICT_DEVMEM, whether strict devmem > > + is enabled for this boot. Strict devmem checking is used > > + to protect the userspace (root) access to all of memory, > > + including kernel and userspace memory. Accidental access > > + to this is obviously disastrous, but specific access can > > + be used by people debugging the kernel. Note that with > > + PAT support enabled, even in this case there are > > + restrictions on /dev/mem use due to the cache aliasing > > + requirements. > > + on If IO_STRICT_DEVMEM=n, the /dev/mem file only allows > > + userspace access to PCI space and the BIOS code and data > > + regions. This is sufficient for dosemu and X and all > > + common users of /dev/mem. (default) > > + off Disable strict devmem checks. > > + > > sunrpc.min_resvport= > > sunrpc.max_resvport= > > [NFS,SUNRPC] > > This will allow to violate EXCLUSIVE_SYSTEM_RAM, and I am afraid I don't > enjoy seeing devmem handling+config getting more complicated. That poses a challenge. Perhaps we should also consider disabling functions that rely on EXCLUSIVE_SYSTEM_RAM when strict_devmem=off, but implementing such a change seems overly complex. Our primary goal is to temporarily bypass STRICT_DEVMEM for live kernel debugging. In an earlier version, I proposed making the fucntion devmem_is_allowed() error-injectable, but Ingo pointed out that it violates the principles of STRICT_DEVMEM. Do you have any suggestions on enabling write access to /dev/mem in debugging tools like the crash utility, while maintaining compatibility with the existing rules? -- Regards Yafang
On Thu, Nov 21, 2024 at 10:31:12PM +0800, Yafang Shao wrote: > On Thu, Nov 21, 2024 at 4:51 PM David Hildenbrand <david@redhat.com> wrote: > > > > On 20.11.24 13:28, Yafang Shao wrote: > > > When CONFIG_STRICT_DEVMEM is enabled, writing to /dev/mem to override > > > kernel data for debugging purposes is prohibited. This configuration is > > > always enabled on our production servers. However, there are times when we > > > need to use the crash utility to modify kernel data to analyze complex > > > issues. > > > > > > As suggested by Ingo, we can add a boot time knob of soft-enabling it. > > > Therefore, a new parameter "strict_devmem=" is added. The reuslt are as > > > follows, > > > > > > - Before this change > > > crash> wr panic_on_oops 0 > > > wr: cannot write to /proc/kcore <<<< failed > > > > > > - After this change > > > - default > > > crash> wr panic_on_oops 0 > > > wr: cannot write to /proc/kcore <<<< failed > > > > > > - strict_devmem=off > > > crash> p panic_on_oops > > > panic_on_oops = $1 = 1 > > > crash> wr panic_on_oops 0 > > > crash> p panic_on_oops > > > panic_on_oops = $2 = 0 <<<< succeeded > > > > > > - strict_devmem=invalid > > > [ 0.230052] Invalid option string for strict_devmem: 'invalid' > > > crash> wr panic_on_oops 0 > > > wr: cannot write to /proc/kcore <<<< failed > > > > > > Suggested-by: Ingo Molnar <mingo@kernel.org> > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > > > --- > > > .../admin-guide/kernel-parameters.txt | 16 ++++++++++++++ > > > drivers/char/mem.c | 21 +++++++++++++++++++ > > > 2 files changed, 37 insertions(+) > > > > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > > > index 1518343bbe22..7fe0f66d0dfb 100644 > > > --- a/Documentation/admin-guide/kernel-parameters.txt > > > +++ b/Documentation/admin-guide/kernel-parameters.txt > > > @@ -6563,6 +6563,22 @@ > > > them frequently to increase the rate of SLB faults > > > on kernel addresses. > > > > > > + strict_devmem= > > > + [KNL] Under CONFIG_STRICT_DEVMEM, whether strict devmem > > > + is enabled for this boot. Strict devmem checking is used > > > + to protect the userspace (root) access to all of memory, > > > + including kernel and userspace memory. Accidental access > > > + to this is obviously disastrous, but specific access can > > > + be used by people debugging the kernel. Note that with > > > + PAT support enabled, even in this case there are > > > + restrictions on /dev/mem use due to the cache aliasing > > > + requirements. > > > + on If IO_STRICT_DEVMEM=n, the /dev/mem file only allows > > > + userspace access to PCI space and the BIOS code and data > > > + regions. This is sufficient for dosemu and X and all > > > + common users of /dev/mem. (default) > > > + off Disable strict devmem checks. > > > + > > > sunrpc.min_resvport= > > > sunrpc.max_resvport= > > > [NFS,SUNRPC] > > > > This will allow to violate EXCLUSIVE_SYSTEM_RAM, and I am afraid I don't > > enjoy seeing devmem handling+config getting more complicated. > > That poses a challenge. Perhaps we should also consider disabling > functions that rely on EXCLUSIVE_SYSTEM_RAM when strict_devmem=off, > but implementing such a change seems overly complex. > > Our primary goal is to temporarily bypass STRICT_DEVMEM for live > kernel debugging. In an earlier version, I proposed making the > fucntion devmem_is_allowed() error-injectable, but Ingo pointed out > that it violates the principles of STRICT_DEVMEM. I think that "primary goal" is the problem here. We don't want to do that, at all, for all the reasons why we implemented STRICT_DEVMEM and for why people enable it. Either you enable it because you want the protection and "security" it provides, or you do not. Don't try to work around it please. > Do you have any suggestions on enabling write access to /dev/mem in > debugging tools like the crash utility, while maintaining > compatibility with the existing rules? I think you just don't provide write access to /dev/mem for debugging tools as it's a huge security hole that people realized and have plugged up. If you want to provide access to this for "debugging" then just don't enable that option and live with the risk involved, I don't see how you can have it both ways. sorry, greg k-h
On Thu, Nov 21, 2024 at 11:15 PM Greg KH <gregkh@linuxfoundation.org> wrote: > > On Thu, Nov 21, 2024 at 10:31:12PM +0800, Yafang Shao wrote: > > On Thu, Nov 21, 2024 at 4:51 PM David Hildenbrand <david@redhat.com> wrote: > > > > > > On 20.11.24 13:28, Yafang Shao wrote: > > > > When CONFIG_STRICT_DEVMEM is enabled, writing to /dev/mem to override > > > > kernel data for debugging purposes is prohibited. This configuration is > > > > always enabled on our production servers. However, there are times when we > > > > need to use the crash utility to modify kernel data to analyze complex > > > > issues. > > > > > > > > As suggested by Ingo, we can add a boot time knob of soft-enabling it. > > > > Therefore, a new parameter "strict_devmem=" is added. The reuslt are as > > > > follows, > > > > > > > > - Before this change > > > > crash> wr panic_on_oops 0 > > > > wr: cannot write to /proc/kcore <<<< failed > > > > > > > > - After this change > > > > - default > > > > crash> wr panic_on_oops 0 > > > > wr: cannot write to /proc/kcore <<<< failed > > > > > > > > - strict_devmem=off > > > > crash> p panic_on_oops > > > > panic_on_oops = $1 = 1 > > > > crash> wr panic_on_oops 0 > > > > crash> p panic_on_oops > > > > panic_on_oops = $2 = 0 <<<< succeeded > > > > > > > > - strict_devmem=invalid > > > > [ 0.230052] Invalid option string for strict_devmem: 'invalid' > > > > crash> wr panic_on_oops 0 > > > > wr: cannot write to /proc/kcore <<<< failed > > > > > > > > Suggested-by: Ingo Molnar <mingo@kernel.org> > > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > > > > --- > > > > .../admin-guide/kernel-parameters.txt | 16 ++++++++++++++ > > > > drivers/char/mem.c | 21 +++++++++++++++++++ > > > > 2 files changed, 37 insertions(+) > > > > > > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > > > > index 1518343bbe22..7fe0f66d0dfb 100644 > > > > --- a/Documentation/admin-guide/kernel-parameters.txt > > > > +++ b/Documentation/admin-guide/kernel-parameters.txt > > > > @@ -6563,6 +6563,22 @@ > > > > them frequently to increase the rate of SLB faults > > > > on kernel addresses. > > > > > > > > + strict_devmem= > > > > + [KNL] Under CONFIG_STRICT_DEVMEM, whether strict devmem > > > > + is enabled for this boot. Strict devmem checking is used > > > > + to protect the userspace (root) access to all of memory, > > > > + including kernel and userspace memory. Accidental access > > > > + to this is obviously disastrous, but specific access can > > > > + be used by people debugging the kernel. Note that with > > > > + PAT support enabled, even in this case there are > > > > + restrictions on /dev/mem use due to the cache aliasing > > > > + requirements. > > > > + on If IO_STRICT_DEVMEM=n, the /dev/mem file only allows > > > > + userspace access to PCI space and the BIOS code and data > > > > + regions. This is sufficient for dosemu and X and all > > > > + common users of /dev/mem. (default) > > > > + off Disable strict devmem checks. > > > > + > > > > sunrpc.min_resvport= > > > > sunrpc.max_resvport= > > > > [NFS,SUNRPC] > > > > > > This will allow to violate EXCLUSIVE_SYSTEM_RAM, and I am afraid I don't > > > enjoy seeing devmem handling+config getting more complicated. > > > > That poses a challenge. Perhaps we should also consider disabling > > functions that rely on EXCLUSIVE_SYSTEM_RAM when strict_devmem=off, > > but implementing such a change seems overly complex. > > > > Our primary goal is to temporarily bypass STRICT_DEVMEM for live > > kernel debugging. In an earlier version, I proposed making the > > fucntion devmem_is_allowed() error-injectable, but Ingo pointed out > > that it violates the principles of STRICT_DEVMEM. > > I think that "primary goal" is the problem here. We don't want to do > that, at all, for all the reasons why we implemented STRICT_DEVMEM and > for why people enable it. > > Either you enable it because you want the protection and "security" it > provides, or you do not. Don't try to work around it please. > > > Do you have any suggestions on enabling write access to /dev/mem in > > debugging tools like the crash utility, while maintaining > > compatibility with the existing rules? > > I think you just don't provide write access to /dev/mem for debugging > tools as it's a huge security hole that people realized and have plugged > up. If you want to provide access to this for "debugging" then just > don't enable that option and live with the risk involved, I don't see > how you can have it both ways. I don’t quite see how STRICT_DEVMEM could pose a significant security concern. If you’re root, you already have the ability to do whatever you want on the system if you’re determined to. This option primarily serves to prevent reckless or accidental writes to kernel memory. As I understand it, STRICT_DEVMEM is more about enabling functionality for features like page table checking and virtio_mem than about enforcing security. -- Regards Yafang
On 22.11.24 03:26, Yafang Shao wrote: > On Thu, Nov 21, 2024 at 11:15 PM Greg KH <gregkh@linuxfoundation.org> wrote: >> >> On Thu, Nov 21, 2024 at 10:31:12PM +0800, Yafang Shao wrote: >>> On Thu, Nov 21, 2024 at 4:51 PM David Hildenbrand <david@redhat.com> wrote: >>>> >>>> On 20.11.24 13:28, Yafang Shao wrote: >>>>> When CONFIG_STRICT_DEVMEM is enabled, writing to /dev/mem to override >>>>> kernel data for debugging purposes is prohibited. This configuration is >>>>> always enabled on our production servers. However, there are times when we >>>>> need to use the crash utility to modify kernel data to analyze complex >>>>> issues. >>>>> >>>>> As suggested by Ingo, we can add a boot time knob of soft-enabling it. >>>>> Therefore, a new parameter "strict_devmem=" is added. The reuslt are as >>>>> follows, >>>>> >>>>> - Before this change >>>>> crash> wr panic_on_oops 0 >>>>> wr: cannot write to /proc/kcore <<<< failed >>>>> >>>>> - After this change >>>>> - default >>>>> crash> wr panic_on_oops 0 >>>>> wr: cannot write to /proc/kcore <<<< failed >>>>> >>>>> - strict_devmem=off >>>>> crash> p panic_on_oops >>>>> panic_on_oops = $1 = 1 >>>>> crash> wr panic_on_oops 0 >>>>> crash> p panic_on_oops >>>>> panic_on_oops = $2 = 0 <<<< succeeded >>>>> >>>>> - strict_devmem=invalid >>>>> [ 0.230052] Invalid option string for strict_devmem: 'invalid' >>>>> crash> wr panic_on_oops 0 >>>>> wr: cannot write to /proc/kcore <<<< failed >>>>> >>>>> Suggested-by: Ingo Molnar <mingo@kernel.org> >>>>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> >>>>> --- >>>>> .../admin-guide/kernel-parameters.txt | 16 ++++++++++++++ >>>>> drivers/char/mem.c | 21 +++++++++++++++++++ >>>>> 2 files changed, 37 insertions(+) >>>>> >>>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt >>>>> index 1518343bbe22..7fe0f66d0dfb 100644 >>>>> --- a/Documentation/admin-guide/kernel-parameters.txt >>>>> +++ b/Documentation/admin-guide/kernel-parameters.txt >>>>> @@ -6563,6 +6563,22 @@ >>>>> them frequently to increase the rate of SLB faults >>>>> on kernel addresses. >>>>> >>>>> + strict_devmem= >>>>> + [KNL] Under CONFIG_STRICT_DEVMEM, whether strict devmem >>>>> + is enabled for this boot. Strict devmem checking is used >>>>> + to protect the userspace (root) access to all of memory, >>>>> + including kernel and userspace memory. Accidental access >>>>> + to this is obviously disastrous, but specific access can >>>>> + be used by people debugging the kernel. Note that with >>>>> + PAT support enabled, even in this case there are >>>>> + restrictions on /dev/mem use due to the cache aliasing >>>>> + requirements. >>>>> + on If IO_STRICT_DEVMEM=n, the /dev/mem file only allows >>>>> + userspace access to PCI space and the BIOS code and data >>>>> + regions. This is sufficient for dosemu and X and all >>>>> + common users of /dev/mem. (default) >>>>> + off Disable strict devmem checks. >>>>> + >>>>> sunrpc.min_resvport= >>>>> sunrpc.max_resvport= >>>>> [NFS,SUNRPC] >>>> >>>> This will allow to violate EXCLUSIVE_SYSTEM_RAM, and I am afraid I don't >>>> enjoy seeing devmem handling+config getting more complicated. >>> >>> That poses a challenge. Perhaps we should also consider disabling >>> functions that rely on EXCLUSIVE_SYSTEM_RAM when strict_devmem=off, >>> but implementing such a change seems overly complex. >>> >>> Our primary goal is to temporarily bypass STRICT_DEVMEM for live >>> kernel debugging. In an earlier version, I proposed making the >>> fucntion devmem_is_allowed() error-injectable, but Ingo pointed out >>> that it violates the principles of STRICT_DEVMEM. >> >> I think that "primary goal" is the problem here. We don't want to do >> that, at all, for all the reasons why we implemented STRICT_DEVMEM and >> for why people enable it. >> >> Either you enable it because you want the protection and "security" it >> provides, or you do not. Don't try to work around it please. >> >>> Do you have any suggestions on enabling write access to /dev/mem in >>> debugging tools like the crash utility, while maintaining >>> compatibility with the existing rules? >> >> I think you just don't provide write access to /dev/mem for debugging >> tools as it's a huge security hole that people realized and have plugged >> up. If you want to provide access to this for "debugging" then just >> don't enable that option and live with the risk involved, I don't see >> how you can have it both ways. > > I don’t quite see how STRICT_DEVMEM could pose a significant security > concern. If you’re root, you already have the ability to do whatever > you want on the system if you’re determined to. This option primarily > serves to prevent reckless or accidental writes to kernel memory. > > As I understand it, STRICT_DEVMEM is more about enabling functionality > for features like page table checking and virtio_mem than about > enforcing security. If you look at the history, there were all mechanisms added way after STRICT_DEVMEM. I mean, just take a look at who relies on STRICT_DEVMEM. HARDENED_USERCOPY in security/Kconfig .. So NACK. -- Cheers, David / dhildenb
On Fri, Nov 22, 2024 at 7:00 PM David Hildenbrand <david@redhat.com> wrote: > > On 22.11.24 03:26, Yafang Shao wrote: > > On Thu, Nov 21, 2024 at 11:15 PM Greg KH <gregkh@linuxfoundation.org> wrote: > >> > >> On Thu, Nov 21, 2024 at 10:31:12PM +0800, Yafang Shao wrote: > >>> On Thu, Nov 21, 2024 at 4:51 PM David Hildenbrand <david@redhat.com> wrote: > >>>> > >>>> On 20.11.24 13:28, Yafang Shao wrote: > >>>>> When CONFIG_STRICT_DEVMEM is enabled, writing to /dev/mem to override > >>>>> kernel data for debugging purposes is prohibited. This configuration is > >>>>> always enabled on our production servers. However, there are times when we > >>>>> need to use the crash utility to modify kernel data to analyze complex > >>>>> issues. > >>>>> > >>>>> As suggested by Ingo, we can add a boot time knob of soft-enabling it. > >>>>> Therefore, a new parameter "strict_devmem=" is added. The reuslt are as > >>>>> follows, > >>>>> > >>>>> - Before this change > >>>>> crash> wr panic_on_oops 0 > >>>>> wr: cannot write to /proc/kcore <<<< failed > >>>>> > >>>>> - After this change > >>>>> - default > >>>>> crash> wr panic_on_oops 0 > >>>>> wr: cannot write to /proc/kcore <<<< failed > >>>>> > >>>>> - strict_devmem=off > >>>>> crash> p panic_on_oops > >>>>> panic_on_oops = $1 = 1 > >>>>> crash> wr panic_on_oops 0 > >>>>> crash> p panic_on_oops > >>>>> panic_on_oops = $2 = 0 <<<< succeeded > >>>>> > >>>>> - strict_devmem=invalid > >>>>> [ 0.230052] Invalid option string for strict_devmem: 'invalid' > >>>>> crash> wr panic_on_oops 0 > >>>>> wr: cannot write to /proc/kcore <<<< failed > >>>>> > >>>>> Suggested-by: Ingo Molnar <mingo@kernel.org> > >>>>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > >>>>> --- > >>>>> .../admin-guide/kernel-parameters.txt | 16 ++++++++++++++ > >>>>> drivers/char/mem.c | 21 +++++++++++++++++++ > >>>>> 2 files changed, 37 insertions(+) > >>>>> > >>>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > >>>>> index 1518343bbe22..7fe0f66d0dfb 100644 > >>>>> --- a/Documentation/admin-guide/kernel-parameters.txt > >>>>> +++ b/Documentation/admin-guide/kernel-parameters.txt > >>>>> @@ -6563,6 +6563,22 @@ > >>>>> them frequently to increase the rate of SLB faults > >>>>> on kernel addresses. > >>>>> > >>>>> + strict_devmem= > >>>>> + [KNL] Under CONFIG_STRICT_DEVMEM, whether strict devmem > >>>>> + is enabled for this boot. Strict devmem checking is used > >>>>> + to protect the userspace (root) access to all of memory, > >>>>> + including kernel and userspace memory. Accidental access > >>>>> + to this is obviously disastrous, but specific access can > >>>>> + be used by people debugging the kernel. Note that with > >>>>> + PAT support enabled, even in this case there are > >>>>> + restrictions on /dev/mem use due to the cache aliasing > >>>>> + requirements. > >>>>> + on If IO_STRICT_DEVMEM=n, the /dev/mem file only allows > >>>>> + userspace access to PCI space and the BIOS code and data > >>>>> + regions. This is sufficient for dosemu and X and all > >>>>> + common users of /dev/mem. (default) > >>>>> + off Disable strict devmem checks. > >>>>> + > >>>>> sunrpc.min_resvport= > >>>>> sunrpc.max_resvport= > >>>>> [NFS,SUNRPC] > >>>> > >>>> This will allow to violate EXCLUSIVE_SYSTEM_RAM, and I am afraid I don't > >>>> enjoy seeing devmem handling+config getting more complicated. > >>> > >>> That poses a challenge. Perhaps we should also consider disabling > >>> functions that rely on EXCLUSIVE_SYSTEM_RAM when strict_devmem=off, > >>> but implementing such a change seems overly complex. > >>> > >>> Our primary goal is to temporarily bypass STRICT_DEVMEM for live > >>> kernel debugging. In an earlier version, I proposed making the > >>> fucntion devmem_is_allowed() error-injectable, but Ingo pointed out > >>> that it violates the principles of STRICT_DEVMEM. > >> > >> I think that "primary goal" is the problem here. We don't want to do > >> that, at all, for all the reasons why we implemented STRICT_DEVMEM and > >> for why people enable it. > >> > >> Either you enable it because you want the protection and "security" it > >> provides, or you do not. Don't try to work around it please. > >> > >>> Do you have any suggestions on enabling write access to /dev/mem in > >>> debugging tools like the crash utility, while maintaining > >>> compatibility with the existing rules? > >> > >> I think you just don't provide write access to /dev/mem for debugging > >> tools as it's a huge security hole that people realized and have plugged > >> up. If you want to provide access to this for "debugging" then just > >> don't enable that option and live with the risk involved, I don't see > >> how you can have it both ways. > > > > I don’t quite see how STRICT_DEVMEM could pose a significant security > > concern. If you’re root, you already have the ability to do whatever > > you want on the system if you’re determined to. This option primarily > > serves to prevent reckless or accidental writes to kernel memory. > > > > As I understand it, STRICT_DEVMEM is more about enabling functionality > > for features like page table checking and virtio_mem than about > > enforcing security. > > If you look at the history, there were all mechanisms added way after > STRICT_DEVMEM. > > I mean, just take a look at who relies on STRICT_DEVMEM. > > HARDENED_USERCOPY in security/Kconfig .. At the very least, there’s the “hardened_usercopy=” option available for users. -- Regards Yafang
On 21.11.24 16:14, Greg KH wrote: > On Thu, Nov 21, 2024 at 10:31:12PM +0800, Yafang Shao wrote: >> On Thu, Nov 21, 2024 at 4:51 PM David Hildenbrand <david@redhat.com> wrote: >>> >>> On 20.11.24 13:28, Yafang Shao wrote: >>>> When CONFIG_STRICT_DEVMEM is enabled, writing to /dev/mem to override >>>> kernel data for debugging purposes is prohibited. This configuration is >>>> always enabled on our production servers. However, there are times when we >>>> need to use the crash utility to modify kernel data to analyze complex >>>> issues. >>>> >>>> As suggested by Ingo, we can add a boot time knob of soft-enabling it. >>>> Therefore, a new parameter "strict_devmem=" is added. The reuslt are as >>>> follows, >>>> >>>> - Before this change >>>> crash> wr panic_on_oops 0 >>>> wr: cannot write to /proc/kcore <<<< failed >>>> >>>> - After this change >>>> - default >>>> crash> wr panic_on_oops 0 >>>> wr: cannot write to /proc/kcore <<<< failed >>>> >>>> - strict_devmem=off >>>> crash> p panic_on_oops >>>> panic_on_oops = $1 = 1 >>>> crash> wr panic_on_oops 0 >>>> crash> p panic_on_oops >>>> panic_on_oops = $2 = 0 <<<< succeeded >>>> >>>> - strict_devmem=invalid >>>> [ 0.230052] Invalid option string for strict_devmem: 'invalid' >>>> crash> wr panic_on_oops 0 >>>> wr: cannot write to /proc/kcore <<<< failed >>>> >>>> Suggested-by: Ingo Molnar <mingo@kernel.org> >>>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> >>>> --- >>>> .../admin-guide/kernel-parameters.txt | 16 ++++++++++++++ >>>> drivers/char/mem.c | 21 +++++++++++++++++++ >>>> 2 files changed, 37 insertions(+) >>>> >>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt >>>> index 1518343bbe22..7fe0f66d0dfb 100644 >>>> --- a/Documentation/admin-guide/kernel-parameters.txt >>>> +++ b/Documentation/admin-guide/kernel-parameters.txt >>>> @@ -6563,6 +6563,22 @@ >>>> them frequently to increase the rate of SLB faults >>>> on kernel addresses. >>>> >>>> + strict_devmem= >>>> + [KNL] Under CONFIG_STRICT_DEVMEM, whether strict devmem >>>> + is enabled for this boot. Strict devmem checking is used >>>> + to protect the userspace (root) access to all of memory, >>>> + including kernel and userspace memory. Accidental access >>>> + to this is obviously disastrous, but specific access can >>>> + be used by people debugging the kernel. Note that with >>>> + PAT support enabled, even in this case there are >>>> + restrictions on /dev/mem use due to the cache aliasing >>>> + requirements. >>>> + on If IO_STRICT_DEVMEM=n, the /dev/mem file only allows >>>> + userspace access to PCI space and the BIOS code and data >>>> + regions. This is sufficient for dosemu and X and all >>>> + common users of /dev/mem. (default) >>>> + off Disable strict devmem checks. >>>> + >>>> sunrpc.min_resvport= >>>> sunrpc.max_resvport= >>>> [NFS,SUNRPC] >>> >>> This will allow to violate EXCLUSIVE_SYSTEM_RAM, and I am afraid I don't >>> enjoy seeing devmem handling+config getting more complicated. >> >> That poses a challenge. Perhaps we should also consider disabling >> functions that rely on EXCLUSIVE_SYSTEM_RAM when strict_devmem=off, >> but implementing such a change seems overly complex. >> >> Our primary goal is to temporarily bypass STRICT_DEVMEM for live >> kernel debugging. In an earlier version, I proposed making the >> fucntion devmem_is_allowed() error-injectable, but Ingo pointed out >> that it violates the principles of STRICT_DEVMEM. > > I think that "primary goal" is the problem here. We don't want to do > that, at all, for all the reasons why we implemented STRICT_DEVMEM and > for why people enable it. +1 > > Either you enable it because you want the protection and "security" it > provides, or you do not. Don't try to work around it please. > >> Do you have any suggestions on enabling write access to /dev/mem in >> debugging tools like the crash utility, while maintaining >> compatibility with the existing rules? > > I think you just don't provide write access to /dev/mem for debugging > tools as it's a huge security hole that people realized and have plugged > up. If you want to provide access to this for "debugging" then just > don't enable that option and live with the risk involved, I don't see > how you can have it both ways. Exactly. And I think a reasonable approach would be to have a debug kernel around into which you can boot, and make sure the debug kernel has such security features turned off. If you rely on distros, maybe you could convince the distro to ship the debug kernel with STRICT_DEVMEM off. I just checked RHEL9, and it only seems to be off in debug kernels on arm64 and s390x (IIUC). Maybe there is a reason we don't even want that off on debug kernels on x86_64, or nobody requested it so far, because using the crash utility with write access on a live system ... is a rather weird ... debugging mechanism in 2024 IMHO. -- Cheers, David / dhildenb
On Thu, Nov 21, 2024 at 11:23 PM David Hildenbrand <david@redhat.com> wrote: > > On 21.11.24 16:14, Greg KH wrote: > > On Thu, Nov 21, 2024 at 10:31:12PM +0800, Yafang Shao wrote: > >> On Thu, Nov 21, 2024 at 4:51 PM David Hildenbrand <david@redhat.com> wrote: > >>> > >>> On 20.11.24 13:28, Yafang Shao wrote: > >>>> When CONFIG_STRICT_DEVMEM is enabled, writing to /dev/mem to override > >>>> kernel data for debugging purposes is prohibited. This configuration is > >>>> always enabled on our production servers. However, there are times when we > >>>> need to use the crash utility to modify kernel data to analyze complex > >>>> issues. > >>>> > >>>> As suggested by Ingo, we can add a boot time knob of soft-enabling it. > >>>> Therefore, a new parameter "strict_devmem=" is added. The reuslt are as > >>>> follows, > >>>> > >>>> - Before this change > >>>> crash> wr panic_on_oops 0 > >>>> wr: cannot write to /proc/kcore <<<< failed > >>>> > >>>> - After this change > >>>> - default > >>>> crash> wr panic_on_oops 0 > >>>> wr: cannot write to /proc/kcore <<<< failed > >>>> > >>>> - strict_devmem=off > >>>> crash> p panic_on_oops > >>>> panic_on_oops = $1 = 1 > >>>> crash> wr panic_on_oops 0 > >>>> crash> p panic_on_oops > >>>> panic_on_oops = $2 = 0 <<<< succeeded > >>>> > >>>> - strict_devmem=invalid > >>>> [ 0.230052] Invalid option string for strict_devmem: 'invalid' > >>>> crash> wr panic_on_oops 0 > >>>> wr: cannot write to /proc/kcore <<<< failed > >>>> > >>>> Suggested-by: Ingo Molnar <mingo@kernel.org> > >>>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > >>>> --- > >>>> .../admin-guide/kernel-parameters.txt | 16 ++++++++++++++ > >>>> drivers/char/mem.c | 21 +++++++++++++++++++ > >>>> 2 files changed, 37 insertions(+) > >>>> > >>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > >>>> index 1518343bbe22..7fe0f66d0dfb 100644 > >>>> --- a/Documentation/admin-guide/kernel-parameters.txt > >>>> +++ b/Documentation/admin-guide/kernel-parameters.txt > >>>> @@ -6563,6 +6563,22 @@ > >>>> them frequently to increase the rate of SLB faults > >>>> on kernel addresses. > >>>> > >>>> + strict_devmem= > >>>> + [KNL] Under CONFIG_STRICT_DEVMEM, whether strict devmem > >>>> + is enabled for this boot. Strict devmem checking is used > >>>> + to protect the userspace (root) access to all of memory, > >>>> + including kernel and userspace memory. Accidental access > >>>> + to this is obviously disastrous, but specific access can > >>>> + be used by people debugging the kernel. Note that with > >>>> + PAT support enabled, even in this case there are > >>>> + restrictions on /dev/mem use due to the cache aliasing > >>>> + requirements. > >>>> + on If IO_STRICT_DEVMEM=n, the /dev/mem file only allows > >>>> + userspace access to PCI space and the BIOS code and data > >>>> + regions. This is sufficient for dosemu and X and all > >>>> + common users of /dev/mem. (default) > >>>> + off Disable strict devmem checks. > >>>> + > >>>> sunrpc.min_resvport= > >>>> sunrpc.max_resvport= > >>>> [NFS,SUNRPC] > >>> > >>> This will allow to violate EXCLUSIVE_SYSTEM_RAM, and I am afraid I don't > >>> enjoy seeing devmem handling+config getting more complicated. > >> > >> That poses a challenge. Perhaps we should also consider disabling > >> functions that rely on EXCLUSIVE_SYSTEM_RAM when strict_devmem=off, > >> but implementing such a change seems overly complex. > >> > >> Our primary goal is to temporarily bypass STRICT_DEVMEM for live > >> kernel debugging. In an earlier version, I proposed making the > >> fucntion devmem_is_allowed() error-injectable, but Ingo pointed out > >> that it violates the principles of STRICT_DEVMEM. > > > > I think that "primary goal" is the problem here. We don't want to do > > that, at all, for all the reasons why we implemented STRICT_DEVMEM and > > for why people enable it. > > +1 > > > > > Either you enable it because you want the protection and "security" it > > provides, or you do not. Don't try to work around it please. > > > >> Do you have any suggestions on enabling write access to /dev/mem in > >> debugging tools like the crash utility, while maintaining > >> compatibility with the existing rules? > > > > I think you just don't provide write access to /dev/mem for debugging > > tools as it's a huge security hole that people realized and have plugged > > up. If you want to provide access to this for "debugging" then just > > don't enable that option and live with the risk involved, I don't see > > how you can have it both ways. > > Exactly. And I think a reasonable approach would be to have a debug > kernel around into which you can boot, and make sure the debug kernel > has such security features turned off. > > If you rely on distros, maybe you could convince the distro to ship the > debug kernel with STRICT_DEVMEM off. I just checked RHEL9, and it only > seems to be off in debug kernels on arm64 and s390x (IIUC). Maybe there > is a reason we don't even want that off on debug kernels on x86_64, or > nobody requested it so far, because using the crash utility with write > access on a live system ... is a rather weird ... debugging mechanism in > 2024 IMHO. It seems I might be a bit outdated. Could you share how you typically modify a live system these days? Are you using live patching, writing kernel modules, or perhaps some clever tools or techniques I'm not familiar with? -- Regards Yafang
On 22.11.24 03:14, Yafang Shao wrote: > On Thu, Nov 21, 2024 at 11:23 PM David Hildenbrand <david@redhat.com> wrote: >> >> On 21.11.24 16:14, Greg KH wrote: >>> On Thu, Nov 21, 2024 at 10:31:12PM +0800, Yafang Shao wrote: >>>> On Thu, Nov 21, 2024 at 4:51 PM David Hildenbrand <david@redhat.com> wrote: >>>>> >>>>> On 20.11.24 13:28, Yafang Shao wrote: >>>>>> When CONFIG_STRICT_DEVMEM is enabled, writing to /dev/mem to override >>>>>> kernel data for debugging purposes is prohibited. This configuration is >>>>>> always enabled on our production servers. However, there are times when we >>>>>> need to use the crash utility to modify kernel data to analyze complex >>>>>> issues. >>>>>> >>>>>> As suggested by Ingo, we can add a boot time knob of soft-enabling it. >>>>>> Therefore, a new parameter "strict_devmem=" is added. The reuslt are as >>>>>> follows, >>>>>> >>>>>> - Before this change >>>>>> crash> wr panic_on_oops 0 >>>>>> wr: cannot write to /proc/kcore <<<< failed >>>>>> >>>>>> - After this change >>>>>> - default >>>>>> crash> wr panic_on_oops 0 >>>>>> wr: cannot write to /proc/kcore <<<< failed >>>>>> >>>>>> - strict_devmem=off >>>>>> crash> p panic_on_oops >>>>>> panic_on_oops = $1 = 1 >>>>>> crash> wr panic_on_oops 0 >>>>>> crash> p panic_on_oops >>>>>> panic_on_oops = $2 = 0 <<<< succeeded >>>>>> >>>>>> - strict_devmem=invalid >>>>>> [ 0.230052] Invalid option string for strict_devmem: 'invalid' >>>>>> crash> wr panic_on_oops 0 >>>>>> wr: cannot write to /proc/kcore <<<< failed >>>>>> >>>>>> Suggested-by: Ingo Molnar <mingo@kernel.org> >>>>>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> >>>>>> --- >>>>>> .../admin-guide/kernel-parameters.txt | 16 ++++++++++++++ >>>>>> drivers/char/mem.c | 21 +++++++++++++++++++ >>>>>> 2 files changed, 37 insertions(+) >>>>>> >>>>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt >>>>>> index 1518343bbe22..7fe0f66d0dfb 100644 >>>>>> --- a/Documentation/admin-guide/kernel-parameters.txt >>>>>> +++ b/Documentation/admin-guide/kernel-parameters.txt >>>>>> @@ -6563,6 +6563,22 @@ >>>>>> them frequently to increase the rate of SLB faults >>>>>> on kernel addresses. >>>>>> >>>>>> + strict_devmem= >>>>>> + [KNL] Under CONFIG_STRICT_DEVMEM, whether strict devmem >>>>>> + is enabled for this boot. Strict devmem checking is used >>>>>> + to protect the userspace (root) access to all of memory, >>>>>> + including kernel and userspace memory. Accidental access >>>>>> + to this is obviously disastrous, but specific access can >>>>>> + be used by people debugging the kernel. Note that with >>>>>> + PAT support enabled, even in this case there are >>>>>> + restrictions on /dev/mem use due to the cache aliasing >>>>>> + requirements. >>>>>> + on If IO_STRICT_DEVMEM=n, the /dev/mem file only allows >>>>>> + userspace access to PCI space and the BIOS code and data >>>>>> + regions. This is sufficient for dosemu and X and all >>>>>> + common users of /dev/mem. (default) >>>>>> + off Disable strict devmem checks. >>>>>> + >>>>>> sunrpc.min_resvport= >>>>>> sunrpc.max_resvport= >>>>>> [NFS,SUNRPC] >>>>> >>>>> This will allow to violate EXCLUSIVE_SYSTEM_RAM, and I am afraid I don't >>>>> enjoy seeing devmem handling+config getting more complicated. >>>> >>>> That poses a challenge. Perhaps we should also consider disabling >>>> functions that rely on EXCLUSIVE_SYSTEM_RAM when strict_devmem=off, >>>> but implementing such a change seems overly complex. >>>> >>>> Our primary goal is to temporarily bypass STRICT_DEVMEM for live >>>> kernel debugging. In an earlier version, I proposed making the >>>> fucntion devmem_is_allowed() error-injectable, but Ingo pointed out >>>> that it violates the principles of STRICT_DEVMEM. >>> >>> I think that "primary goal" is the problem here. We don't want to do >>> that, at all, for all the reasons why we implemented STRICT_DEVMEM and >>> for why people enable it. >> >> +1 >> >>> >>> Either you enable it because you want the protection and "security" it >>> provides, or you do not. Don't try to work around it please. >>> >>>> Do you have any suggestions on enabling write access to /dev/mem in >>>> debugging tools like the crash utility, while maintaining >>>> compatibility with the existing rules? >>> >>> I think you just don't provide write access to /dev/mem for debugging >>> tools as it's a huge security hole that people realized and have plugged >>> up. If you want to provide access to this for "debugging" then just >>> don't enable that option and live with the risk involved, I don't see >>> how you can have it both ways. >> >> Exactly. And I think a reasonable approach would be to have a debug >> kernel around into which you can boot, and make sure the debug kernel >> has such security features turned off. >> >> If you rely on distros, maybe you could convince the distro to ship the >> debug kernel with STRICT_DEVMEM off. I just checked RHEL9, and it only >> seems to be off in debug kernels on arm64 and s390x (IIUC). Maybe there >> is a reason we don't even want that off on debug kernels on x86_64, or >> nobody requested it so far, because using the crash utility with write >> access on a live system ... is a rather weird ... debugging mechanism in >> 2024 IMHO. > > It seems I might be a bit outdated. > Could you share how you typically modify a live system these days? Are > you using live patching, writing kernel modules, or perhaps some > clever tools or techniques I'm not familiar with? I think modifying live systems is something people usually don't do anymore. The common debugging workflow is to use kdump and analyze it offline. I mean, people like me working for distributions analyze *a lot* of issues, and never really rely on /dev/mem or crash on a production system. Well, and apparently not even in debug kernels where some of them have STRICT_DEVMEM enabled. If you find yourself having to modify a live production system, you are probably something wrong. If you really want to modify your live system, there is kdb/kgdb. Alternatively, use a debug kernel where you disable security/safety mechanisms. -- Cheers, David / dhildenb
On Fri, Nov 22, 2024 at 6:58 PM David Hildenbrand <david@redhat.com> wrote: > > On 22.11.24 03:14, Yafang Shao wrote: > > On Thu, Nov 21, 2024 at 11:23 PM David Hildenbrand <david@redhat.com> wrote: > >> > >> On 21.11.24 16:14, Greg KH wrote: > >>> On Thu, Nov 21, 2024 at 10:31:12PM +0800, Yafang Shao wrote: > >>>> On Thu, Nov 21, 2024 at 4:51 PM David Hildenbrand <david@redhat.com> wrote: > >>>>> > >>>>> On 20.11.24 13:28, Yafang Shao wrote: > >>>>>> When CONFIG_STRICT_DEVMEM is enabled, writing to /dev/mem to override > >>>>>> kernel data for debugging purposes is prohibited. This configuration is > >>>>>> always enabled on our production servers. However, there are times when we > >>>>>> need to use the crash utility to modify kernel data to analyze complex > >>>>>> issues. > >>>>>> > >>>>>> As suggested by Ingo, we can add a boot time knob of soft-enabling it. > >>>>>> Therefore, a new parameter "strict_devmem=" is added. The reuslt are as > >>>>>> follows, > >>>>>> > >>>>>> - Before this change > >>>>>> crash> wr panic_on_oops 0 > >>>>>> wr: cannot write to /proc/kcore <<<< failed > >>>>>> > >>>>>> - After this change > >>>>>> - default > >>>>>> crash> wr panic_on_oops 0 > >>>>>> wr: cannot write to /proc/kcore <<<< failed > >>>>>> > >>>>>> - strict_devmem=off > >>>>>> crash> p panic_on_oops > >>>>>> panic_on_oops = $1 = 1 > >>>>>> crash> wr panic_on_oops 0 > >>>>>> crash> p panic_on_oops > >>>>>> panic_on_oops = $2 = 0 <<<< succeeded > >>>>>> > >>>>>> - strict_devmem=invalid > >>>>>> [ 0.230052] Invalid option string for strict_devmem: 'invalid' > >>>>>> crash> wr panic_on_oops 0 > >>>>>> wr: cannot write to /proc/kcore <<<< failed > >>>>>> > >>>>>> Suggested-by: Ingo Molnar <mingo@kernel.org> > >>>>>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > >>>>>> --- > >>>>>> .../admin-guide/kernel-parameters.txt | 16 ++++++++++++++ > >>>>>> drivers/char/mem.c | 21 +++++++++++++++++++ > >>>>>> 2 files changed, 37 insertions(+) > >>>>>> > >>>>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > >>>>>> index 1518343bbe22..7fe0f66d0dfb 100644 > >>>>>> --- a/Documentation/admin-guide/kernel-parameters.txt > >>>>>> +++ b/Documentation/admin-guide/kernel-parameters.txt > >>>>>> @@ -6563,6 +6563,22 @@ > >>>>>> them frequently to increase the rate of SLB faults > >>>>>> on kernel addresses. > >>>>>> > >>>>>> + strict_devmem= > >>>>>> + [KNL] Under CONFIG_STRICT_DEVMEM, whether strict devmem > >>>>>> + is enabled for this boot. Strict devmem checking is used > >>>>>> + to protect the userspace (root) access to all of memory, > >>>>>> + including kernel and userspace memory. Accidental access > >>>>>> + to this is obviously disastrous, but specific access can > >>>>>> + be used by people debugging the kernel. Note that with > >>>>>> + PAT support enabled, even in this case there are > >>>>>> + restrictions on /dev/mem use due to the cache aliasing > >>>>>> + requirements. > >>>>>> + on If IO_STRICT_DEVMEM=n, the /dev/mem file only allows > >>>>>> + userspace access to PCI space and the BIOS code and data > >>>>>> + regions. This is sufficient for dosemu and X and all > >>>>>> + common users of /dev/mem. (default) > >>>>>> + off Disable strict devmem checks. > >>>>>> + > >>>>>> sunrpc.min_resvport= > >>>>>> sunrpc.max_resvport= > >>>>>> [NFS,SUNRPC] > >>>>> > >>>>> This will allow to violate EXCLUSIVE_SYSTEM_RAM, and I am afraid I don't > >>>>> enjoy seeing devmem handling+config getting more complicated. > >>>> > >>>> That poses a challenge. Perhaps we should also consider disabling > >>>> functions that rely on EXCLUSIVE_SYSTEM_RAM when strict_devmem=off, > >>>> but implementing such a change seems overly complex. > >>>> > >>>> Our primary goal is to temporarily bypass STRICT_DEVMEM for live > >>>> kernel debugging. In an earlier version, I proposed making the > >>>> fucntion devmem_is_allowed() error-injectable, but Ingo pointed out > >>>> that it violates the principles of STRICT_DEVMEM. > >>> > >>> I think that "primary goal" is the problem here. We don't want to do > >>> that, at all, for all the reasons why we implemented STRICT_DEVMEM and > >>> for why people enable it. > >> > >> +1 > >> > >>> > >>> Either you enable it because you want the protection and "security" it > >>> provides, or you do not. Don't try to work around it please. > >>> > >>>> Do you have any suggestions on enabling write access to /dev/mem in > >>>> debugging tools like the crash utility, while maintaining > >>>> compatibility with the existing rules? > >>> > >>> I think you just don't provide write access to /dev/mem for debugging > >>> tools as it's a huge security hole that people realized and have plugged > >>> up. If you want to provide access to this for "debugging" then just > >>> don't enable that option and live with the risk involved, I don't see > >>> how you can have it both ways. > >> > >> Exactly. And I think a reasonable approach would be to have a debug > >> kernel around into which you can boot, and make sure the debug kernel > >> has such security features turned off. > >> > >> If you rely on distros, maybe you could convince the distro to ship the > >> debug kernel with STRICT_DEVMEM off. I just checked RHEL9, and it only > >> seems to be off in debug kernels on arm64 and s390x (IIUC). Maybe there > >> is a reason we don't even want that off on debug kernels on x86_64, or > >> nobody requested it so far, because using the crash utility with write > >> access on a live system ... is a rather weird ... debugging mechanism in > >> 2024 IMHO. > > > > It seems I might be a bit outdated. > > Could you share how you typically modify a live system these days? Are > > you using live patching, writing kernel modules, or perhaps some > > clever tools or techniques I'm not familiar with? > > I think modifying live systems is something people usually don't do > anymore. The common debugging workflow is to use kdump and analyze it > offline. > > I mean, people like me working for distributions analyze *a lot* of > issues, and never really rely on /dev/mem or crash on a production > system. Well, and apparently not even in debug kernels where some of > them have STRICT_DEVMEM enabled. > > If you find yourself having to modify a live production system, you are > probably something wrong. > > If you really want to modify your live system, there is kdb/kgdb. > Alternatively, use a debug kernel where you disable security/safety > mechanisms. On a live system, you can experiment and try different approaches for verification. However, with a dead system, you're left without any options to test or debug. In any case, thank you for your suggestion. -- Regards Yafang
On 22.11.24 12:50, Yafang Shao wrote: > On Fri, Nov 22, 2024 at 6:58 PM David Hildenbrand <david@redhat.com> wrote: >> >> On 22.11.24 03:14, Yafang Shao wrote: >>> On Thu, Nov 21, 2024 at 11:23 PM David Hildenbrand <david@redhat.com> wrote: >>>> >>>> On 21.11.24 16:14, Greg KH wrote: >>>>> On Thu, Nov 21, 2024 at 10:31:12PM +0800, Yafang Shao wrote: >>>>>> On Thu, Nov 21, 2024 at 4:51 PM David Hildenbrand <david@redhat.com> wrote: >>>>>>> >>>>>>> On 20.11.24 13:28, Yafang Shao wrote: >>>>>>>> When CONFIG_STRICT_DEVMEM is enabled, writing to /dev/mem to override >>>>>>>> kernel data for debugging purposes is prohibited. This configuration is >>>>>>>> always enabled on our production servers. However, there are times when we >>>>>>>> need to use the crash utility to modify kernel data to analyze complex >>>>>>>> issues. >>>>>>>> >>>>>>>> As suggested by Ingo, we can add a boot time knob of soft-enabling it. >>>>>>>> Therefore, a new parameter "strict_devmem=" is added. The reuslt are as >>>>>>>> follows, >>>>>>>> >>>>>>>> - Before this change >>>>>>>> crash> wr panic_on_oops 0 >>>>>>>> wr: cannot write to /proc/kcore <<<< failed >>>>>>>> >>>>>>>> - After this change >>>>>>>> - default >>>>>>>> crash> wr panic_on_oops 0 >>>>>>>> wr: cannot write to /proc/kcore <<<< failed >>>>>>>> >>>>>>>> - strict_devmem=off >>>>>>>> crash> p panic_on_oops >>>>>>>> panic_on_oops = $1 = 1 >>>>>>>> crash> wr panic_on_oops 0 >>>>>>>> crash> p panic_on_oops >>>>>>>> panic_on_oops = $2 = 0 <<<< succeeded >>>>>>>> >>>>>>>> - strict_devmem=invalid >>>>>>>> [ 0.230052] Invalid option string for strict_devmem: 'invalid' >>>>>>>> crash> wr panic_on_oops 0 >>>>>>>> wr: cannot write to /proc/kcore <<<< failed >>>>>>>> >>>>>>>> Suggested-by: Ingo Molnar <mingo@kernel.org> >>>>>>>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> >>>>>>>> --- >>>>>>>> .../admin-guide/kernel-parameters.txt | 16 ++++++++++++++ >>>>>>>> drivers/char/mem.c | 21 +++++++++++++++++++ >>>>>>>> 2 files changed, 37 insertions(+) >>>>>>>> >>>>>>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt >>>>>>>> index 1518343bbe22..7fe0f66d0dfb 100644 >>>>>>>> --- a/Documentation/admin-guide/kernel-parameters.txt >>>>>>>> +++ b/Documentation/admin-guide/kernel-parameters.txt >>>>>>>> @@ -6563,6 +6563,22 @@ >>>>>>>> them frequently to increase the rate of SLB faults >>>>>>>> on kernel addresses. >>>>>>>> >>>>>>>> + strict_devmem= >>>>>>>> + [KNL] Under CONFIG_STRICT_DEVMEM, whether strict devmem >>>>>>>> + is enabled for this boot. Strict devmem checking is used >>>>>>>> + to protect the userspace (root) access to all of memory, >>>>>>>> + including kernel and userspace memory. Accidental access >>>>>>>> + to this is obviously disastrous, but specific access can >>>>>>>> + be used by people debugging the kernel. Note that with >>>>>>>> + PAT support enabled, even in this case there are >>>>>>>> + restrictions on /dev/mem use due to the cache aliasing >>>>>>>> + requirements. >>>>>>>> + on If IO_STRICT_DEVMEM=n, the /dev/mem file only allows >>>>>>>> + userspace access to PCI space and the BIOS code and data >>>>>>>> + regions. This is sufficient for dosemu and X and all >>>>>>>> + common users of /dev/mem. (default) >>>>>>>> + off Disable strict devmem checks. >>>>>>>> + >>>>>>>> sunrpc.min_resvport= >>>>>>>> sunrpc.max_resvport= >>>>>>>> [NFS,SUNRPC] >>>>>>> >>>>>>> This will allow to violate EXCLUSIVE_SYSTEM_RAM, and I am afraid I don't >>>>>>> enjoy seeing devmem handling+config getting more complicated. >>>>>> >>>>>> That poses a challenge. Perhaps we should also consider disabling >>>>>> functions that rely on EXCLUSIVE_SYSTEM_RAM when strict_devmem=off, >>>>>> but implementing such a change seems overly complex. >>>>>> >>>>>> Our primary goal is to temporarily bypass STRICT_DEVMEM for live >>>>>> kernel debugging. In an earlier version, I proposed making the >>>>>> fucntion devmem_is_allowed() error-injectable, but Ingo pointed out >>>>>> that it violates the principles of STRICT_DEVMEM. >>>>> >>>>> I think that "primary goal" is the problem here. We don't want to do >>>>> that, at all, for all the reasons why we implemented STRICT_DEVMEM and >>>>> for why people enable it. >>>> >>>> +1 >>>> >>>>> >>>>> Either you enable it because you want the protection and "security" it >>>>> provides, or you do not. Don't try to work around it please. >>>>> >>>>>> Do you have any suggestions on enabling write access to /dev/mem in >>>>>> debugging tools like the crash utility, while maintaining >>>>>> compatibility with the existing rules? >>>>> >>>>> I think you just don't provide write access to /dev/mem for debugging >>>>> tools as it's a huge security hole that people realized and have plugged >>>>> up. If you want to provide access to this for "debugging" then just >>>>> don't enable that option and live with the risk involved, I don't see >>>>> how you can have it both ways. >>>> >>>> Exactly. And I think a reasonable approach would be to have a debug >>>> kernel around into which you can boot, and make sure the debug kernel >>>> has such security features turned off. >>>> >>>> If you rely on distros, maybe you could convince the distro to ship the >>>> debug kernel with STRICT_DEVMEM off. I just checked RHEL9, and it only >>>> seems to be off in debug kernels on arm64 and s390x (IIUC). Maybe there >>>> is a reason we don't even want that off on debug kernels on x86_64, or >>>> nobody requested it so far, because using the crash utility with write >>>> access on a live system ... is a rather weird ... debugging mechanism in >>>> 2024 IMHO. >>> >>> It seems I might be a bit outdated. >>> Could you share how you typically modify a live system these days? Are >>> you using live patching, writing kernel modules, or perhaps some >>> clever tools or techniques I'm not familiar with? >> >> I think modifying live systems is something people usually don't do >> anymore. The common debugging workflow is to use kdump and analyze it >> offline. >> >> I mean, people like me working for distributions analyze *a lot* of >> issues, and never really rely on /dev/mem or crash on a production >> system. Well, and apparently not even in debug kernels where some of >> them have STRICT_DEVMEM enabled. >> >> If you find yourself having to modify a live production system, you are >> probably something wrong. >> >> If you really want to modify your live system, there is kdb/kgdb. >> Alternatively, use a debug kernel where you disable security/safety >> mechanisms. > > On a live system, you can experiment and try different approaches for > verification. However, with a dead system, you're left without any > options to test or debug. Yes, but I am saying that this is barely used. (I, for my part still am most efficient with kdump+straight printks :) ) So even with the option you propose, you'd still have to reboot into a kernel with strict_devmem= set, at which point you can reboot into a proper debug kernel. ... unless I am missing something important. -- Cheers, David / dhildenb
© 2016 - 2025 Red Hat, Inc.