[PATCH] mm: Keep memory type same on DEVMEM Page-Fault

buddy.zhang posted 1 patch 1 year, 1 month ago
mm/memory.c | 3 +++
1 file changed, 3 insertions(+)
[PATCH] mm: Keep memory type same on DEVMEM Page-Fault
Posted by buddy.zhang 1 year, 1 month ago
On X86 architecture, supports memory type on Page-table, such as
PTE is PAT/PCD/PWD, which can setup up Memory Type as WC/WB/WT/UC etc.
Then, Virtual address from userspace or kernel space can map to
same physical page, if each page table has different memory type,
then it's confused to have more memory type for same physical page.

On DEVMEM, the 'remap_pfn_range()' keep memory type same on different
mapping. But if it happen on Page-Fault route, such as code:

 19 static vm_fault_t vm_fault(struct vm_fault *vmf)
 20 {
 21         struct vm_area_struct *vma = vmf->vma;
 22         unsigned long address = vmf->address;
 23         struct page *fault_page;
 24         unsigned long pfn;
 25         int r;
 26
 27         /* Allocate Page as DEVMEM */
 28         fault_page = alloc_page(GFP_KERNEL);
 29         if (!fault_page) {
 30                 printk("ERROR: NO Free Memory from DEVMEM.\n");
 31                 r = -ENOMEM;
 32                 goto err_alloc;
 33         }
 34         pfn = page_to_pfn(fault_page);
 35
 36         /* Clear PAT Attribute */
 37         pgprot_val(vma->vm_page_prot) &= ~(_PAGE_PCD | _PAGE_PWT | _PAGE_PAT);
 38
 39         /* Change Memory Type for Direct-Mapping Area */
 40         arch_io_reserve_memtype_wc(PFN_PHYS(pfn), PAGE_SIZE);
 41         pgprot_val(vma->vm_page_prot) |= cachemode2protval(_PAGE_CACHE_MODE_WT);
 42
 43         /* Establish pte and INC _mapcount for page */
 44         vm_flags_set(vma, VM_MIXEDMAP);
 45         if (vm_insert_page(vma, address, fault_page))
 46                 return -EAGAIN;
 47
 48         /* Add refcount for page */
 49         atomic_inc(&fault_page->_refcount);
 50         /* bind fault page */
 51         vmf->page = fault_page;
 52
 53         return 0;
 54
 55 err_alloc:
 56         return r;
 57 }
 58
 59 static const struct vm_operations_struct BiscuitOS_vm_ops = {
 60         .fault  = vm_fault,
 61 };
 62
 63 static int BiscuitOS_mmap(struct file *filp, struct vm_area_struct *vma)
 64 {
 65         /* setup vm_ops */
 66         vma->vm_ops = &BiscuitOS_vm_ops;
 67
 68         return 0;
 69 }

If invoke arch_io_reserve_memtype_wc() on Line-40, and modify memory type
as WC for Direct-Mapping area, and then setup meory type as WT on Line-41,
then invoke 'vm_insert_page()' to create mapping, so you can see:

    | <----- Usespace -----> | <- Kernel space -> |
----+------+---+-------------+---+---+------------+--
    |      |   |             |   |   |            |
----+------+---+-------------+---+---+------------+--
           WT|                     |WC
             o-------o    o--------o
                   WT|    |WC
                     V    V
-------------------+--------+------------------------
                   | DEVMEM |
-------------------+--------+------------------------
Physical Address Space

For this case, OS should check memory type before mapping on 'vm_insert_page()',
and keep memory type same, so add check on function:

07 int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
08                         struct page *page)
09 {
10         if (addr < vma->vm_start || addr >= vma->vm_end)
11                 return -EFAULT;
12         if (!page_count(page))
13                 return -EINVAL;
14         if (!(vma->vm_flags & VM_MIXEDMAP)) {
15                 BUG_ON(mmap_read_trylock(vma->vm_mm));
16                 BUG_ON(vma->vm_flags & VM_PFNMAP);
17                 vm_flags_set(vma, VM_MIXEDMAP);
18         }
19         if (track_pfn_remap(vma, &vma->vm_page_prot,
20                         page_to_pfn(page), addr, PAGE_SIZE))
21                 return -EINVAL;
22         return insert_page(vma, addr, page, vma->vm_page_prot);
23 }

And line 19 to 21, when mapping different memory type on this route, the
'track_pfn_remap()' will notify error and change request as current, e.g.

x86/PAT: APP:88 map pfn RAM range req write-through for [mem 0x025c1000-0x025c1fff], got write-combining

And then, we can keep memory type same on Page-fault route for DEVMEM, the end:

    | <----- Usespace -----> | <- Kernel space -> |
----+------+---+-------------+---+---+------------+--
    |      |   |             |   |   |            |
----+------+---+-------------+---+---+------------+--
           WT|                     |WC
             o---(X)----o----------o
                        |WC
                        V
-------------------+--------+------------------------
                   | DEVMEM |
-------------------+--------+------------------------

Signed-off-by: buddy.zhang@biscuitos.cn
---
 mm/memory.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index f456f3b5049c..ed3d09f513f1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1989,6 +1989,9 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
 		BUG_ON(vma->vm_flags & VM_PFNMAP);
 		vm_flags_set(vma, VM_MIXEDMAP);
 	}
+	if (track_pfn_remap(vma, &vma->vm_page_prot,
+			page_to_pfn(page), addr, PAGE_SIZE))
+		return -EINVAL;
 	return insert_page(vma, addr, page, vma->vm_page_prot);
 }
 EXPORT_SYMBOL(vm_insert_page);
-- 
2.25.1
Re: [PATCH] mm: Keep memory type same on DEVMEM Page-Fault
Posted by Andrew Morton 1 year ago
On Sun, 19 Mar 2023 11:37:50 +0800 "buddy.zhang" <buddy.zhang@biscuitos.cn> wrote:

> On X86 architecture, supports memory type on Page-table, such as
> PTE is PAT/PCD/PWD, which can setup up Memory Type as WC/WB/WT/UC etc.
> Then, Virtual address from userspace or kernel space can map to
> same physical page, if each page table has different memory type,
> then it's confused to have more memory type for same physical page.

Thanks.  Nobody has worked on this code for a long time.  I'll cc a few
folks who may be able to comment.


> On DEVMEM, the 'remap_pfn_range()' keep memory type same on different
> mapping. But if it happen on Page-Fault route, such as code:
> 
>  19 static vm_fault_t vm_fault(struct vm_fault *vmf)
>  20 {
>  21         struct vm_area_struct *vma = vmf->vma;
>  22         unsigned long address = vmf->address;
>  23         struct page *fault_page;
>  24         unsigned long pfn;
>  25         int r;
>  26
>  27         /* Allocate Page as DEVMEM */
>  28         fault_page = alloc_page(GFP_KERNEL);
>  29         if (!fault_page) {
>  30                 printk("ERROR: NO Free Memory from DEVMEM.\n");
>  31                 r = -ENOMEM;
>  32                 goto err_alloc;
>  33         }
>  34         pfn = page_to_pfn(fault_page);
>  35
>  36         /* Clear PAT Attribute */
>  37         pgprot_val(vma->vm_page_prot) &= ~(_PAGE_PCD | _PAGE_PWT | _PAGE_PAT);
>  38
>  39         /* Change Memory Type for Direct-Mapping Area */
>  40         arch_io_reserve_memtype_wc(PFN_PHYS(pfn), PAGE_SIZE);
>  41         pgprot_val(vma->vm_page_prot) |= cachemode2protval(_PAGE_CACHE_MODE_WT);
>  42
>  43         /* Establish pte and INC _mapcount for page */
>  44         vm_flags_set(vma, VM_MIXEDMAP);
>  45         if (vm_insert_page(vma, address, fault_page))
>  46                 return -EAGAIN;
>  47
>  48         /* Add refcount for page */
>  49         atomic_inc(&fault_page->_refcount);
>  50         /* bind fault page */
>  51         vmf->page = fault_page;
>  52
>  53         return 0;
>  54
>  55 err_alloc:
>  56         return r;
>  57 }
>  58
>  59 static const struct vm_operations_struct BiscuitOS_vm_ops = {
>  60         .fault  = vm_fault,
>  61 };
>  62
>  63 static int BiscuitOS_mmap(struct file *filp, struct vm_area_struct *vma)
>  64 {
>  65         /* setup vm_ops */
>  66         vma->vm_ops = &BiscuitOS_vm_ops;
>  67
>  68         return 0;
>  69 }
> 
> If invoke arch_io_reserve_memtype_wc() on Line-40, and modify memory type
> as WC for Direct-Mapping area, and then setup meory type as WT on Line-41,
> then invoke 'vm_insert_page()' to create mapping, so you can see:
> 
>     | <----- Usespace -----> | <- Kernel space -> |
> ----+------+---+-------------+---+---+------------+--
>     |      |   |             |   |   |            |
> ----+------+---+-------------+---+---+------------+--
>            WT|                     |WC
>              o-------o    o--------o
>                    WT|    |WC
>                      V    V
> -------------------+--------+------------------------
>                    | DEVMEM |
> -------------------+--------+------------------------
> Physical Address Space
> 
> For this case, OS should check memory type before mapping on 'vm_insert_page()',
> and keep memory type same, so add check on function:
> 
> 07 int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
> 08                         struct page *page)
> 09 {
> 10         if (addr < vma->vm_start || addr >= vma->vm_end)
> 11                 return -EFAULT;
> 12         if (!page_count(page))
> 13                 return -EINVAL;
> 14         if (!(vma->vm_flags & VM_MIXEDMAP)) {
> 15                 BUG_ON(mmap_read_trylock(vma->vm_mm));
> 16                 BUG_ON(vma->vm_flags & VM_PFNMAP);
> 17                 vm_flags_set(vma, VM_MIXEDMAP);
> 18         }
> 19         if (track_pfn_remap(vma, &vma->vm_page_prot,
> 20                         page_to_pfn(page), addr, PAGE_SIZE))
> 21                 return -EINVAL;
> 22         return insert_page(vma, addr, page, vma->vm_page_prot);
> 23 }
> 
> And line 19 to 21, when mapping different memory type on this route, the
> 'track_pfn_remap()' will notify error and change request as current, e.g.
> 
> x86/PAT: APP:88 map pfn RAM range req write-through for [mem 0x025c1000-0x025c1fff], got write-combining
> 
> And then, we can keep memory type same on Page-fault route for DEVMEM, the end:
> 
>     | <----- Usespace -----> | <- Kernel space -> |
> ----+------+---+-------------+---+---+------------+--
>     |      |   |             |   |   |            |
> ----+------+---+-------------+---+---+------------+--
>            WT|                     |WC
>              o---(X)----o----------o
>                         |WC
>                         V
> -------------------+--------+------------------------
>                    | DEVMEM |
> -------------------+--------+------------------------
> 
> Signed-off-by: buddy.zhang@biscuitos.cn
> ---
>  mm/memory.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index f456f3b5049c..ed3d09f513f1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1989,6 +1989,9 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
>  		BUG_ON(vma->vm_flags & VM_PFNMAP);
>  		vm_flags_set(vma, VM_MIXEDMAP);
>  	}
> +	if (track_pfn_remap(vma, &vma->vm_page_prot,
> +			page_to_pfn(page), addr, PAGE_SIZE))
> +		return -EINVAL;
>  	return insert_page(vma, addr, page, vma->vm_page_prot);
>  }
>  EXPORT_SYMBOL(vm_insert_page);
> -- 
> 2.25.1
>
Re: [PATCH] mm: Keep memory type same on DEVMEM Page-Fault
Posted by Andrew Morton 1 year ago
On Wed, 12 Apr 2023 14:22:36 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> On Sun, 19 Mar 2023 11:37:50 +0800 "buddy.zhang" <buddy.zhang@biscuitos.cn> wrote:
> 
> > On X86 architecture, supports memory type on Page-table, such as
> > PTE is PAT/PCD/PWD, which can setup up Memory Type as WC/WB/WT/UC etc.
> > Then, Virtual address from userspace or kernel space can map to
> > same physical page, if each page table has different memory type,
> > then it's confused to have more memory type for same physical page.
> 
> Thanks.  Nobody has worked on this code for a long time.  I'll cc a few
> folks who may be able to comment.
> 

Well that didn't go very well.

Buddy, can you please explain what are the user-visible effects of the
bug?  Does the kernel crash?  Memory corruption, etc?  Thanks.
Re: [PATCH] mm: Keep memory type same on DEVMEM Page-Fault
Posted by Andrew Morton 10 months ago
On Sun, 16 Apr 2023 11:39:44 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> On Wed, 12 Apr 2023 14:22:36 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > On Sun, 19 Mar 2023 11:37:50 +0800 "buddy.zhang" <buddy.zhang@biscuitos.cn> wrote:
> > 
> > > On X86 architecture, supports memory type on Page-table, such as
> > > PTE is PAT/PCD/PWD, which can setup up Memory Type as WC/WB/WT/UC etc.
> > > Then, Virtual address from userspace or kernel space can map to
> > > same physical page, if each page table has different memory type,
> > > then it's confused to have more memory type for same physical page.
> > 
> > Thanks.  Nobody has worked on this code for a long time.  I'll cc a few
> > folks who may be able to comment.
> > 
> 
> Well that didn't go very well.
> 
> Buddy, can you please explain what are the user-visible effects of the
> bug?  Does the kernel crash?  Memory corruption, etc?  Thanks.
> 

Anyone?
Re: [PATCH] mm: Keep memory type same on DEVMEM Page-Fault
Posted by David Hildenbrand 9 months, 3 weeks ago
On 19.06.23 22:14, Andrew Morton wrote:
> On Sun, 16 Apr 2023 11:39:44 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> 
>> On Wed, 12 Apr 2023 14:22:36 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
>>
>>> On Sun, 19 Mar 2023 11:37:50 +0800 "buddy.zhang" <buddy.zhang@biscuitos.cn> wrote:
>>>
>>>> On X86 architecture, supports memory type on Page-table, such as
>>>> PTE is PAT/PCD/PWD, which can setup up Memory Type as WC/WB/WT/UC etc.
>>>> Then, Virtual address from userspace or kernel space can map to
>>>> same physical page, if each page table has different memory type,
>>>> then it's confused to have more memory type for same physical page.
>>>
>>> Thanks.  Nobody has worked on this code for a long time.  I'll cc a few
>>> folks who may be able to comment.
>>>
>>
>> Well that didn't go very well.
>>
>> Buddy, can you please explain what are the user-visible effects of the
>> bug?  Does the kernel crash?  Memory corruption, etc?  Thanks.
>>
> 
> Anyone?
> 

With a clear problem description, ad requested by you, I could be 
motivated to review this and understand the details :)

-- 
Cheers,

David / dhildenb