[PATCH 0/3] btrfs: Replace kmap() with kmap_local_page()

Fabio M. De Francesco posted 3 patches 3 years, 11 months ago
fs/btrfs/inode.c |  6 +++---
fs/btrfs/lzo.c   | 28 ++++++++++++----------------
fs/btrfs/zlib.c  | 40 ++++++++++++++++++++--------------------
3 files changed, 35 insertions(+), 39 deletions(-)
[PATCH 0/3] btrfs: Replace kmap() with kmap_local_page()
Posted by Fabio M. De Francesco 3 years, 11 months ago
This is the first series of patches aimed towards the conversion of Btrfs
filesystem from the use of kmap() to kmap_local_page().

The use of kmap() is being deprecated in favor of kmap_local_page() where
it is feasible. With kmap_local_page() the mapping is per thread, CPU
local and not globally visible.

Therefore, use kmap_local_page() / kunmap_local() in Btrfs wherever the
mappings are per thread and not globally visible.

Tested on QEMU + KVM 32 bits VM with 4GB of RAM and HIGHMEM64G enabled.

tweed32:~ # uname -a
Linux tweed32 5.18.0-torvalds-debug-x86_32+ #2 SMP PREEMPT_DYNAMIC Tue \
May 31 15:20:07 CEST 2022 i686 athlon i386 GNU/Linux

tweed32:~ # btrfs check -p ~zoek/dev/btrfs.file
Opening filesystem to check...
Checking filesystem on /home/zoek/dev/btrfs.file
UUID: 897d65c5-1167-45b4-b811-2bfe74a320ca
[1/7] checking root items                      (0:00:00 elapsed, 1774 items checked)
[2/7] checking extents                         (0:00:00 elapsed, 135 items checked)
[3/7] checking free space tree                 (0:00:00 elapsed, 4 items checked)
[4/7] checking fs roots                        (0:00:00 elapsed, 104 items checked)
[5/7] checking csums (without verifying data)  (0:00:00 elapsed, 205 items checked)
[6/7] checking root refs                       (0:00:00 elapsed, 3 items checked)
[7/7] checking quota groups skipped (not enabled on this FS)
found 47394816 bytes used, no error found
total csum bytes: 44268
total tree bytes: 2064384
total fs tree bytes: 1720320
total extent tree bytes: 180224
btree space waste bytes: 465350
file data blocks allocated: 45330432
 referenced 45330432

Fabio M. De Francesco (3):
  btrfs: Replace kmap() with kmap_local_page() in inode.c
  btrfs: Replace kmap() with kmap_local_page() in lzo.c
  btrfs: Replace kmap() with kmap_local_page() in zlib.c

 fs/btrfs/inode.c |  6 +++---
 fs/btrfs/lzo.c   | 28 ++++++++++++----------------
 fs/btrfs/zlib.c  | 40 ++++++++++++++++++++--------------------
 3 files changed, 35 insertions(+), 39 deletions(-)

-- 
2.36.1
Re: [PATCH 0/3] btrfs: Replace kmap() with kmap_local_page()
Posted by Christoph Hellwig 3 years, 11 months ago
Turns out that while this looks good, it actually crashes when
running xfstests.  I think this is due to the fact that kmap sets
the page address, which kmap_local_page does not.

btrfs/150 1s ... [  168.252943] run fstests btrfs/150 at 2022-06-02 15:17:11
[  169.462292] BTRFS info (device vdb): flagging fs with big metadata feature
[  169.463728] BTRFS info (device vdb): disk space caching is enabled
[  169.464953] BTRFS info (device vdb): has skinny extents
[  170.596218] BTRFS: device fsid 37c6bae1-d3e5-47f8-87d5-87cd7240a1b4
devid 1 transid 5 /dev)
[  170.599471] BTRFS: device fsid 37c6bae1-d3e5-47f8-87d5-87cd7240a1b4 devid 2 transid 5 /dev)
[  170.657170] BTRFS info (device vdc): flagging fs with big metadata feature
[  170.659509] BTRFS info (device vdc): use zlib compression, level 3
[  170.661190] BTRFS info (device vdc): disk space caching is enabled
[  170.670706] BTRFS info (device vdc): has skinny extents
[  170.714181] BTRFS info (device vdc): checking UUID tree
[  170.735058] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  170.736478] #PF: supervisor read access in kernel mode
[  170.737457] #PF: error_code(0x0000) - not-present page
[  170.738529] PGD 0 P4D 0 
[  170.739211] Oops: 0000 [#1] PREEMPT SMP PTI
[  170.740101] CPU: 0 PID: 43 Comm: kworker/u4:3 Not tainted 5.18.0-rc7+ #1539
[  170.741478] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[  170.743246] Workqueue: btrfs-delalloc btrfs_work_helper
[  170.744282] RIP: 0010:zlib_compress_pages+0x128/0x670
[  170.745346] Code: 00 00 00 16 00 00 48 01 e8 31 ed 48 c1 f8 06 48 c1 e0 0c 48 01 f8 49 89 0
[  170.749042] RSP: 0018:ffffc9000038bc70 EFLAGS: 00010286
[  170.750037] RAX: 0000000000000001 RBX: ffffc9000038bdb8 RCX: 0000000000000001
[  170.751351] RDX: 0000000000002000 RSI: ffffffff82f532fb RDI: ffff888000000000
[  170.752695] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
[  170.754106] R10: 0000000000000000 R11: ffff8881039a5b30 R12: ffff888107befb48
[  170.755449] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[  170.756922] FS:  0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
[  170.758642] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  170.759714] CR2: 0000000000000008 CR3: 000000010ab60000 CR4: 00000000000006f0
[  170.761082] Call Trace:
[  170.761553]  <TASK>
[  170.761968]  ? _raw_spin_unlock+0x24/0x40
[  170.762776]  btrfs_compress_pages+0xda/0x120
[  170.763682]  compress_file_range+0x3b9/0x840
[  170.764570]  async_cow_start+0xd/0x30
[  170.765308]  ? submit_compressed_extents+0x3c0/0x3c0
[  170.766241]  btrfs_work_helper+0xf5/0x3f0
[  170.767009]  ? lock_is_held_type+0xe2/0x140
[  170.767817]  process_one_work+0x239/0x550
[  170.768633]  ? process_one_work+0x550/0x550
[  170.769447]  worker_thread+0x4d/0x3a0
[  170.770210]  ? process_one_work+0x550/0x550
[  170.771019]  kthread+0xe2/0x110
[  170.771623]  ? kthread_complete_and_exit+0x20/0x20
[  170.772697]  ret_from_fork+0x22/0x30
[  170.773454]  </TASK>
Re: [PATCH 0/3] btrfs: Replace kmap() with kmap_local_page()
Posted by David Sterba 3 years, 11 months ago
On Thu, Jun 02, 2022 at 08:20:08AM -0700, Christoph Hellwig wrote:
> Turns out that while this looks good, it actually crashes when
> running xfstests.  I think this is due to the fact that kmap sets
> the page address, which kmap_local_page does not.
> 
> btrfs/150 1s ... [  168.252943] run fstests btrfs/150 at 2022-06-02 15:17:11
> [  169.462292] BTRFS info (device vdb): flagging fs with big metadata feature
> [  169.463728] BTRFS info (device vdb): disk space caching is enabled
> [  169.464953] BTRFS info (device vdb): has skinny extents
> [  170.596218] BTRFS: device fsid 37c6bae1-d3e5-47f8-87d5-87cd7240a1b4
> devid 1 transid 5 /dev)
> [  170.599471] BTRFS: device fsid 37c6bae1-d3e5-47f8-87d5-87cd7240a1b4 devid 2 transid 5 /dev)
> [  170.657170] BTRFS info (device vdc): flagging fs with big metadata feature
> [  170.659509] BTRFS info (device vdc): use zlib compression, level 3
> [  170.661190] BTRFS info (device vdc): disk space caching is enabled
> [  170.670706] BTRFS info (device vdc): has skinny extents
> [  170.714181] BTRFS info (device vdc): checking UUID tree
> [  170.735058] BUG: kernel NULL pointer dereference, address: 0000000000000008
> [  170.736478] #PF: supervisor read access in kernel mode
> [  170.737457] #PF: error_code(0x0000) - not-present page
> [  170.738529] PGD 0 P4D 0 
> [  170.739211] Oops: 0000 [#1] PREEMPT SMP PTI
> [  170.740101] CPU: 0 PID: 43 Comm: kworker/u4:3 Not tainted 5.18.0-rc7+ #1539
> [  170.741478] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
> [  170.743246] Workqueue: btrfs-delalloc btrfs_work_helper
> [  170.744282] RIP: 0010:zlib_compress_pages+0x128/0x670

I've just hit the crash too, so I've removed the patches from misc-next
until there's fixed version.
Re: [PATCH 0/3] btrfs: Replace kmap() with kmap_local_page()
Posted by Fabio M. De Francesco 3 years, 10 months ago
On giovedì 2 giugno 2022 18:28:40 CEST David Sterba wrote:
> On Thu, Jun 02, 2022 at 08:20:08AM -0700, Christoph Hellwig wrote:
> > Turns out that while this looks good, it actually crashes when
> > running xfstests.  I think this is due to the fact that kmap sets
> > the page address, which kmap_local_page does not.
> > 
> > btrfs/150 1s ... [  168.252943] run fstests btrfs/150 at 2022-06-02 
15:17:11
> > [  169.462292] BTRFS info (device vdb): flagging fs with big metadata 
feature
> > [  169.463728] BTRFS info (device vdb): disk space caching is enabled
> > [  169.464953] BTRFS info (device vdb): has skinny extents
> > [  170.596218] BTRFS: device fsid 37c6bae1-d3e5-47f8-87d5-87cd7240a1b4
> > devid 1 transid 5 /dev)
> > [  170.599471] BTRFS: device fsid 37c6bae1-d3e5-47f8-87d5-87cd7240a1b4 
devid 2 transid 5 /dev)
> > [  170.657170] BTRFS info (device vdc): flagging fs with big metadata 
feature
> > [  170.659509] BTRFS info (device vdc): use zlib compression, level 3
> > [  170.661190] BTRFS info (device vdc): disk space caching is enabled
> > [  170.670706] BTRFS info (device vdc): has skinny extents
> > [  170.714181] BTRFS info (device vdc): checking UUID tree
> > [  170.735058] BUG: kernel NULL pointer dereference, address: 
0000000000000008
> > [  170.736478] #PF: supervisor read access in kernel mode
> > [  170.737457] #PF: error_code(0x0000) - not-present page
> > [  170.738529] PGD 0 P4D 0 
> > [  170.739211] Oops: 0000 [#1] PREEMPT SMP PTI
> > [  170.740101] CPU: 0 PID: 43 Comm: kworker/u4:3 Not tainted 5.18.0-
rc7+ #1539
> > [  170.741478] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.14.0-2 04/01/2014
> > [  170.743246] Workqueue: btrfs-delalloc btrfs_work_helper
> > [  170.744282] RIP: 0010:zlib_compress_pages+0x128/0x670
> 
> I've just hit the crash too, so I've removed the patches from misc-next
> until there's fixed version.
> 
Finally I've been able to run xfstests on a QEMU + KVM 32 bits VM. Since I 
have very little experience with filesystems it was a bit hard to setup and 
run.

I can confirm that the problems are only from conversions in patch 3/3. 

Since I've been spending long time to setup xfstests and make it run, I 
haven't yet had time to address the issued in patch 3/3 and making the 
further changes I've been asked for.

Can you please start with taking only patches 1/3 and 2/3 and dropping 3/3?
I'd really appreciate it because, if you can, I'll see that part of my work 
has already been helpful somewhat and have no need to carry those patches 
on to the next series :-) 

The next series will contain everything it has been left out, so that it 
will complete the conversions to local mappings in fs/btrfs. However I will  
need some days more to be done with this task.

I'd like to share the output of xfstests on "compress" group + test 150 
(which is not in that group, although it is the test that both you and 
Christoph ran and which seems to be the only one failing).
 
tweed32:/usr/lib/xfstests # ./check -g compress
FSTYP         -- btrfs
PLATFORM      -- Linux/i686 tweed32 5.18.0-torvalds-debug-x86_32+ #7 SMP 
PREEMPT_DYNAMIC Sat Jun 4 23:47:27 CEST 2022
MKFS_OPTIONS  -- /dev/loop1
MOUNT_OPTIONS -- /dev/loop1 /mnt/scratch

btrfs/024 2s ...  6s
btrfs/026 5s ...  8s
btrfs/037 3s ...  5s
btrfs/038 3s ...  6s
btrfs/041 3s ...  5s
btrfs/062        39s
btrfs/063        21s
btrfs/067        34s
btrfs/068        12s
btrfs/070       [not run] btrfs and this test needs 5 or more disks in 
SCRATCH_DEV_POOL
btrfs/071       [not run] btrfs and this test needs 5 or more disks in 
SCRATCH_DEV_POOL
btrfs/072        37s
btrfs/073        22s
btrfs/074        39s
btrfs/076 2s ...  2s
btrfs/103 3s ...  3s
btrfs/106 3s ...  3s
btrfs/109 2s ...  3s
btrfs/113 3s ...  3s
btrfs/138 47s ...  48s
btrfs/149 3s ...  3s
btrfs/183 3s ...  3s
btrfs/205 3s ...  4s
btrfs/234 4s ...  5s
btrfs/246 3s ...  2s
btrfs/251 2s ...  3s
Ran: btrfs/024 btrfs/026 btrfs/037 btrfs/038 btrfs/041 btrfs/062 btrfs/063 
btrfs/067 btrfs/068 btrfs/070 btrfs/071 btrfs/072 btrfs/073 btrfs/074 
btrfs/076 btrfs/103 btrfs/106 btrfs/109 btrfs/113 btrfs/138 btrfs/149 
btrfs/183 btrfs/205 btrfs/234 btrfs/246 btrfs/251
Not run: btrfs/070 btrfs/071
Passed all 26 tests

tweed32:/usr/lib/xfstests # ./check tests/btrfs/150
FSTYP         -- btrfs
PLATFORM      -- Linux/i686 tweed32 5.18.0-torvalds-debug-x86_32+ #7 SMP 
PREEMPT_DYNAMIC Sat Jun 4 23:47:27 CEST 2022
MKFS_OPTIONS  -- /dev/loop1
MOUNT_OPTIONS -- /dev/loop1 /mnt/scratch

btrfs/150 2s ...  2s
Ran: btrfs/150
Passed all 1 tests

tweed32:/usr/lib/xfstests # cat results/btrfs/150.dmesg
[ 2461.321352] run fstests btrfs/150 at 2022-06-05 16:11:18
[ 2461.718527] BTRFS: device fsid 9337d043-0ca4-4890-b294-7e9505ee13b9 
devid 1 transid 6 /dev/loop1 scanned by systemd-udevd (31060)
[ 2461.721721] BTRFS: device fsid 9337d043-0ca4-4890-b294-7e9505ee13b9 
devid 2 transid 6 /dev/loop2 scanned by mkfs.btrfs (31508)
[ 2461.730540] BTRFS info (device loop1): flagging fs with big metadata 
feature
[ 2461.730543] BTRFS info (device loop1): use zlib compression, level 3
[ 2461.730544] BTRFS info (device loop1): using free space tree
[ 2461.730545] BTRFS info (device loop1): has skinny extents
[ 2461.739677] BTRFS info (device loop1): checking UUID tree
[ 2461.916511] BTRFS info (device loop1): flagging fs with big metadata 
feature
[ 2461.916515] BTRFS info (device loop1): using free space tree
[ 2461.916516] BTRFS info (device loop1): has skinny extents

Please let me know if you require further tests on patches 1/3 and 2/3 and 
on the next series.

Thanks,

Fabio

P.S.: The rest of this message has some details about the kernel and the 
CPU and memory of the VM these tests have been run on. I suppose these 
stats are not so much interesting, so feel free to skip them :-)

tweed32:/usr/lib/xfstests # uname -a
Linux tweed32 5.18.0-torvalds-debug-x86_32+ #7 SMP PREEMPT_DYNAMIC Sat Jun 
4 23:47:27 CEST 2022 i686 athlon i386 GNU/Linux

tweed32:/usr/lib/xfstests # cat /proc/cpuinfo 
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 6
model name      : QEMU Virtual CPU version 2.5+
stepping        : 3
microcode       : 0x1000065
cpu MHz         : 3699.978
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fdiv_bug        : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 4
wp              : yes
flags           : fpu de pse tsc msr pae mce cx8 apic sep pge cmov pat mmx 
fxsr sse sse2 cpuid tsc_known_freq pni x2apic hypervisor vmmcall
bugs            : fxsave_leak sysret_ss_attrs spectre_v1 spectre_v2 
spec_store_bypass
bogomips        : 7402.28
clflush size    : 32
cache_alignment : 32
address sizes   : 36 bits physical, 32 bits virtual
power management:

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 6
model name      : QEMU Virtual CPU version 2.5+

OK, no need to go on with the other CPUs. 
The total amount of virtual CPU is 4. 

tweed32:/usr/lib/xfstests # cat /proc/meminfo 
MemTotal:        4011272 kB
MemFree:         2949956 kB
MemAvailable:    3148368 kB
Buffers:           15700 kB
Cached:           298640 kB
SwapCached:            0 kB
Active:           115992 kB
Inactive:         727008 kB
Active(anon):      21300 kB
Inactive(anon):   538392 kB
Active(file):      94692 kB
Inactive(file):   188616 kB
Unevictable:        3088 kB
Mlocked:               0 kB
HighTotal:       3286900 kB <- Highmem looks enabled properly
HighFree:        2436812 kB
LowTotal:         724372 kB
LowFree:          513144 kB
SwapTotal:       2098152 kB
SwapFree:        2094536 kB
Dirty:               188 kB
Writeback:             0 kB
AnonPages:        531784 kB
Mapped:           218340 kB
Shmem:             31032 kB
KReclaimable:      63896 kB
Slab:             181828 kB
SReclaimable:      63896 kB
SUnreclaim:       117932 kB
KernelStack:        4280 kB
PageTables:        15992 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4103788 kB
Committed_AS:    4012104 kB
VmallocTotal:     122880 kB
VmallocUsed:        1468 kB
VmallocChunk:          0 kB
Percpu:             6304 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      907256 kB
DirectMap2M:           0 kB
Re: [PATCH 0/3] btrfs: Replace kmap() with kmap_local_page()
Posted by David Sterba 3 years, 10 months ago
On Sun, Jun 05, 2022 at 05:11:48PM +0200, Fabio M. De Francesco wrote:
> On giovedì 2 giugno 2022 18:28:40 CEST David Sterba wrote:
> > On Thu, Jun 02, 2022 at 08:20:08AM -0700, Christoph Hellwig wrote:
> > I've just hit the crash too, so I've removed the patches from misc-next
> > until there's fixed version.
> > 
> Finally I've been able to run xfstests on a QEMU + KVM 32 bits VM. Since I 
> have very little experience with filesystems it was a bit hard to setup and 
> run.
> 
> I can confirm that the problems are only from conversions in patch 3/3. 

Thanks.

> Since I've been spending long time to setup xfstests and make it run, I 
> haven't yet had time to address the issued in patch 3/3 and making the 
> further changes I've been asked for.
> 
> Can you please start with taking only patches 1/3 and 2/3 and dropping 3/3?
> I'd really appreciate it because, if you can, I'll see that part of my work 
> has already been helpful somewhat and have no need to carry those patches 
> on to the next series :-) 

Yes I can pick 1 and 2. Removing the whole series is needed in case it
crashes tests as it affects everybody.
Re: [PATCH 0/3] btrfs: Replace kmap() with kmap_local_page()
Posted by Fabio M. De Francesco 3 years, 10 months ago
On lunedì 6 giugno 2022 12:32:44 CEST David Sterba wrote:
>
> [snip] 
> 
> Yes I can pick 1 and 2. Removing the whole series is needed in case it
> crashes tests as it affects everybody.
> 
Thanks!

If everything goes smoothly, you will receive the remaining conversions 
within the next days. I still have to make several of them  (both from 
kmap() as well as from kmap_local()) and the ones already done and placed 
in my queue still need to be properly tested.

Again, thanks,

Fabio
Re: [PATCH 0/3] btrfs: Replace kmap() with kmap_local_page()
Posted by Ira Weiny 3 years, 11 months ago
On Thu, Jun 02, 2022 at 08:20:08AM -0700, Christoph Hellwig wrote:
> Turns out that while this looks good, it actually crashes when
> running xfstests.  I think this is due to the fact that kmap sets
> the page address, which kmap_local_page does not.

:-(

I know that Fabio is working on getting xfstests set up and we have been
discussing the use of page address in the fs/btrfs code.

Stay tuned,
Ira

> 
> btrfs/150 1s ... [  168.252943] run fstests btrfs/150 at 2022-06-02 15:17:11
> [  169.462292] BTRFS info (device vdb): flagging fs with big metadata feature
> [  169.463728] BTRFS info (device vdb): disk space caching is enabled
> [  169.464953] BTRFS info (device vdb): has skinny extents
> [  170.596218] BTRFS: device fsid 37c6bae1-d3e5-47f8-87d5-87cd7240a1b4
> devid 1 transid 5 /dev)
> [  170.599471] BTRFS: device fsid 37c6bae1-d3e5-47f8-87d5-87cd7240a1b4 devid 2 transid 5 /dev)
> [  170.657170] BTRFS info (device vdc): flagging fs with big metadata feature
> [  170.659509] BTRFS info (device vdc): use zlib compression, level 3
> [  170.661190] BTRFS info (device vdc): disk space caching is enabled
> [  170.670706] BTRFS info (device vdc): has skinny extents
> [  170.714181] BTRFS info (device vdc): checking UUID tree
> [  170.735058] BUG: kernel NULL pointer dereference, address: 0000000000000008
> [  170.736478] #PF: supervisor read access in kernel mode
> [  170.737457] #PF: error_code(0x0000) - not-present page
> [  170.738529] PGD 0 P4D 0 
> [  170.739211] Oops: 0000 [#1] PREEMPT SMP PTI
> [  170.740101] CPU: 0 PID: 43 Comm: kworker/u4:3 Not tainted 5.18.0-rc7+ #1539
> [  170.741478] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
> [  170.743246] Workqueue: btrfs-delalloc btrfs_work_helper
> [  170.744282] RIP: 0010:zlib_compress_pages+0x128/0x670
> [  170.745346] Code: 00 00 00 16 00 00 48 01 e8 31 ed 48 c1 f8 06 48 c1 e0 0c 48 01 f8 49 89 0
> [  170.749042] RSP: 0018:ffffc9000038bc70 EFLAGS: 00010286
> [  170.750037] RAX: 0000000000000001 RBX: ffffc9000038bdb8 RCX: 0000000000000001
> [  170.751351] RDX: 0000000000002000 RSI: ffffffff82f532fb RDI: ffff888000000000
> [  170.752695] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
> [  170.754106] R10: 0000000000000000 R11: ffff8881039a5b30 R12: ffff888107befb48
> [  170.755449] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
> [  170.756922] FS:  0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
> [  170.758642] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  170.759714] CR2: 0000000000000008 CR3: 000000010ab60000 CR4: 00000000000006f0
> [  170.761082] Call Trace:
> [  170.761553]  <TASK>
> [  170.761968]  ? _raw_spin_unlock+0x24/0x40
> [  170.762776]  btrfs_compress_pages+0xda/0x120
> [  170.763682]  compress_file_range+0x3b9/0x840
> [  170.764570]  async_cow_start+0xd/0x30
> [  170.765308]  ? submit_compressed_extents+0x3c0/0x3c0
> [  170.766241]  btrfs_work_helper+0xf5/0x3f0
> [  170.767009]  ? lock_is_held_type+0xe2/0x140
> [  170.767817]  process_one_work+0x239/0x550
> [  170.768633]  ? process_one_work+0x550/0x550
> [  170.769447]  worker_thread+0x4d/0x3a0
> [  170.770210]  ? process_one_work+0x550/0x550
> [  170.771019]  kthread+0xe2/0x110
> [  170.771623]  ? kthread_complete_and_exit+0x20/0x20
> [  170.772697]  ret_from_fork+0x22/0x30
> [  170.773454]  </TASK>
>
Re: [PATCH 0/3] btrfs: Replace kmap() with kmap_local_page()
Posted by David Sterba 3 years, 11 months ago
On Tue, May 31, 2022 at 04:53:32PM +0200, Fabio M. De Francesco wrote:
> This is the first series of patches aimed towards the conversion of Btrfs
> filesystem from the use of kmap() to kmap_local_page().

We've already had patches converting kmaps and you're changing the last
ones, so this is could be the last series, with two exceptions.

1) You've changed lzo.c and zlib. but the same kmap/kunmap pattern is
   used in zstd.c.

2) kmap_atomic in inode.c, so that's technically not kmap but it's said
   to be deprecated and also can be replaced by kmap_local_page. The
   context in check_compressed_csum is atomic (in end io) so the kmap
   hasn't been used there.

> tweed32:~ # btrfs check -p ~zoek/dev/btrfs.file

That won't verify if the kmap conversion is OK, it's a runtime thing
while 'check' verifies the data on device. Have you run any kind of
stress test with enabled compression before running the check?

> Opening filesystem to check...
> Checking filesystem on /home/zoek/dev/btrfs.file
> UUID: 897d65c5-1167-45b4-b811-2bfe74a320ca
> [1/7] checking root items                      (0:00:00 elapsed, 1774 items checked)
> [2/7] checking extents                         (0:00:00 elapsed, 135 items checked)
> [3/7] checking free space tree                 (0:00:00 elapsed, 4 items checked)
> [4/7] checking fs roots                        (0:00:00 elapsed, 104 items checked)
> [5/7] checking csums (without verifying data)  (0:00:00 elapsed, 205 items checked)
> [6/7] checking root refs                       (0:00:00 elapsed, 3 items checked)
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 47394816 bytes used, no error found
> total csum bytes: 44268
> total tree bytes: 2064384
> total fs tree bytes: 1720320
> total extent tree bytes: 180224
> btree space waste bytes: 465350
> file data blocks allocated: 45330432
>  referenced 45330432
> 
> Fabio M. De Francesco (3):
>   btrfs: Replace kmap() with kmap_local_page() in inode.c
>   btrfs: Replace kmap() with kmap_local_page() in lzo.c
>   btrfs: Replace kmap() with kmap_local_page() in zlib.c

Please send patches converting zstd.c and the remaining kmap_atomic
usage in inode.c, otherwise the 3 patches are now in misc-next, thanks.
Re: [PATCH 0/3] btrfs: Replace kmap() with kmap_local_page()
Posted by Fabio M. De Francesco 3 years, 11 months ago
On mercoledì 1 giugno 2022 15:25:45 CEST David Sterba wrote:
> On Tue, May 31, 2022 at 04:53:32PM +0200, Fabio M. De Francesco wrote:
> > This is the first series of patches aimed towards the conversion of 
Btrfs
> > filesystem from the use of kmap() to kmap_local_page().
> 
> We've already had patches converting kmaps and you're changing the last
> ones, so this is could be the last series, with two exceptions.
> 
> 1) You've changed lzo.c and zlib.

and inode.c.

> but the same kmap/kunmap pattern is
>    used in zstd.c.

I thought that these mappings I had worked on were safe to convert.

Instead I wasn't sure that the others I left untouched in zstd.c could so 
easily and mechanically be converted without prior code inspection and 
proper tests.

I did see those in zstd.c, but I decided to postpone those conversions 
because I'm not yet sure if and how the virtual addresses we get currently 
from kmap() are re-used.

I saw assignments like "workspace->in_buf.src = kmap(in_page);". Is 
"in_buf" re-used across different contexts? (I see that Btrfs uses a dozen 
workqueues). 

I also see that kunmap() is called in the same functions that assign 
virtual addresses to "in_buf" and this makes me think that those addresses 
are not handled across contexts, otherwise you already have bugs. But may 
very well be that somewhere in the calls chain the code flushes workqueues 
before returning to the callers (it would mean that when kunmap() is called 
we can be sure that those workqueues are done with using those addresses). 

Furthermore, what can you say about the tens of page_address() spread 
across the whole fs/btrfs? 

If those page_address() take pages from HIGHMEM which were mapped using 
kmap_local_page(), the filesystem will oops the kernel...

About this issue, please see a bug fix ("[PATCH v2] fs/ext2: Avoid 
page_address on pages returned by ext2_get_page") at 
https://lore.kernel.org/lkml/
20210714185448.8707ac239e9f12b3a7f5b9f9@urjc.es/#r

Do they only use physical memory from ZONE_NORMAL?

Can you please confirm that it is safe to convert those left kmap() to 
kmap_local_page() and that those page_address() are safe?

If so, I have no problems to convert what I had left for later. Otherwise 
I'll need to carefully inspect the code.

> 2) kmap_atomic in inode.c, so that's technically not kmap but it's said
>    to be deprecated and also can be replaced by kmap_local_page. The
>    context in check_compressed_csum is atomic (in end io) so the kmap
>    hasn't been used there.

I was not 100% sure about the preemption requirements for those call sites 
so I had not converted them yet. Are you saying that there is no need for 
preempt_disable() at the following sites?

# git grep kmap_atomic fs/btrfs
fs/btrfs/compression.c:                 kaddr = kmap_atomic(page);
fs/btrfs/inode.c:                       kaddr = kmap_atomic(cpage);
fs/btrfs/inode.c:               kaddr = kmap_atomic(page);
fs/btrfs/inode.c:       kaddr = kmap_atomic(page);

> > tweed32:~ # btrfs check -p ~zoek/dev/btrfs.file
> 
> That won't verify if the kmap conversion is OK, it's a runtime thing
> while 'check' verifies the data on device. Have you run any kind of
> stress test with enabled compression before running the check?

Ah, thanks. I didn't know this thing.

I installed (x)fstests a couple of days ago. I think it helps to test this 
and other conversions to local mappings, but I haven't yet had time to 
learn how to use it.

Does (x)fstests cover the compression code? Are there any specific tests I 
should target?

> Please send patches converting zstd.c and the remaining kmap_atomic
> usage in inode.c, otherwise the 3 patches are now in misc-next, thanks.

New version is required in any case because LKP reported four uninitialized 
variables in patch 3/3. 

I'm just reading the reports that both you and Christoph hit. At first 
sight they seem to be due to page_address() calls (but I may be wrong, just 
had few minutes to reply so late) :( 

I was wrong in thinking that these call sites could be converted safely. 
I'll do the tests before posting v2.

Thanks,

Fabio

P.S.: I've just read a message from Ira Weiny about something he saw in the 
unmapping order in zstd_compress_pages(). We must think how to address 
mapping /unmapping order properly.
Re: [PATCH 0/3] btrfs: Replace kmap() with kmap_local_page()
Posted by Ira Weiny 3 years, 11 months ago
On Wed, Jun 01, 2022 at 03:25:45PM +0200, David Sterba wrote:
> On Tue, May 31, 2022 at 04:53:32PM +0200, Fabio M. De Francesco wrote:
> > This is the first series of patches aimed towards the conversion of Btrfs
> > filesystem from the use of kmap() to kmap_local_page().
> 
> We've already had patches converting kmaps and you're changing the last
> ones, so this is could be the last series, with two exceptions.
> 
> 1) You've changed lzo.c and zlib. but the same kmap/kunmap pattern is
>    used in zstd.c.

I checked out zstd.c and one of the issues there is the way that the input
workspace is mapped page by page while iterating those pages.

This got me thinking about what Willy said at LSFmm concerning something like
kmap_local_range().  Mapping more than 1 page at a time could save some
unmap/remap of output pages required for kmap_local_page() ordering.

Unfortunately, I think the length of the input is probably to long in many
cases.  And some remapping may still be required.

Cc: Willy

As an aside, Willy, I'm thinking that a kmap_local_range() should return a
folio in some way.  Would you agree?

Ira
Re: [PATCH 0/3] btrfs: Replace kmap() with kmap_local_page()
Posted by Matthew Wilcox 3 years, 11 months ago
On Thu, Jun 02, 2022 at 09:22:15AM -0700, Ira Weiny wrote:
> On Wed, Jun 01, 2022 at 03:25:45PM +0200, David Sterba wrote:
> > On Tue, May 31, 2022 at 04:53:32PM +0200, Fabio M. De Francesco wrote:
> > > This is the first series of patches aimed towards the conversion of Btrfs
> > > filesystem from the use of kmap() to kmap_local_page().
> > 
> > We've already had patches converting kmaps and you're changing the last
> > ones, so this is could be the last series, with two exceptions.
> > 
> > 1) You've changed lzo.c and zlib. but the same kmap/kunmap pattern is
> >    used in zstd.c.
> 
> I checked out zstd.c and one of the issues there is the way that the input
> workspace is mapped page by page while iterating those pages.
> 
> This got me thinking about what Willy said at LSFmm concerning something like
> kmap_local_range().  Mapping more than 1 page at a time could save some
> unmap/remap of output pages required for kmap_local_page() ordering.

Umm ... Not entirely sure what I said, but it'd be really hard to kmap
multiple pages with the current PAE implementation.  I've steered away
from doing that for now, and kmap_local_folio() just guarantees the
page that the offset lands in is mapped.

I don't think the right answer is having a kmap_folio() that will map
the entire folio.  I'd be more tempted to add vmap_folio() for that.
My understanding is that PAE systems have more address space available
for vmap than they do for kmap.

> Unfortunately, I think the length of the input is probably to long in many
> cases.  And some remapping may still be required.
> 
> Cc: Willy
> 
> As an aside, Willy, I'm thinking that a kmap_local_range() should return a
> folio in some way.  Would you agree?

I imagine it taking a folio to describe the range that's being accessed.
But maybe it should be a phys_addr_t?  I tend to prefer phys_addr_t over
pfn + offset as it's more compact on 64-bit systems and the same on
32-bit systems.