[v1] DEPT(Dependency Tracker)

[PATCH 00/16] DEPT(Dependency Tracker)

Posted by Byungchul Park 4 years, 4 months ago

Hi Linus and folks,

I've been developing a tool for detecting deadlock possibilities by
tracking wait/event rather than lock(?) acquisition order to try to
cover all synchronization mechanisms. It's based on the v5.17-rc1 tag.

https://github.com/lgebyungchulpark/linux-dept/commits/dept1.12_on_v5.17-rc1

Benefits:

	0. Works with all lock primitives.
	1. Works with wait_for_completion()/complete().
	2. Works with 'wait' on PG_locked.
	3. Works with 'wait' on PG_writeback.
	4. Works with swait/wakeup.
	5. Works with waitqueue.
	6. Multiple reports are allowed.
	7. Deduplication control on multiple reports.
	8. Withstand false positives thanks to 6.
	9. Easy to tag any wait/event.

Future work:

	0. To make it more stable.
	1. To separate Dept from Lockdep.
	2. To improve performance in terms of time and space.
	3. To use Dept as a dependency engine for Lockdep.
	4. To add any missing tags of wait/event in the kernel.
	5. To deduplicate stack trace.

I've got several reports from the tool. Some of them look like false
alarms and some others look like real deadlock possibilities. Because of
my unfamiliarity with the domain, it's hard to confirm if they are real.
Let me add the reports on this email thread.

How to interpret the report:

	1. E(event) in each context cannot be triggered because of the
	   W(wait) that cannot be woken.
	2. The stack trace that helps find the problematic code is
	   located in each context's detail.
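
As a rough illustration of the rule above (not Dept's actual
implementation), each context can be read as one dependency: the event
[E] cannot fire until the wait [W] has been woken. A report is then a
cycle of such dependencies. The sketch below is plain userspace C. The
class names and the helpers graph_init(), add_dep() and has_cycle() are
hypothetical and only mirror a two-context report involving an rwsem
and a bit-wait queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASS 8

/* Hypothetical class ids, named after the objects in a two-context report. */
enum { I_DATA_SEM, BIT_WAIT };

/* needs[e][w]: the event of class e cannot fire until class w is woken. */
struct dep_graph {
	bool needs[MAX_CLASS][MAX_CLASS];
};

static void graph_init(struct dep_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* One context: it waits on class w ([W]) before triggering class e ([E]). */
static void add_dep(struct dep_graph *g, int w, int e)
{
	assert(w >= 0 && w < MAX_CLASS && e >= 0 && e < MAX_CLASS);
	g->needs[e][w] = true;
}

/* Depth-first search: is 'target' reachable from 'from' via needs[][]? */
static bool reaches(const struct dep_graph *g, int from, int target,
		    bool *seen)
{
	for (int next = 0; next < MAX_CLASS; next++) {
		if (!g->needs[from][next])
			continue;
		if (next == target)
			return true;
		if (!seen[next]) {
			seen[next] = true;
			if (reaches(g, next, target, seen))
				return true;
		}
	}
	return false;
}

/* A circular dependency exists if any class transitively needs itself. */
static bool has_cycle(const struct dep_graph *g)
{
	for (int c = 0; c < MAX_CLASS; c++) {
		bool seen[MAX_CLASS] = { false };

		if (reaches(g, c, c, seen))
			return true;
	}
	return false;
}
```

With add_dep(&g, I_DATA_SEM, BIT_WAIT) for a context that does
down_write() before the bit-wait event, and add_dep(&g, BIT_WAIT,
I_DATA_SEM) for a context that waits on the bit before up_read(),
has_cycle() returns true. With only one of the two it returns false.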

Changes from RFC:

	1. Add the wait tag at __schedule() rather than at prepare_to_wait().
	2. Use try version at lockdep_acquire_cpus_lock() annotation.
	3. Distinguish each syscall context from another.

Thanks,
Byungchul

Byungchul Park (16):
  llist: Move llist_{head,node} definition to types.h
  dept: Implement Dept(Dependency Tracker)
  dept: Embed Dept data in Lockdep
  dept: Apply Dept to spinlock
  dept: Apply Dept to mutex families
  dept: Apply Dept to rwlock
  dept: Apply Dept to wait_for_completion()/complete()
  dept: Apply Dept to seqlock
  dept: Apply Dept to rwsem
  dept: Add proc knobs to show stats and dependency graph
  dept: Introduce split map concept and new APIs for them
  dept: Apply Dept to wait/event of PG_{locked,writeback}
  dept: Apply SDT to swait
  dept: Apply SDT to wait(waitqueue)
  locking/lockdep, cpu/hotplus: Use a weaker annotation in AP thread
  dept: Distinguish each syscall context from another

 include/linux/completion.h         |   42 +-
 include/linux/dept.h               |  523 +++++++
 include/linux/dept_page.h          |   78 ++
 include/linux/dept_sdt.h           |   62 +
 include/linux/hardirq.h            |    3 +
 include/linux/irqflags.h           |   33 +-
 include/linux/llist.h              |    8 -
 include/linux/lockdep.h            |  156 ++-
 include/linux/lockdep_types.h      |    3 +
 include/linux/mutex.h              |   31 +
 include/linux/page-flags.h         |   45 +-
 include/linux/pagemap.h            |    7 +-
 include/linux/percpu-rwsem.h       |   10 +-
 include/linux/rtmutex.h            |    7 +
 include/linux/rwlock.h             |   48 +
 include/linux/rwlock_api_smp.h     |    8 +-
 include/linux/rwlock_types.h       |    7 +
 include/linux/rwsem.h              |   31 +
 include/linux/sched.h              |    7 +
 include/linux/seqlock.h            |   59 +-
 include/linux/spinlock.h           |   24 +
 include/linux/spinlock_types_raw.h |   13 +
 include/linux/swait.h              |    4 +
 include/linux/types.h              |    8 +
 include/linux/wait.h               |    6 +-
 init/init_task.c                   |    2 +
 init/main.c                        |    4 +
 kernel/Makefile                    |    1 +
 kernel/cpu.c                       |    2 +-
 kernel/dependency/Makefile         |    5 +
 kernel/dependency/dept.c           | 2702 ++++++++++++++++++++++++++++++++++++
 kernel/dependency/dept_hash.h      |   10 +
 kernel/dependency/dept_internal.h  |   26 +
 kernel/dependency/dept_object.h    |   13 +
 kernel/dependency/dept_proc.c      |   93 ++
 kernel/entry/common.c              |    3 +
 kernel/exit.c                      |    1 +
 kernel/fork.c                      |    2 +
 kernel/locking/lockdep.c           |   12 +-
 kernel/module.c                    |    2 +
 kernel/sched/completion.c          |   12 +-
 kernel/sched/core.c                |    3 +
 kernel/sched/swait.c               |   10 +
 kernel/sched/wait.c                |   16 +
 kernel/softirq.c                   |    6 +-
 kernel/trace/trace_preemptirq.c    |   19 +-
 lib/Kconfig.debug                  |   21 +
 mm/filemap.c                       |   68 +
 mm/page_ext.c                      |    5 +
 49 files changed, 4204 insertions(+), 57 deletions(-)
 create mode 100644 include/linux/dept.h
 create mode 100644 include/linux/dept_page.h
 create mode 100644 include/linux/dept_sdt.h
 create mode 100644 kernel/dependency/Makefile
 create mode 100644 kernel/dependency/dept.c
 create mode 100644 kernel/dependency/dept_hash.h
 create mode 100644 kernel/dependency/dept_internal.h
 create mode 100644 kernel/dependency/dept_object.h
 create mode 100644 kernel/dependency/dept_proc.c

-- 
1.9.1

Re: [PATCH 00/16] DEPT(Dependency Tracker)

Posted by Theodore Ts'o 4 years, 4 months ago

On Thu, Feb 17, 2022 at 07:57:36PM +0900, Byungchul Park wrote:
> 
> I've got several reports from the tool. Some of them look like false
> alarms and some others look like real deadlock possibilities. Because of
> my unfamiliarity with the domain, it's hard to confirm if they are real.
> Let me add the reports on this email thread.

The problem is we have so many potentially invalid, or
so-rare-as-to-be-not-worth-the-time-to-investigate-in-the-
grand-scheme-of-all-of-the-fires-burning-on-maintainers'-laps reports that it's
really not reasonable to ask maintainers to determine whether
something is a false alarm or not.  If I want more of these unreliable
potential bug reports to investigate, there is a huge backlog in
Syzkaller.  :-)

Looking at the second ext4 report, it doesn't make any sense.  Context
A is the kjournald thread.  We don't do a commit until (a) the timeout
expires, or (b) someone explicitly requests that a commit happen
waking up j_wait_commit.  I'm guessing that the complaint here is that
DEPT thinks nothing is explicitly requesting a wake up.  But note that
after 5 seconds (or whatever journal->j_commit_interval is configured
to be) we *will* always start a commit.  So ergo, there can't be a deadlock.
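
The reasoning above can be put in a few lines of userspace C: a wait
that is bounded by a timeout is always woken eventually, so a chain
containing such a wait cannot stay blocked forever, although it may
stall until the earliest timeout fires. This is only a sketch under
stated assumptions. struct wait_desc, its timeout_ms field and both
helpers are hypothetical, and this is not how Dept or jbd2 represent
waits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one wait in a reported dependency chain. */
struct wait_desc {
	const char *name;
	long timeout_ms;	/* 0 means the wait has no timeout */
};

/*
 * The chain can stay blocked forever only if no wait in it is bounded
 * by a timeout: a timed-out waiter resumes and can fire its event.
 */
static bool chain_can_block_forever(const struct wait_desc *chain, size_t n)
{
	assert(chain != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (chain[i].timeout_ms > 0)
			return false;
	}
	return n > 0;
}

/* Earliest timeout in the chain, or -1 if every wait is unbounded. */
static long first_timeout_ms(const struct wait_desc *chain, size_t n)
{
	long best = -1;

	assert(chain != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		long t = chain[i].timeout_ms;

		if (t > 0 && (best < 0 || t < best))
			best = t;
	}
	return best;
}
```

For a three-entry chain in which the j_wait_commit wait carries a
5000 ms timeout and the other two waits carry none,
chain_can_block_forever() returns false and first_timeout_ms() returns
5000.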

At a higher level of discussion, it's an unfair tax on maintainers'
time to ask maintainers to debug DEPT for you.  Tools like
Syzkaller and DEPT are useful insofar as they save us time in making
our subsystems better.  But until you can prove that it's not going to
be a massive denial of service attack on maintainers' time, at the
*very* least keep an RFC on the patch, or add massive warnings that
more often than not DEPT is going to be sending maintainers on a wild
goose chase.

If you know that there "appear to be false positives", you need to
make sure you've tracked them all down before trying to ask that this
be merged.

You may also want to add some documentation about why we should trust
this; in particular for wait channels, when a process calls schedule()
there may be multiple reasons why the thread will wake up --- in the
worst case, such as in the select(2) or epoll(2) system call, there
may be literally thousands of reasons (one for every file descriptor
the select is waiting on) --- why the process will wake up and thus
resolve the potential "deadlock" that DEPT is worrying about.  How is
DEPT going to handle those cases?  If the answer is that things need
to be tagged, then at least disclose potential reasons why DEPT might
be untrustworthy to save your reviewers time.
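
The multiple-wake-up concern can be stated compactly: a sleeping task
is stuck only if none of its wake-up sources can fire, whereas a
tracker that knows about only some of the sources may conclude that
the task is stuck when it is not. The userspace C sketch below is an
illustration only. struct wake_source and its can_fire and tagged
fields are hypothetical names and do not come from Dept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: one possible reason a sleeping task may be woken. */
struct wake_source {
	const char *name;
	bool can_fire;	/* the source is still able to wake the task */
	bool tagged;	/* the tracker knows about this source */
};

/* Ground truth: the wait is stuck only if no source at all can fire. */
static bool wait_is_stuck(const struct wake_source *src, size_t n)
{
	assert(src != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (src[i].can_fire)
			return false;
	}
	return true;
}

/* What a tracker concludes when it only sees the tagged sources. */
static bool tracker_sees_stuck(const struct wake_source *src, size_t n)
{
	assert(src != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (src[i].tagged && src[i].can_fire)
			return false;
	}
	return true;
}

/* False positive: reported as stuck although some source can still fire. */
static bool is_false_positive(const struct wake_source *src, size_t n)
{
	return tracker_sees_stuck(src, n) && !wait_is_stuck(src, n);
}
```

With two sources, where the tagged one cannot fire and the untagged
one can, is_false_positive() returns true. Once both are tagged it
returns false.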

I know that you're trying to help us, but this tool needs to be far
better than Lockdep before we should think about merging it.  Even if
it finds 5% more potential deadlocks, if it creates 95% more false
positive reports --- and the ones it finds are crazy things that
rarely actually happen in practice, are the costs worth the benefits?
And who is bearing the costs, and who is receiving the benefits?

Regards,

					- Ted

Re: [PATCH 00/16] DEPT(Dependency Tracker)

Posted by Steven Rostedt 4 years, 4 months ago

On Thu, 17 Feb 2022 10:51:09 -0500
"Theodore Ts'o" <tytso@mit.edu> wrote:

> I know that you're trying to help us, but this tool needs to be far
> better than Lockdep before we should think about merging it.  Even if
> it finds 5% more potential deadlocks, if it creates 95% more false
> positive reports --- and the ones it finds are crazy things that
> rarely actually happen in practice, are the costs worth the benefits?
> And who is bearing the costs, and who is receiving the benefits?

I personally believe that there's potential that this can be helpful and we
will want to merge it.

But, what I believe Ted is trying to say is, if you do not know if the
report is a bug or not, please do not ask the maintainers to determine it
for you. This is a good opportunity for you to look to see why your tool
reported an issue, and learn that subsystem. Look at whether this is really a
bug or not, and investigate why.

The likely/unlikely tracing I do finds issues all over the kernel. But
before I report anything, I look at the subsystem and determine *why* it's
reporting what it does. In some cases, it's just a config issue, where I
may submit a patch saying "this is 100% wrong in X config, and we should
just remove the 'unlikely'". But I did the due diligence to find out exactly
what the issue is, and why the tooling reported what it reported.

I want to stress that your Dept tooling looks to have the potential of
being something that will be worthwhile to include. But the false positives
need to be down to the rate of lockdep false positives. As Ted said, if
it's reporting 95% false positives, nobody is going to look at the 5% of
real bugs that it finds.

-- Steve

Re: [PATCH 00/16] DEPT(Dependency Tracker)

Posted by Byungchul Park 4 years, 4 months ago

On Thu, Feb 17, 2022 at 12:00:05PM -0500, Steven Rostedt wrote:
> On Thu, 17 Feb 2022 10:51:09 -0500
> "Theodore Ts'o" <tytso@mit.edu> wrote:
> 
> > I know that you're trying to help us, but this tool needs to be far
> > better than Lockdep before we should think about merging it.  Even if
> > it finds 5% more potential deadlocks, if it creates 95% more false
> > positive reports --- and the ones it finds are crazy things that
> > rarely actually happen in practice, are the costs worth the benefits?
> > And who is bearing the costs, and who is receiving the benefits?
> 
> I personally believe that there's potential that this can be helpful and we
> will want to merge it.
> 
> But, what I believe Ted is trying to say is, if you do not know if the
> report is a bug or not, please do not ask the maintainers to determine it
> for you. This is a good opportunity for you to look to see why your tool
> reported an issue, and learn that subsystem. Look at whether this is really a
> bug or not, and investigate why.

Appreciate your feedback. I'll be more careful in reporting things, and
I think I need to make it more conservative...

> The likely/unlikely tracing I do finds issues all over the kernel. But
> before I report anything, I look at the subsystem and determine *why* it's
> reporting what it does. In some cases, it's just a config issue, where I
> may submit a patch saying "this is 100% wrong in X config, and we should
> just remove the 'unlikely'". But I did the due diligence to find out exactly
> what the issue is, and why the tooling reported what it reported.

I'll try my best to do things that way. However, the thing is that there
are few reports on my system... That's why I shared Dept in LKML space.

> I want to stress that your Dept tooling looks to have the potential of
> being something that will be worthwhile to include. But the false positives
> need to be down to the rate of lockdep false positives. As Ted said, if
> it's reporting 95% false positives, nobody is going to look at the 5% of
> real bugs that it finds.

Agreed. Dept should not be merged if so. I'm not pushing ahead, but I'm
convinced that Dept does what a dependency tracker should do. Let's see
how valuable it is, esp. in the middle of developing something in the
kernel.

Thanks,
Byungchul

Re: [PATCH 00/16] DEPT(Dependency Tracker)

Posted by Byungchul Park 4 years, 4 months ago

On Thu, Feb 17, 2022 at 10:51:09AM -0500, Theodore Ts'o wrote:
> On Thu, Feb 17, 2022 at 07:57:36PM +0900, Byungchul Park wrote:
> > 
> > I've got several reports from the tool. Some of them look like false
> > alarms and some others look like real deadlock possibilities. Because of
> > my unfamiliarity with the domain, it's hard to confirm if they are real.
> > Let me add the reports on this email thread.
> 
> The problem is we have so many potentially invalid, or
> so-rare-as-to-be-not-worth-the-time-to-investigate-in-the-
> grand-scheme-of-all-of-the-fires-burning-on-maintainers'-laps reports that it's
> really not reasonable to ask maintainers to determine whether

Even though I might have been wrong and might be wrong again, you look
so arrogant. You were hasty to judge and are trying to walk over me.

I reported it because I thought it was a real problem but couldn't
confirm it. For the other reports that I thought were not real, I
didn't even mention them. If you are talking about the previous report,
then I felt so sorry, as I told you. I skimmed through the part of the
waits...

Basically, I respect you and appreciate your feedback. I hope you don't
get me wrong.

> Looking at the second ext4 report, it doesn't make any sense.  Context
> A is the kjournald thread.  We don't do a commit until (a) the timeout
> expires, or (b) someone explicitly requests that a commit happen
> waking up j_wait_commit.  I'm guessing that the complaint here is that
> DEPT thinks nothing is explicitly requesting a wake up.  But note that
> after 5 seconds (or whatever journal->j_commit_interval is configured
> to be) we *will* always start a commit.  So ergo, there can't be a deadlock.

Yeah, it might not be a *deadlock deadlock* because the wait will be
woken up anyway by one of the wake up points you mentioned. However, the
dependency looks problematic because the three contexts participating in
the dependency chain would be stuck for a while until one eventually
wakes it up. I bet that is not what you intended.

Again, it's not critical but problematic. Or am I missing something?

> At a higher level of discussion, it's an unfair tax on maintainers'
> time to ask maintainers to debug DEPT for you.  Tools like
> Syzkaller and DEPT are useful insofar as they save us time in making
> our subsystems better.  But until you can prove that it's not going to
> be a massive denial of service attack on maintainers' time, at the

Partially I agree. I would understand you even if you don't support Dept
until you think it's valuable enough. However, let me keep asking things
to fs folks, not you, even though I would cc you on it.

> If you know that there "appear to be false positives", you need to
> make sure you've tracked them all down before trying to ask that this
> be merged.

To track them all down, I need to ask LKML because Dept works perfectly
on my system. I don't want it to be merged with a lot of false
positives still in there, either.

> You may also want to add some documentation about why we should trust
> this; in particular for wait channels, when a process calls schedule()
> there may be multiple reasons why the thread will wake up --- in the
> worst case, such as in the select(2) or epoll(2) system call, there
> may be literally thousands of reasons (one for every file descriptor
> the select is waiting on) --- why the process will wake up and thus
> resolve the potential "deadlock" that DEPT is worrying about.  How is
> DEPT going to handle those cases?  If the answer is that things need

Thank you for the information, but I don't get which case you are
concerned about. I'd like a specific scenario so that we can discuss it
more - I guess I could probably answer it, but I won't ask you for one.
Just give me an instance only if you think it's worth it.

You look like a guy who unconditionally blames new things before
understanding them rather than asking and discussing. Again, I also
think nobody has to spend his or her time on what he or she thinks is
not worthy enough.

> I know that you're trying to help us, but this tool needs to be far
> better than Lockdep before we should think about merging it.  Even if
> it finds 5% more potential deadlocks, if it creates 95% more false

It should not get merged for sure if so, but it sounds too sarcastic.
Let's see if it really creates 95% false positives. If it's true and
I can't control it, I will give up. That's what I should do.

There are a lot of factors in judging how valuable Dept is. Dept would be
useful especially in the middle of development, rather than in the final
state in the tree. I'd appreciate it if you considered that side more, too.

Thanks,
Byungchul

Report 1 in ext4 and journal based on v5.17-rc1

Posted by Byungchul Park 4 years, 4 months ago

[    7.009608] ===================================================
[    7.009613] DEPT: Circular dependency has been detected.
[    7.009614] 5.17.0-rc1-00014-g8a599299c0cb-dirty #30 Tainted: G        W
[    7.009616] ---------------------------------------------------
[    7.009617] summary
[    7.009618] ---------------------------------------------------
[    7.009618] *** DEADLOCK ***
[    7.009618]
[    7.009619] context A
[    7.009619]     [S] (unknown)(&(bit_wait_table + i)->dmap:0)
[    7.009621]     [W] down_write(&ei->i_data_sem:0)
[    7.009623]     [E] event(&(bit_wait_table + i)->dmap:0)
[    7.009624]
[    7.009625] context B
[    7.009625]     [S] down_read(&ei->i_data_sem:0)
[    7.009626]     [W] wait(&(bit_wait_table + i)->dmap:0)
[    7.009627]     [E] up_read(&ei->i_data_sem:0)
[    7.009628]
[    7.009629] [S]: start of the event context
[    7.009629] [W]: the wait blocked
[    7.009630] [E]: the event not reachable
[    7.009631] ---------------------------------------------------
[    7.009631] context A's detail
[    7.009632] ---------------------------------------------------
[    7.009632] context A
[    7.009633]     [S] (unknown)(&(bit_wait_table + i)->dmap:0)
[    7.009634]     [W] down_write(&ei->i_data_sem:0)
[    7.009635]     [E] event(&(bit_wait_table + i)->dmap:0)
[    7.009636]
[    7.009636] [S] (unknown)(&(bit_wait_table + i)->dmap:0):
[    7.009638] (N/A)
[    7.009638]
[    7.009639] [W] down_write(&ei->i_data_sem:0):
[    7.009639] ext4_truncate (fs/ext4/inode.c:4187) 
[    7.009645] stacktrace:
[    7.009646] down_write (kernel/locking/rwsem.c:1514) 
[    7.009648] ext4_truncate (fs/ext4/inode.c:4187) 
[    7.009650] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963) 
[    7.009652] generic_perform_write (mm/filemap.c:3784) 
[    7.009654] ext4_buffered_write_iter (fs/ext4/file.c:269) 
[    7.009657] ext4_file_write_iter (fs/ext4/file.c:677) 
[    7.009659] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
[    7.009662] vfs_write (fs/read_write.c:590) 
[    7.009663] ksys_write (fs/read_write.c:644) 
[    7.009664] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
[    7.009667] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
[    7.009669]
[    7.009670] [E] event(&(bit_wait_table + i)->dmap:0):
[    7.009671] __wake_up_common (kernel/sched/wait.c:108) 
[    7.009673] stacktrace:
[    7.009674] dept_event (kernel/dependency/dept.c:2337) 
[    7.009677] __wake_up_common (kernel/sched/wait.c:109) 
[    7.009678] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) 
[    7.009679] __wake_up_bit (kernel/sched/wait_bit.c:127) 
[    7.009681] ext4_orphan_del (fs/ext4/orphan.c:282) 
[    7.009683] ext4_truncate (fs/ext4/inode.c:4212) 
[    7.009685] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963) 
[    7.009687] generic_perform_write (mm/filemap.c:3784) 
[    7.009688] ext4_buffered_write_iter (fs/ext4/file.c:269) 
[    7.009690] ext4_file_write_iter (fs/ext4/file.c:677) 
[    7.009692] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
[    7.009694] vfs_write (fs/read_write.c:590) 
[    7.009695] ksys_write (fs/read_write.c:644) 
[    7.009696] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
[    7.009698] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
[    7.009700] ---------------------------------------------------
[    7.009700] context B's detail
[    7.009701] ---------------------------------------------------
[    7.009702] context B
[    7.009702]     [S] down_read(&ei->i_data_sem:0)
[    7.009703]     [W] wait(&(bit_wait_table + i)->dmap:0)
[    7.009704]     [E] up_read(&ei->i_data_sem:0)
[    7.009705]
[    7.009706] [S] down_read(&ei->i_data_sem:0):
[    7.009707] ext4_map_blocks (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 fs/ext4/ext4.h:1918 fs/ext4/inode.c:562) 
[    7.009709] stacktrace:
[    7.009709] down_read (kernel/locking/rwsem.c:1461) 
[    7.009711] ext4_map_blocks (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 fs/ext4/ext4.h:1918 fs/ext4/inode.c:562) 
[    7.009712] ext4_getblk (fs/ext4/inode.c:851) 
[    7.009714] ext4_bread (fs/ext4/inode.c:903) 
[    7.009715] __ext4_read_dirblock (fs/ext4/namei.c:117) 
[    7.009718] dx_probe (fs/ext4/namei.c:789) 
[    7.009720] ext4_dx_find_entry (fs/ext4/namei.c:1721) 
[    7.009722] __ext4_find_entry (fs/ext4/namei.c:1571) 
[    7.009723] ext4_lookup (fs/ext4/namei.c:1770) 
[    7.009725] lookup_open (./include/linux/dcache.h:361 fs/namei.c:3310) 
[    7.009727] path_openat (fs/namei.c:3401 fs/namei.c:3605) 
[    7.009729] do_filp_open (fs/namei.c:3637) 
[    7.009731] do_sys_openat2 (fs/open.c:1215) 
[    7.009732] do_sys_open (fs/open.c:1231) 
[    7.009734] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
[    7.009736] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
[    7.009738]
[    7.009738] [W] wait(&(bit_wait_table + i)->dmap:0):
[    7.009739] prepare_to_wait (kernel/sched/wait.c:275) 
[    7.009741] stacktrace:
[    7.009741] __schedule (kernel/sched/sched.h:1318 kernel/sched/sched.h:1616 kernel/sched/core.c:6213) 
[    7.009743] schedule (kernel/sched/core.c:6373 (discriminator 1)) 
[    7.009744] io_schedule (./arch/x86/include/asm/current.h:15 kernel/sched/core.c:8392 kernel/sched/core.c:8418) 
[    7.009745] bit_wait_io (./arch/x86/include/asm/current.h:15 kernel/sched/wait_bit.c:210) 
[    7.009746] __wait_on_bit (kernel/sched/wait_bit.c:49) 
[    7.009748] out_of_line_wait_on_bit (kernel/sched/wait_bit.c:65) 
[    7.009749] ext4_read_bh (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 ./include/linux/buffer_head.h:120 fs/ext4/super.c:201) 
[    7.009752] __read_extent_tree_block (fs/ext4/extents.c:545) 
[    7.009754] ext4_find_extent (fs/ext4/extents.c:928) 
[    7.009756] ext4_ext_map_blocks (fs/ext4/extents.c:4099) 
[    7.009757] ext4_map_blocks (fs/ext4/inode.c:563) 
[    7.009759] ext4_getblk (fs/ext4/inode.c:851) 
[    7.009760] ext4_bread (fs/ext4/inode.c:903) 
[    7.009762] __ext4_read_dirblock (fs/ext4/namei.c:117) 
[    7.009764] dx_probe (fs/ext4/namei.c:789) 
[    7.009765] ext4_dx_find_entry (fs/ext4/namei.c:1721) 
[    7.009767]
[    7.009768] [E] up_read(&ei->i_data_sem:0):
[    7.009769] ext4_map_blocks (fs/ext4/inode.c:593) 
[    7.009771] stacktrace:
[    7.009771] up_read (kernel/locking/rwsem.c:1556) 
[    7.009774] ext4_map_blocks (fs/ext4/inode.c:593) 
[    7.009775] ext4_getblk (fs/ext4/inode.c:851) 
[    7.009777] ext4_bread (fs/ext4/inode.c:903) 
[    7.009778] __ext4_read_dirblock (fs/ext4/namei.c:117) 
[    7.009780] dx_probe (fs/ext4/namei.c:789) 
[    7.009782] ext4_dx_find_entry (fs/ext4/namei.c:1721) 
[    7.009784] __ext4_find_entry (fs/ext4/namei.c:1571) 
[    7.009786] ext4_lookup (fs/ext4/namei.c:1770) 
[    7.009788] lookup_open (./include/linux/dcache.h:361 fs/namei.c:3310) 
[    7.009789] path_openat (fs/namei.c:3401 fs/namei.c:3605) 
[    7.009791] do_filp_open (fs/namei.c:3637) 
[    7.009792] do_sys_openat2 (fs/open.c:1215) 
[    7.009794] do_sys_open (fs/open.c:1231) 
[    7.009795] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
[    7.009797] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
[    7.009799] ---------------------------------------------------
[    7.009800] information that might be helpful
[    7.009800] ---------------------------------------------------
[    7.009801] CPU: 0 PID: 611 Comm: rs:main Q:Reg Tainted: G        W         5.17.0-rc1-00014-g8a599299c0cb-dirty #30
[    7.009804] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[    7.009805] Call Trace:
[    7.009806]  <TASK>
[    7.009807] dump_stack_lvl (lib/dump_stack.c:107) 
[    7.009809] print_circle (./arch/x86/include/asm/atomic.h:108 ./include/linux/atomic/atomic-instrumented.h:258 kernel/dependency/dept.c:157 kernel/dependency/dept.c:762) 
[    7.009812] ? print_circle (kernel/dependency/dept.c:1086) 
[    7.009814] cb_check_dl (kernel/dependency/dept.c:1104) 
[    7.009815] bfs (kernel/dependency/dept.c:860) 
[    7.009818] add_dep (kernel/dependency/dept.c:1423) 
[    7.009820] do_event.isra.25 (kernel/dependency/dept.c:1650) 
[    7.009822] ? __wake_up_common (kernel/sched/wait.c:108) 
[    7.009824] dept_event (kernel/dependency/dept.c:2337) 
[    7.009826] __wake_up_common (kernel/sched/wait.c:109) 
[    7.009828] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) 
[    7.009830] __wake_up_bit (kernel/sched/wait_bit.c:127) 
[    7.009832] ext4_orphan_del (fs/ext4/orphan.c:282) 
[    7.009835] ? dept_ecxt_exit (./arch/x86/include/asm/current.h:15 kernel/dependency/dept.c:241 kernel/dependency/dept.c:999 kernel/dependency/dept.c:1043 kernel/dependency/dept.c:2478) 
[    7.009837] ext4_truncate (fs/ext4/inode.c:4212) 
[    7.009839] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963) 
[    7.009842] generic_perform_write (mm/filemap.c:3784) 
[    7.009845] ext4_buffered_write_iter (fs/ext4/file.c:269) 
[    7.009848] ext4_file_write_iter (fs/ext4/file.c:677) 
[    7.009851] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
[    7.009854] vfs_write (fs/read_write.c:590) 
[    7.009856] ksys_write (fs/read_write.c:644) 
[    7.009857] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:65) 
[    7.009860] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
[    7.009862] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
[    7.009865] RIP: 0033:0x7f3b160b335d
[ 7.009867] Code: e1 20 00 00 75 10 b8 01 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ce fa ff ff 48 89 04 24 b8 01 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 17 fb ff ff 48 89 d0 48 83 c4 08 48 3d 01
All code
========
   0:	e1 20                	loope  0x22
   2:	00 00                	add    %al,(%rax)
   4:	75 10                	jne    0x16
   6:	b8 01 00 00 00       	mov    $0x1,%eax
   b:	0f 05                	syscall 
   d:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
  13:	73 31                	jae    0x46
  15:	c3                   	retq   
  16:	48 83 ec 08          	sub    $0x8,%rsp
  1a:	e8 ce fa ff ff       	callq  0xfffffffffffffaed
  1f:	48 89 04 24          	mov    %rax,(%rsp)
  23:	b8 01 00 00 00       	mov    $0x1,%eax
  28:	0f 05                	syscall 
  2a:*	48 8b 3c 24          	mov    (%rsp),%rdi		<-- trapping instruction
  2e:	48 89 c2             	mov    %rax,%rdx
  31:	e8 17 fb ff ff       	callq  0xfffffffffffffb4d
  36:	48 89 d0             	mov    %rdx,%rax
  39:	48 83 c4 08          	add    $0x8,%rsp
  3d:	48                   	rex.W
  3e:	3d                   	.byte 0x3d
  3f:	01                   	.byte 0x1

Code starting with the faulting instruction
===========================================
   0:	48 8b 3c 24          	mov    (%rsp),%rdi
   4:	48 89 c2             	mov    %rax,%rdx
   7:	e8 17 fb ff ff       	callq  0xfffffffffffffb23
   c:	48 89 d0             	mov    %rdx,%rax
   f:	48 83 c4 08          	add    $0x8,%rsp
  13:	48                   	rex.W
  14:	3d                   	.byte 0x3d
  15:	01                   	.byte 0x1
[    7.009869] RSP: 002b:00007f3b1340f180 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[    7.009871] RAX: ffffffffffffffda RBX: 00007f3b040010a0 RCX: 00007f3b160b335d
[    7.009873] RDX: 0000000000000300 RSI: 00007f3b040010a0 RDI: 0000000000000001
[    7.009874] RBP: 0000000000000000 R08: fffffffffffffa15 R09: fffffffffffffa05
[    7.009875] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f3b04000df0
[    7.009876] R13: 00007f3b1340f1a0 R14: 0000000000000220 R15: 0000000000000300
[    7.009879]  </TASK>

Report 2 in ext4 and journal based on v5.17-rc1

Posted by Byungchul Park 4 years, 4 months ago

[    9.008161] ===================================================
[    9.008163] DEPT: Circular dependency has been detected.
[    9.008164] 5.17.0-rc1-00015-gb94f67143867-dirty #2 Tainted: G        W
[    9.008166] ---------------------------------------------------
[    9.008167] summary
[    9.008167] ---------------------------------------------------
[    9.008168] *** DEADLOCK ***
[    9.008168]
[    9.008168] context A
[    9.008169]     [S] (unknown)(&(&journal->j_wait_transaction_locked)->dmap:0)
[    9.008171]     [W] wait(&(&journal->j_wait_commit)->dmap:0)
[    9.008172]     [E] event(&(&journal->j_wait_transaction_locked)->dmap:0)
[    9.008173]
[    9.008173] context B
[    9.008174]     [S] down_write(mapping.invalidate_lock:0)
[    9.008175]     [W] wait(&(&journal->j_wait_transaction_locked)->dmap:0)
[    9.008176]     [E] up_write(mapping.invalidate_lock:0)
[    9.008177]
[    9.008178] context C
[    9.008179]     [S] (unknown)(&(&journal->j_wait_commit)->dmap:0)
[    9.008180]     [W] down_write(mapping.invalidate_lock:0)
[    9.008181]     [E] event(&(&journal->j_wait_commit)->dmap:0)
[    9.008181]
[    9.008182] [S]: start of the event context
[    9.008183] [W]: the wait blocked
[    9.008183] [E]: the event not reachable
[    9.008184] ---------------------------------------------------
[    9.008184] context A's detail
[    9.008185] ---------------------------------------------------
[    9.008186] context A
[    9.008186]     [S] (unknown)(&(&journal->j_wait_transaction_locked)->dmap:0)
[    9.008187]     [W] wait(&(&journal->j_wait_commit)->dmap:0)
[    9.008188]     [E] event(&(&journal->j_wait_transaction_locked)->dmap:0)
[    9.008189]
[    9.008190] [S] (unknown)(&(&journal->j_wait_transaction_locked)->dmap:0):
[    9.008191] (N/A)
[    9.008191]
[    9.008192] [W] wait(&(&journal->j_wait_commit)->dmap:0):
[    9.008193] prepare_to_wait (kernel/sched/wait.c:275) 
[    9.008197] stacktrace:
[    9.008198] __schedule (kernel/sched/sched.h:1318 kernel/sched/sched.h:1616 kernel/sched/core.c:6213) 
[    9.008200] schedule (kernel/sched/core.c:6373 (discriminator 1)) 
[    9.008201] kjournald2 (fs/jbd2/journal.c:250) 
[    9.008203] kthread (kernel/kthread.c:377) 
[    9.008206] ret_from_fork (arch/x86/entry/entry_64.S:301) 
[    9.008209]
[    9.008209] [E] event(&(&journal->j_wait_transaction_locked)->dmap:0):
[    9.008210] __wake_up_common (kernel/sched/wait.c:108) 
[    9.008212] stacktrace:
[    9.008213] dept_event (kernel/dependency/dept.c:2337) 
[    9.008215] __wake_up_common (kernel/sched/wait.c:109) 
[    9.008217] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) 
[    9.008218] jbd2_journal_commit_transaction (fs/jbd2/commit.c:583) 
[    9.008221] kjournald2 (fs/jbd2/journal.c:214 (discriminator 3)) 
[    9.008223] kthread (kernel/kthread.c:377) 
[    9.008224] ret_from_fork (arch/x86/entry/entry_64.S:301) 
[    9.008226] ---------------------------------------------------
[    9.008226] context B's detail
[    9.008227] ---------------------------------------------------
[    9.008228] context B
[    9.008228]     [S] down_write(mapping.invalidate_lock:0)
[    9.008229]     [W] wait(&(&journal->j_wait_transaction_locked)->dmap:0)
[    9.008230]     [E] up_write(mapping.invalidate_lock:0)
[    9.008231]
[    9.008232] [S] down_write(mapping.invalidate_lock:0):
[    9.008233] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963) 
[    9.008237] stacktrace:
[    9.008237] down_write (kernel/locking/rwsem.c:1514) 
[    9.008239] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963) 
[    9.008241] generic_perform_write (mm/filemap.c:3784) 
[    9.008243] ext4_buffered_write_iter (fs/ext4/file.c:269) 
[    9.008245] ext4_file_write_iter (fs/ext4/file.c:677) 
[    9.008247] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
[    9.008250] vfs_write (fs/read_write.c:590) 
[    9.008251] ksys_write (fs/read_write.c:644) 
[    9.008253] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
[    9.008255] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
[    9.008258]
[    9.008258] [W] wait(&(&journal->j_wait_transaction_locked)->dmap:0):
[    9.008259] prepare_to_wait (kernel/sched/wait.c:275) 
[    9.008261] stacktrace:
[    9.008261] __schedule (kernel/sched/sched.h:1318 kernel/sched/sched.h:1616 kernel/sched/core.c:6213) 
[    9.008263] schedule (kernel/sched/core.c:6373 (discriminator 1)) 
[    9.008264] wait_transaction_locked (fs/jbd2/transaction.c:184) 
[    9.008266] add_transaction_credits (fs/jbd2/transaction.c:248 (discriminator 3)) 
[    9.008267] start_this_handle (fs/jbd2/transaction.c:427) 
[    9.008269] jbd2__journal_start (fs/jbd2/transaction.c:526) 
[    9.008271] __ext4_journal_start_sb (fs/ext4/ext4_jbd2.c:105) 
[    9.008273] ext4_truncate (fs/ext4/inode.c:4164) 
[    9.008274] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963) 
[    9.008276] generic_perform_write (mm/filemap.c:3784) 
[    9.008277] ext4_buffered_write_iter (fs/ext4/file.c:269) 
[    9.008279] ext4_file_write_iter (fs/ext4/file.c:677) 
[    9.008281] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
[    9.008283] vfs_write (fs/read_write.c:590) 
[    9.008284] ksys_write (fs/read_write.c:644) 
[    9.008285] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
[    9.008287]
[    9.008288] [E] up_write(mapping.invalidate_lock:0):
[    9.008288] ext4_da_get_block_prep (fs/ext4/inode.c:1795 fs/ext4/inode.c:1829) 
[    9.008291] ---------------------------------------------------
[    9.008291] context C's detail
[    9.008292] ---------------------------------------------------
[    9.008292] context C
[    9.008293]     [S] (unknown)(&(&journal->j_wait_commit)->dmap:0)
[    9.008294]     [W] down_write(mapping.invalidate_lock:0)
[    9.008295]     [E] event(&(&journal->j_wait_commit)->dmap:0)
[    9.008296]
[    9.008297] [S] (unknown)(&(&journal->j_wait_commit)->dmap:0):
[    9.008298] (N/A)
[    9.008298]
[    9.008299] [W] down_write(mapping.invalidate_lock:0):
[    9.008299] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963) 
[    9.008302] stacktrace:
[    9.008302] down_write (kernel/locking/rwsem.c:1514) 
[    9.008304] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963) 
[    9.008305] generic_perform_write (mm/filemap.c:3784) 
[    9.008307] ext4_buffered_write_iter (fs/ext4/file.c:269) 
[    9.008309] ext4_file_write_iter (fs/ext4/file.c:677) 
[    9.008311] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
[    9.008312] vfs_write (fs/read_write.c:590) 
[    9.008314] ksys_write (fs/read_write.c:644) 
[    9.008315] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
[    9.008316] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
[    9.008318]
[    9.008319] [E] event(&(&journal->j_wait_commit)->dmap:0):
[    9.008320] __wake_up_common (kernel/sched/wait.c:108) 
[    9.008321] stacktrace:
[    9.008322] __wake_up_common (kernel/sched/wait.c:109) 
[    9.008323] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) 
[    9.008324] __jbd2_log_start_commit (fs/jbd2/journal.c:508) 
[    9.008326] jbd2_log_start_commit (fs/jbd2/journal.c:527) 
[    9.008327] __jbd2_journal_force_commit (fs/jbd2/journal.c:560) 
[    9.008329] jbd2_journal_force_commit_nested (fs/jbd2/journal.c:583) 
[    9.008331] ext4_should_retry_alloc (fs/ext4/balloc.c:670 (discriminator 3)) 
[    9.008332] ext4_da_write_begin (fs/ext4/inode.c:2965 (discriminator 1)) 
[    9.008334] generic_perform_write (mm/filemap.c:3784) 
[    9.008335] ext4_buffered_write_iter (fs/ext4/file.c:269) 
[    9.008337] ext4_file_write_iter (fs/ext4/file.c:677) 
[    9.008339] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
[    9.008341] vfs_write (fs/read_write.c:590) 
[    9.008342] ksys_write (fs/read_write.c:644) 
[    9.008343] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
[    9.008345] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
[    9.008347] ---------------------------------------------------
[    9.008348] information that might be helpful
[    9.008348] ---------------------------------------------------
[    9.008349] CPU: 0 PID: 89 Comm: jbd2/sda1-8 Tainted: G        W         5.17.0-rc1-00015-gb94f67143867-dirty #2
[    9.008352] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[    9.008353] Call Trace:
[    9.008354]  <TASK>
[    9.008355] dump_stack_lvl (lib/dump_stack.c:107) 
[    9.008358] print_circle (./arch/x86/include/asm/atomic.h:108 ./include/linux/atomic/atomic-instrumented.h:258 kernel/dependency/dept.c:157 kernel/dependency/dept.c:762) 
[    9.008360] ? print_circle (kernel/dependency/dept.c:1086) 
[    9.008362] cb_check_dl (kernel/dependency/dept.c:1104) 
[    9.008364] bfs (kernel/dependency/dept.c:860) 
[    9.008366] add_dep (kernel/dependency/dept.c:1423) 
[    9.008368] do_event.isra.25 (kernel/dependency/dept.c:1651) 
[    9.008370] ? __wake_up_common (kernel/sched/wait.c:108) 
[    9.008372] dept_event (kernel/dependency/dept.c:2337) 
[    9.008374] __wake_up_common (kernel/sched/wait.c:109) 
[    9.008376] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) 
[    9.008379] jbd2_journal_commit_transaction (fs/jbd2/commit.c:583) 
[    9.008381] ? arch_stack_walk (arch/x86/kernel/stacktrace.c:24) 
[    9.008385] ? ret_from_fork (arch/x86/entry/entry_64.S:301) 
[    9.008387] ? dept_enable_hardirq (./arch/x86/include/asm/current.h:15 kernel/dependency/dept.c:241 kernel/dependency/dept.c:999 kernel/dependency/dept.c:1043 kernel/dependency/dept.c:1843) 
[    9.008389] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:45 ./arch/x86/include/asm/irqflags.h:80 ./arch/x86/include/asm/irqflags.h:138 ./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194) 
[    9.008392] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/preempt.h:103 ./include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:194) 
[    9.008394] ? try_to_del_timer_sync (kernel/time/timer.c:1239) 
[    9.008396] kjournald2 (fs/jbd2/journal.c:214 (discriminator 3)) 
[    9.008398] ? prepare_to_wait_exclusive (kernel/sched/wait.c:431) 
[    9.008400] ? commit_timeout (fs/jbd2/journal.c:173) 
[    9.008402] kthread (kernel/kthread.c:377) 
[    9.008404] ? kthread_complete_and_exit (kernel/kthread.c:332) 
[    9.008407] ret_from_fork (arch/x86/entry/entry_64.S:301) 
[    9.008410]  </TASK>
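
To make the circle in the report above easier to follow, here is a small
userspace sketch. It is only an illustration, not the code in
kernel/dependency/dept.c. It assumes each context can be reduced to one
edge "wait -> event", meaning the event cannot be triggered until the wait
is woken. The struct, the function names and the shortened class names are
all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: one context = one edge "blocked on wait, so event never fires". */
struct dep {
	const char *wait;	/* class the context is blocked on */
	const char *event;	/* class the context would have signalled */
};

/* The three contexts of the report, with class names shortened for readability. */
const struct dep report_deps[] = {
	{ "j_wait_commit",             "j_wait_transaction_locked" },	/* context A */
	{ "j_wait_transaction_locked", "invalidate_lock"           },	/* context B */
	{ "invalidate_lock",           "j_wait_commit"             },	/* context C */
};

/*
 * Depth-first walk: can we get from class 'from' to class 'to' by following
 * wait -> event edges? 'budget' caps the path length so the walk terminates
 * even when the graph contains a cycle.
 */
static bool reaches(const struct dep *deps, size_t n,
		    const char *from, const char *to, size_t budget)
{
	if (budget == 0)
		return false;

	for (size_t i = 0; i < n; i++) {
		if (strcmp(deps[i].wait, from) != 0)
			continue;
		if (strcmp(deps[i].event, to) == 0)
			return true;
		if (reaches(deps, n, deps[i].event, to, budget - 1))
			return true;
	}
	return false;
}

/* True if some class can only be woken by an event that transitively waits on it. */
bool deps_have_circle(const struct dep *deps, size_t n)
{
	assert(deps != NULL);

	for (size_t i = 0; i < n; i++)
		if (reaches(deps, n, deps[i].wait, deps[i].wait, n))
			return true;
	return false;
}
```

With all three edges, deps_have_circle(report_deps, 3) finds the circle
A -> B -> C -> A. With only contexts A and B, deps_have_circle(report_deps, 2)
finds none, because nothing leads back from invalidate_lock to j_wait_commit.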