[PATCH v2] drm: use mgr->dev in drm_dbg_kms in drm_dp_add_payload_part2

Jeff Layton posted 1 patch 2 years, 8 months ago
[PATCH v2] drm: use mgr->dev in drm_dbg_kms in drm_dp_add_payload_part2
Posted by Jeff Layton 2 years, 8 months ago
I've been experiencing some intermittent crashes down in the display
driver code. The symptoms are usually a line like this in dmesg:

    amdgpu 0000:30:00.0: [drm] Failed to create MST payload for port 000000006d3a3885: -5

...followed by an Oops due to a NULL pointer dereference.

Switch to using mgr->dev instead of state->dev since "state" can be
NULL in some cases.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=2184855
Suggested-by: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 drivers/gpu/drm/display/drm_dp_mst_topology.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

I've been running this patch for a couple of days, but the problem
hasn't occurred again as of yet. It seems sane though as long as we can
assume that mgr->dev will be valid even when "state" is a NULL pointer.

diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c b/drivers/gpu/drm/display/drm_dp_mst_topology.c
index 38dab76ae69e..e2e21ce79510 100644
--- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
@@ -3404,7 +3404,7 @@ int drm_dp_add_payload_part2(struct drm_dp_mst_topology_mgr *mgr,
 
 	/* Skip failed payloads */
 	if (payload->vc_start_slot == -1) {
-		drm_dbg_kms(state->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
+		drm_dbg_kms(mgr->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
 			    payload->port->connector->name);
 		return -EIO;
 	}
-- 
2.39.2
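
To illustrate the assumption stated in the note above, here is a minimal
user-space sketch of the skip path. It is not kernel code: the struct
layouts, the function name add_payload_part2_sketch and the fprintf()
logging are hypothetical stand-ins for the real DRM types and
drm_dbg_kms(). It only shows that logging through mgr->dev stays safe
when "state" is NULL, provided mgr->dev itself is valid.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct drm_device { const char *name; };
struct topology_state { struct drm_device *dev; };
struct topology_mgr { struct drm_device *dev; };
struct payload { int vc_start_slot; const char *connector_name; };

/* Models the skip path: logs through mgr->dev and never touches state. */
static int add_payload_part2_sketch(const struct topology_mgr *mgr,
                                    const struct topology_state *state,
                                    const struct payload *payload)
{
    /* The assumption the patch relies on: mgr->dev is always valid. */
    assert(mgr != NULL && mgr->dev != NULL);
    (void)state; /* may be NULL on this path, so it is not dereferenced */

    if (payload->vc_start_slot == -1) {
        fprintf(stderr, "%s: part 1 failed for %s, skipping part 2\n",
                mgr->dev->name, payload->connector_name);
        return -EIO;
    }
    return 0;
}
```

Calling the sketch with a NULL state and vc_start_slot == -1 returns -EIO
without faulting, which is the behaviour the one-line change aims for.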
Re: [PATCH v2] drm: use mgr->dev in drm_dbg_kms in drm_dp_add_payload_part2
Posted by Lyude Paul 2 years, 8 months ago
Reviewed-by: Lyude Paul <lyude@redhat.com>

Thanks!

On Wed, 2023-04-19 at 07:24 -0400, Jeff Layton wrote:
> I've been experiencing some intermittent crashes down in the display
> driver code. The symptoms are usually a line like this in dmesg:
> 
>     amdgpu 0000:30:00.0: [drm] Failed to create MST payload for port 000000006d3a3885: -5
> 
> ...followed by an Oops due to a NULL pointer dereference.
> 
> Switch to using mgr->dev instead of state->dev since "state" can be
> NULL in some cases.
> 
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2184855
> Suggested-by: Jani Nikula <jani.nikula@linux.intel.com>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  drivers/gpu/drm/display/drm_dp_mst_topology.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> I've been running this patch for a couple of days, but the problem
> hasn't occurred again as of yet. It seems sane though as long as we can
> assume that mgr->dev will be valid even when "state" is a NULL pointer.
> 
> diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> index 38dab76ae69e..e2e21ce79510 100644
> --- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> @@ -3404,7 +3404,7 @@ int drm_dp_add_payload_part2(struct drm_dp_mst_topology_mgr *mgr,
>  
>  	/* Skip failed payloads */
>  	if (payload->vc_start_slot == -1) {
> -		drm_dbg_kms(state->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
> +		drm_dbg_kms(mgr->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
>  			    payload->port->connector->name);
>  		return -EIO;
>  	}

-- 
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat
Re: [PATCH v2] drm: use mgr->dev in drm_dbg_kms in drm_dp_add_payload_part2
Posted by Jeff Layton 2 years, 6 months ago
I've noticed that this patch is not included in linux-next currently.

Can I get some confirmation that this is going to be included in v6.5?
Currently, I've been having to rebuild Fedora kernels to avoid this
panic, and I'd like to know there is a light at the end of that tunnel.

Thanks,
Jeff

On Wed, 2023-04-19 at 16:54 -0400, Lyude Paul wrote:
> Reviewed-by: Lyude Paul <lyude@redhat.com>
> 
> Thanks!
> 
> On Wed, 2023-04-19 at 07:24 -0400, Jeff Layton wrote:
> > I've been experiencing some intermittent crashes down in the display
> > driver code. The symptoms are usually a line like this in dmesg:
> > 
> >     amdgpu 0000:30:00.0: [drm] Failed to create MST payload for port 000000006d3a3885: -5
> > 
> > ...followed by an Oops due to a NULL pointer dereference.
> > 
> > Switch to using mgr->dev instead of state->dev since "state" can be
> > NULL in some cases.
> > 
> > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2184855
> > Suggested-by: Jani Nikula <jani.nikula@linux.intel.com>
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  drivers/gpu/drm/display/drm_dp_mst_topology.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > I've been running this patch for a couple of days, but the problem
> > hasn't occurred again as of yet. It seems sane though as long as we can
> > assume that mgr->dev will be valid even when "state" is a NULL pointer.
> > 
> > diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > index 38dab76ae69e..e2e21ce79510 100644
> > --- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > +++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > @@ -3404,7 +3404,7 @@ int drm_dp_add_payload_part2(struct drm_dp_mst_topology_mgr *mgr,
> >  
> >  	/* Skip failed payloads */
> >  	if (payload->vc_start_slot == -1) {
> > -		drm_dbg_kms(state->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
> > +		drm_dbg_kms(mgr->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
> >  			    payload->port->connector->name);
> >  		return -EIO;
> >  	}
> 

-- 
Jeff Layton <jlayton@kernel.org>
Re: [PATCH v2] drm: use mgr->dev in drm_dbg_kms in drm_dp_add_payload_part2
Posted by Lyude Paul 2 years, 5 months ago
Eek - this might have been a situation where everyone involved assumed someone
else would push it, whoops. I'll make sure this is pushed upstream :).

FWIW: You could definitely send an MR to the Fedora kernel's GitLab to get
this included earlier. If you don't get to it before me, I'll try to do that
today.

On Tue, 2023-06-20 at 07:18 -0400, Jeff Layton wrote:
> I've noticed that this patch is not included in linux-next currently.
> 
> Can I get some confirmation that this is going to be included in v6.5?
> Currently, I've been having to rebuild Fedora kernels to avoid this
> panic, and I'd like to know there is a light at the end of that tunnel.
> 
> Thanks,
> Jeff
> 
> On Wed, 2023-04-19 at 16:54 -0400, Lyude Paul wrote:
> > Reviewed-by: Lyude Paul <lyude@redhat.com>
> > 
> > Thanks!
> > 
> > On Wed, 2023-04-19 at 07:24 -0400, Jeff Layton wrote:
> > > I've been experiencing some intermittent crashes down in the display
> > > driver code. The symptoms are usually a line like this in dmesg:
> > > 
> > >     amdgpu 0000:30:00.0: [drm] Failed to create MST payload for port 000000006d3a3885: -5
> > > 
> > > ...followed by an Oops due to a NULL pointer dereference.
> > > 
> > > Switch to using mgr->dev instead of state->dev since "state" can be
> > > NULL in some cases.
> > > 
> > > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2184855
> > > Suggested-by: Jani Nikula <jani.nikula@linux.intel.com>
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > >  drivers/gpu/drm/display/drm_dp_mst_topology.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > I've been running this patch for a couple of days, but the problem
> > > hasn't occurred again as of yet. It seems sane though as long as we can
> > > assume that mgr->dev will be valid even when "state" is a NULL pointer.
> > > 
> > > diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > > index 38dab76ae69e..e2e21ce79510 100644
> > > --- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > > +++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > > @@ -3404,7 +3404,7 @@ int drm_dp_add_payload_part2(struct drm_dp_mst_topology_mgr *mgr,
> > >  
> > >  	/* Skip failed payloads */
> > >  	if (payload->vc_start_slot == -1) {
> > > -		drm_dbg_kms(state->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
> > > +		drm_dbg_kms(mgr->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
> > >  			    payload->port->connector->name);
> > >  		return -EIO;
> > >  	}
> > 
> 

-- 
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat
Re: [PATCH v2] drm: use mgr->dev in drm_dbg_kms in drm_dp_add_payload_part2
Posted by Lyude Paul 2 years, 5 months ago
Also, since I forgot, here is the tag so that patchwork picks this up:

Reviewed-by: Lyude Paul <lyude@redhat.com>

On Tue, 2023-06-20 at 15:50 -0400, Lyude Paul wrote:
> Eek - this might have been a situation where everyone involved assumed someone
> else would push it, whoops. I'll make sure this is pushed upstream :).
> 
> FWIW: You could definitely send an MR to the fedora kernel's gitlab to get
> this included earlier. If you don't get to it before me I'll try to do that
> today
> 
> On Tue, 2023-06-20 at 07:18 -0400, Jeff Layton wrote:
> > I've noticed that this patch is not included in linux-next currently.
> > 
> > Can I get some confirmation that this is going to be included in v6.5?
> > Currently, I've been having to rebuild Fedora kernels to avoid this
> > panic, and I'd like to know there is a light at the end of that tunnel.
> > 
> > Thanks,
> > Jeff
> > 
> > On Wed, 2023-04-19 at 16:54 -0400, Lyude Paul wrote:
> > > Reviewed-by: Lyude Paul <lyude@redhat.com>
> > > 
> > > Thanks!
> > > 
> > > On Wed, 2023-04-19 at 07:24 -0400, Jeff Layton wrote:
> > > > I've been experiencing some intermittent crashes down in the display
> > > > driver code. The symptoms are usually a line like this in dmesg:
> > > > 
> > > >     amdgpu 0000:30:00.0: [drm] Failed to create MST payload for port 000000006d3a3885: -5
> > > > 
> > > > ...followed by an Oops due to a NULL pointer dereference.
> > > > 
> > > > Switch to using mgr->dev instead of state->dev since "state" can be
> > > > NULL in some cases.
> > > > 
> > > > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2184855
> > > > Suggested-by: Jani Nikula <jani.nikula@linux.intel.com>
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > >  drivers/gpu/drm/display/drm_dp_mst_topology.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > I've been running this patch for a couple of days, but the problem
> > > > hasn't occurred again as of yet. It seems sane though as long as we can
> > > > assume that mgr->dev will be valid even when "state" is a NULL pointer.
> > > > 
> > > > diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > > > index 38dab76ae69e..e2e21ce79510 100644
> > > > --- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > > > +++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > > > @@ -3404,7 +3404,7 @@ int drm_dp_add_payload_part2(struct drm_dp_mst_topology_mgr *mgr,
> > > >  
> > > >  	/* Skip failed payloads */
> > > >  	if (payload->vc_start_slot == -1) {
> > > > -		drm_dbg_kms(state->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
> > > > +		drm_dbg_kms(mgr->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
> > > >  			    payload->port->connector->name);
> > > >  		return -EIO;
> > > >  	}
> > > 
> > 
> 

-- 
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat
Re: [PATCH v2] drm: use mgr->dev in drm_dbg_kms in drm_dp_add_payload_part2
Posted by Chris Bainbridge 1 year, 7 months ago
On Tue, Jun 20, 2023 at 03:59:24PM -0400, Lyude Paul wrote:
> Also since I forgot, so patchwork picks this up:
> 
> Reviewed-by: Lyude Paul <lyude@redhat.com>
> 
> On Tue, 2023-06-20 at 15:50 -0400, Lyude Paul wrote:
> > Eek - this might have been a situation where everyone involved assumed someone
> > else would push it, whoops. I'll make sure this is pushed upstream :).
> > 
> > FWIW: You could definitely send an MR to the fedora kernel's gitlab to get
> > this included earlier. If you don't get to it before me I'll try to do that
> > today
> > 
> > On Tue, 2023-06-20 at 07:18 -0400, Jeff Layton wrote:
> > > I've noticed that this patch is not included in linux-next currently.
> > > 
> > > Can I get some confirmation that this is going to be included in v6.5?
> > > Currently, I've been having to rebuild Fedora kernels to avoid this
> > > panic, and I'd like to know there is a light at the end of that tunnel.
> > > 
> > > Thanks,
> > > Jeff
> > > 
> > > On Wed, 2023-04-19 at 16:54 -0400, Lyude Paul wrote:
> > > > Reviewed-by: Lyude Paul <lyude@redhat.com>
> > > > 
> > > > Thanks!
> > > > 
> > > > On Wed, 2023-04-19 at 07:24 -0400, Jeff Layton wrote:
> > > > > I've been experiencing some intermittent crashes down in the display
> > > > > driver code. The symptoms are usually a line like this in dmesg:
> > > > > 
> > > > >     amdgpu 0000:30:00.0: [drm] Failed to create MST payload for port 000000006d3a3885: -5
> > > > > 
> > > > > ...followed by an Oops due to a NULL pointer dereference.
> > > > > 
> > > > > Switch to using mgr->dev instead of state->dev since "state" can be
> > > > > NULL in some cases.
> > > > > 
> > > > > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2184855
> > > > > Suggested-by: Jani Nikula <jani.nikula@linux.intel.com>
> > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > ---
> > > > >  drivers/gpu/drm/display/drm_dp_mst_topology.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > I've been running this patch for a couple of days, but the problem
> > > > > hasn't occurred again as of yet. It seems sane though as long as we can
> > > > > assume that mgr->dev will be valid even when "state" is a NULL pointer.
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > > > > index 38dab76ae69e..e2e21ce79510 100644
> > > > > --- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > > > > +++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > > > > @@ -3404,7 +3404,7 @@ int drm_dp_add_payload_part2(struct drm_dp_mst_topology_mgr *mgr,
> > > > >  
> > > > >  	/* Skip failed payloads */
> > > > >  	if (payload->vc_start_slot == -1) {
> > > > > -		drm_dbg_kms(state->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
> > > > > +		drm_dbg_kms(mgr->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
> > > > >  			    payload->port->connector->name);
> > > > >  		return -EIO;
> > > > >  	}
> > > > 
> > > 
> > 
> 
> -- 
> Cheers,
>  Lyude Paul (she/her)
>  Software Engineer at Red Hat
> 

Hello, this fix was regressed by Wayne's commit 5aa1dfcdf0a42, which
switched the debug call back to state->dev:

$ git show 5aa1dfcdf0a42 | grep -A6 'Skip failed payloads'
 	/* Skip failed payloads */
-	if (payload->vc_start_slot == -1) {
-		drm_dbg_kms(mgr->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
+	if (payload->payload_allocation_status != DRM_DP_MST_PAYLOAD_ALLOCATION_DFP) {
+		drm_dbg_kms(state->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
 			    payload->port->connector->name);
 		return -EIO;

$ git tag --contains 5aa1dfcdf0a42
v6.7
v6.7-rc1
v6.7-rc2
v6.7-rc3
v6.7-rc4
v6.7-rc5
v6.7-rc6
v6.7-rc7
v6.7-rc8
v6.7.1
v6.7.10
v6.7.11
v6.7.12
v6.7.2
v6.7.3
v6.7.4
v6.7.5
v6.7.6
v6.7.7
v6.7.8
v6.7.9
v6.8
v6.8-rc1
v6.8-rc2
v6.8-rc3
v6.8-rc4
v6.8-rc5
v6.8-rc6
v6.8-rc7
v6.8.1
v6.8.2
v6.8.3
v6.8.4
v6.8.5
v6.8.6
v6.8.7
v6.8.8
v6.9-rc1
v6.9-rc2
v6.9-rc3
v6.9-rc4
v6.9-rc5
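
For reference, the regression reported above can be modelled in user
space. This is only a sketch under stated assumptions, not the upstream
fix: the sketch_* types, the enum values other than the one standing in
for DRM_DP_MST_PAYLOAD_ALLOCATION_DFP, and the fprintf() logging are all
hypothetical. It pairs the newer allocation-status check from
5aa1dfcdf0a42 with the mgr->dev logging from the original patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical user-space stand-ins, not the kernel definitions. */
enum sketch_alloc_status {
    SKETCH_ALLOC_NONE,
    SKETCH_ALLOC_LOCAL,
    SKETCH_ALLOC_DFP, /* stands in for DRM_DP_MST_PAYLOAD_ALLOCATION_DFP */
};

struct sketch_dev { const char *name; };
struct sketch_mgr { struct sketch_dev *dev; };
struct sketch_state { struct sketch_dev *dev; };
struct sketch_payload {
    enum sketch_alloc_status status;
    const char *connector_name;
};

/* Newer status check combined with the mgr->dev logging from the patch. */
static int part2_status_check_sketch(const struct sketch_mgr *mgr,
                                     const struct sketch_state *state,
                                     const struct sketch_payload *payload)
{
    assert(mgr != NULL && mgr->dev != NULL); /* assumption carried over from the patch */
    (void)state; /* not dereferenced: it may be NULL here */

    if (payload->status != SKETCH_ALLOC_DFP) {
        fprintf(stderr, "%s: part 1 failed for %s, skipping part 2\n",
                mgr->dev->name, payload->connector_name);
        return -EIO;
    }
    return 0;
}
```

With a NULL state and any status other than the DFP stand-in, the sketch
returns -EIO instead of dereferencing state.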
Re: [PATCH v2] drm: use mgr->dev in drm_dbg_kms in drm_dp_add_payload_part2
Posted by Jeff Layton 2 years, 7 months ago
On Wed, 2023-04-19 at 16:54 -0400, Lyude Paul wrote:
> Reviewed-by: Lyude Paul <lyude@redhat.com>
> 
> Thanks!
> 
> On Wed, 2023-04-19 at 07:24 -0400, Jeff Layton wrote:
> > I've been experiencing some intermittent crashes down in the display
> > driver code. The symptoms are usually a line like this in dmesg:
> > 
> >     amdgpu 0000:30:00.0: [drm] Failed to create MST payload for port 000000006d3a3885: -5
> > 
> > ...followed by an Oops due to a NULL pointer dereference.
> > 
> > Switch to using mgr->dev instead of state->dev since "state" can be
> > NULL in some cases.
> > 
> > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2184855
> > Suggested-by: Jani Nikula <jani.nikula@linux.intel.com>
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  drivers/gpu/drm/display/drm_dp_mst_topology.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > I've been running this patch for a couple of days, but the problem
> > hasn't occurred again as of yet. It seems sane though as long as we can
> > assume that mgr->dev will be valid even when "state" is a NULL pointer.
> > 
> > diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > index 38dab76ae69e..e2e21ce79510 100644
> > --- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > +++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > @@ -3404,7 +3404,7 @@ int drm_dp_add_payload_part2(struct drm_dp_mst_topology_mgr *mgr,
> >  
> >  	/* Skip failed payloads */
> >  	if (payload->vc_start_slot == -1) {
> > -		drm_dbg_kms(state->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
> > +		drm_dbg_kms(mgr->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
> >  			    payload->port->connector->name);
> >  		return -EIO;
> >  	}
> 

Thanks! BTW, I've had a couple more of these events in the last few
days:

[20199.446159] amdgpu 0000:30:00.0: [drm] Failed to create MST payload for port 00000000556eb455: -5
[20199.508379] [drm] DM_MST: stopping TM on aconnector: 000000001c0c0284 [id: 86]
[20200.064417] [drm] DM_MST: starting TM on aconnector: 000000001c0c0284 [id: 86]

The patch prevents an Oops, but GNOME seems to decide that a different
monitor is primary and moves all of the windows on the desktop around (I
have 2 monitors). Mostly this seems to happen when I walk away from the
machine for a bit, so I suspect it's associated with the display going
to sleep.

At one point, Wayne said he might know the root cause of this. If there
are patches that you need help testing, I can do that. I'm having to
build my own kernels anyway until this patch makes it into the distros.
-- 
Jeff Layton <jlayton@kernel.org>
Re: [PATCH v2] drm: use mgr->dev in drm_dbg_kms in drm_dp_add_payload_part2
Posted by Jani Nikula 2 years, 8 months ago
On Wed, 19 Apr 2023, Jeff Layton <jlayton@kernel.org> wrote:
> I've been experiencing some intermittent crashes down in the display
> driver code. The symptoms are usually a line like this in dmesg:
>
>     amdgpu 0000:30:00.0: [drm] Failed to create MST payload for port 000000006d3a3885: -5
>
> ...followed by an Oops due to a NULL pointer dereference.
>
> Switch to using mgr->dev instead of state->dev since "state" can be
> NULL in some cases.
>
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2184855
> Suggested-by: Jani Nikula <jani.nikula@linux.intel.com>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>

Thanks,

Reviewed-by: Jani Nikula <jani.nikula@intel.com>


> ---
>  drivers/gpu/drm/display/drm_dp_mst_topology.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> I've been running this patch for a couple of days, but the problem
> hasn't occurred again as of yet. It seems sane though as long as we can
> assume that mgr->dev will be valid even when "state" is a NULL pointer.
>
> diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> index 38dab76ae69e..e2e21ce79510 100644
> --- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> @@ -3404,7 +3404,7 @@ int drm_dp_add_payload_part2(struct drm_dp_mst_topology_mgr *mgr,
>  
>  	/* Skip failed payloads */
>  	if (payload->vc_start_slot == -1) {
> -		drm_dbg_kms(state->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
> +		drm_dbg_kms(mgr->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
>  			    payload->port->connector->name);
>  		return -EIO;
>  	}

-- 
Jani Nikula, Intel Open Source Graphics Center
Re: [PATCH v2] drm: use mgr->dev in drm_dbg_kms in drm_dp_add_payload_part2
Posted by Jeff Layton 2 years, 7 months ago
On Wed, 2023-04-19 at 16:21 +0300, Jani Nikula wrote:
> On Wed, 19 Apr 2023, Jeff Layton <jlayton@kernel.org> wrote:
> > I've been experiencing some intermittent crashes down in the display
> > driver code. The symptoms are usually a line like this in dmesg:
> > 
> >     amdgpu 0000:30:00.0: [drm] Failed to create MST payload for port 000000006d3a3885: -5
> > 
> > ...followed by an Oops due to a NULL pointer dereference.
> > 
> > Switch to using mgr->dev instead of state->dev since "state" can be
> > NULL in some cases.
> > 
> > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2184855
> > Suggested-by: Jani Nikula <jani.nikula@linux.intel.com>
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> 
> Thanks,
> 
> Reviewed-by: Jani Nikula <jani.nikula@intel.com>
> 
>
> > ---
> >  drivers/gpu/drm/display/drm_dp_mst_topology.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > I've been running this patch for a couple of days, but the problem
> > hasn't occurred again as of yet. It seems sane though as long as we can
> > assume that mgr->dev will be valid even when "state" is a NULL pointer.
> > 
> > diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > index 38dab76ae69e..e2e21ce79510 100644
> > --- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > +++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > @@ -3404,7 +3404,7 @@ int drm_dp_add_payload_part2(struct drm_dp_mst_topology_mgr *mgr,
> >  
> >  	/* Skip failed payloads */
> >  	if (payload->vc_start_slot == -1) {
> > -		drm_dbg_kms(state->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
> > +		drm_dbg_kms(mgr->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
> >  			    payload->port->connector->name);
> >  		return -EIO;
> >  	}
> 

Thanks for the reviews!

I finally had this happen again today, and I can confirm that this does
prevent the oops. GNOME rearranged my screen layout after the error, but
the box stayed up and running. 
-- 
Jeff Layton <jlayton@kernel.org>