OSDN Git Service

drm/amdgpu: skip query ecc info in gpu recovery
authorStanley.Yang <Stanley.Yang@amd.com>
Thu, 2 Dec 2021 05:06:05 +0000 (13:06 +0800)
committerAlex Deucher <alexander.deucher@amd.com>
Thu, 2 Dec 2021 17:43:06 +0000 (12:43 -0500)
this is a workaround due to get ecc info failed during gpu recovery

[  700.236122] amdgpu 0000:09:00.0: amdgpu: Failed to export SMU ecc table!
[  700.236128] amdgpu 0000:09:00.0: amdgpu: GPU reset begin!
[  704.331171] amdgpu: qcm fence wait loop timeout expired
[  704.331194] amdgpu: The cp might be in an unrecoverable state due to an unsuccessful queues preemption
[  704.332445] amdgpu 0000:09:00.0: amdgpu: GPU reset begin!
[  704.332448] amdgpu 0000:09:00.0: amdgpu: Bailing on TDR for s_job:ffffffffffffffff, as another already in progress
[  704.332456] amdgpu: Pasid 0x8000 destroy queue 0 failed, ret -62
[  710.360924] amdgpu 0000:09:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000013 SMN_C2PMSG_82:0x00000007
[  710.360964] amdgpu 0000:09:00.0: amdgpu: Failed to disable smu features.
[  710.361002] amdgpu 0000:09:00.0: amdgpu: Fail to disable dpm features!
[  710.361014] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c

index 3c623e5..28678c8 100644 (file)
@@ -897,6 +897,10 @@ static void amdgpu_ras_get_ecc_info(struct amdgpu_device *adev, struct ras_err_d
        struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
        int ret = 0;
 
+       /* skip get ecc info during gpu recovery */
+       if (atomic_read(&ras->in_recovery) == 1)
+               return;
+
        /*
         * choosing right query method according to
         * whether smu support query error information