drm/amdkfd: Fix a concurrency issue during kfd recovery
author Oak Zeng <Oak.Zeng@amd.com>
Thu, 15 Jul 2021 23:34:25 +0000 (18:34 -0500)
committer Alex Deucher <alexander.deucher@amd.com>
Fri, 23 Jul 2021 14:08:00 +0000 (10:08 -0400)
commit 4f942aaeb19dbf2135931120cc806d459add4788
tree 4b40dddc9e1921155700d94e32052066ad81cfb9
parent 78ccea9ff2ad6fb5c73f146b46193ef15d6ede5f

start_cpsch and stop_cpsch can be called during kfd device
initialization or during GPU reset/recovery, so they can run
concurrently. Currently, the pm_init and pm_uninit calls in
start_cpsch and stop_cpsch are not protected by the dqm lock.
Consider a user who uses a packet manager function to submit a
PM4 packet that hangs the HWS, for example with the command

  cat /sys/class/kfd/kfd/topology/nodes/1/gpu_id | sudo tee /sys/kernel/debug/kfd/hang_hws

while the kfd device is undergoing reset/recovery, so the packet
manager may not be initialized. This can cause an unpredictable
protection fault.

This patch moves pm_init/pm_uninit inside the dqm lock and checks
that the packet manager is initialized before using any packet
manager function.
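
The pattern described above can be sketched in user-space C. This is
only an illustration under stated assumptions, not the driver code:
the struct layouts, the names start_sched, stop_sched,
debugfs_hang_hws and demo_dqm, the pthread mutex standing in for the
kernel lock, and the -EAGAIN return value are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the kernel's structures. */
struct packet_manager {
	bool initialized;	/* set on init, cleared on uninit */
	int packets_sent;
};

struct queue_manager {
	pthread_mutex_t lock;	/* models the lock serializing start/stop */
	struct packet_manager pm;
};

static struct queue_manager demo_dqm = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Models an operation that is only valid on an initialized packet manager. */
static void pm_send_packet(struct packet_manager *pm)
{
	assert(pm->initialized);
	pm->packets_sent++;
}

/* Initialize the packet manager while holding the lock. */
static void start_sched(struct queue_manager *dqm)
{
	pthread_mutex_lock(&dqm->lock);
	dqm->pm.packets_sent = 0;
	dqm->pm.initialized = true;
	pthread_mutex_unlock(&dqm->lock);
}

/* Tear the packet manager down while holding the same lock. */
static void stop_sched(struct queue_manager *dqm)
{
	pthread_mutex_lock(&dqm->lock);
	dqm->pm.initialized = false;
	pthread_mutex_unlock(&dqm->lock);
}

/* Debug path: check the flag under the lock before touching the pm. */
static int debugfs_hang_hws(struct queue_manager *dqm)
{
	int ret = 0;

	pthread_mutex_lock(&dqm->lock);
	if (!dqm->pm.initialized)
		ret = -EAGAIN;	/* assumed error code for "not ready" */
	else
		pm_send_packet(&dqm->pm);
	pthread_mutex_unlock(&dqm->lock);

	return ret;
}
```

In this sketch a debug request that arrives before start_sched or
after stop_sched gets an error back instead of touching an
uninitialized packet manager. Taking the lock in the debug path is a
choice made for the sketch; the message above only states that
pm_init/pm_uninit move under the lock and that an initialization
check is added.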

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Acked-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdkfd/kfd_device.c
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
drivers/gpu/drm/amd/amdkfd/kfd_priv.h