OSDN Git Service

AMDGPU/InsertWaitcnts: Untangle some semi-global state
authorNicolai Haehnle <nhaehnle@gmail.com>
Thu, 29 Nov 2018 11:06:06 +0000 (11:06 +0000)
committerNicolai Haehnle <nhaehnle@gmail.com>
Thu, 29 Nov 2018 11:06:06 +0000 (11:06 +0000)
commitf2ec2633c5f1c10aae0710c7da20273e27018aa8
treed014bba8ae71706b1928aa4d4bdbf7d4b710a0ab
parentbf9aa8b6bf3e64b0d0df4f8292b00bfda9b62f89
AMDGPU/InsertWaitcnts: Untangle some semi-global state

Summary:
Reduce the statefulness of the algorithm in two ways:

1. More clearly split generateWaitcntInstBefore into two phases: the
   first one which determines the required wait, if any, without changing
   the ScoreBrackets, and the second one which actually inserts the wait
   and updates the brackets.

2. Communicate pre-existing s_waitcnt instructions using an argument to
   generateWaitcntInstBefore instead of through the ScoreBrackets.

To simplify these changes, a Waitcnt structure is introduced which carries
the counts of an s_waitcnt instruction in decoded form.

There are some functional changes:

1. The FIXME for the VCCZ bug workaround was implemented: we only wait for
   SMEM instructions as required instead of waiting on all counters.

2. We now properly track pre-existing waitcnt's in all cases, which leads
   to less conservative waitcnts being emitted in some cases.

     s_load_dword ...
     s_waitcnt lgkmcnt(0)    <-- pre-existing wait count
     ds_read_b32 v0, ...
     ds_read_b32 v1, ...
     s_waitcnt lgkmcnt(0)    <-- this is too conservative
     use(v0)
     more code
     use(v1)

   This increases code size a bit, but the reduced latency should still be a
   win in basically all cases. The worst code size regressions in my shader-db
   are:

 WORST REGRESSIONS - Code Size
 Before After     Delta Percentage
   1724  1736        12    0.70 %   shaders/private/f1-2015/1334.shader_test [0]
   2276  2284         8    0.35 %   shaders/private/f1-2015/1306.shader_test [0]
   4632  4640         8    0.17 %   shaders/private/ue4_elemental/62.shader_test [0]
   2376  2384         8    0.34 %   shaders/private/f1-2015/1308.shader_test [0]
   3284  3292         8    0.24 %   shaders/private/talos_principle/1955.shader_test [0]

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54226

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347848 91177308-0d34-0410-b5e6-96231b3b80d8
lib/Target/AMDGPU/SIInsertWaitcnts.cpp
lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
test/CodeGen/AMDGPU/smrd-vccz-bug.ll
test/CodeGen/AMDGPU/vccz-corrupt-bug-workaround.mir
test/CodeGen/AMDGPU/waitcnt-preexisting.mir