OSDN Git Service

[AMDGPU] Bundle loads before post-RA scheduler
authorStanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Mon, 13 Jan 2020 22:54:17 +0000 (14:54 -0800)
committerStanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Fri, 24 Jan 2020 19:33:38 +0000 (11:33 -0800)
commit555d8f4ef5ebb2cdce2636af5102ff944da5fef8
treee8b2dff0cf0db1980e70e08b3b6ce2b3cc767fe2
parentb35b7da4608426e099fc048319ebf50c11c984ea
[AMDGPU] Bundle loads before post-RA scheduler

We are relying on atrificial DAG edges inserted by the
MemOpClusterMutation to keep loads and stores together in the
post-RA scheduler. This does not work all the time since it
allows to schedule a completely independent instruction in the
middle of the cluster.

Removed the DAG mutation and added pass to bundle already
clustered instructions. These bundles are unpacked before the
memory legalizer because it does not work with bundles but also
because it allows to insert waitcounts in the middle of a store
cluster.

Removing artificial edges also allows a more relaxed scheduling.

Differential Revision: https://reviews.llvm.org/D72737
50 files changed:
llvm/lib/Target/AMDGPU/AMDGPU.h
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
llvm/lib/Target/AMDGPU/CMakeLists.txt
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
llvm/lib/Target/AMDGPU/SIPostRABundler.cpp [new file with mode: 0644]
llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.atomic.dec.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.atomic.inc.ll
llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll
llvm/test/CodeGen/AMDGPU/byval-frame-setup.ll
llvm/test/CodeGen/AMDGPU/call-argument-types.ll
llvm/test/CodeGen/AMDGPU/callee-special-input-vgprs.ll
llvm/test/CodeGen/AMDGPU/copy-illegal-type.ll
llvm/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll
llvm/test/CodeGen/AMDGPU/ds_write2st64.ll
llvm/test/CodeGen/AMDGPU/global-saddr.ll
llvm/test/CodeGen/AMDGPU/half.ll
llvm/test/CodeGen/AMDGPU/idot2.ll
llvm/test/CodeGen/AMDGPU/idot4s.ll
llvm/test/CodeGen/AMDGPU/idot4u.ll
llvm/test/CodeGen/AMDGPU/idot8s.ll
llvm/test/CodeGen/AMDGPU/idot8u.ll
llvm/test/CodeGen/AMDGPU/insert_vector_elt.ll
llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll
llvm/test/CodeGen/AMDGPU/llvm.maxnum.f16.ll
llvm/test/CodeGen/AMDGPU/llvm.minnum.f16.ll
llvm/test/CodeGen/AMDGPU/llvm.round.f64.ll
llvm/test/CodeGen/AMDGPU/load-lo16.ll
llvm/test/CodeGen/AMDGPU/local-memory.amdgcn.ll
llvm/test/CodeGen/AMDGPU/lshr.v2i16.ll
llvm/test/CodeGen/AMDGPU/max.i16.ll
llvm/test/CodeGen/AMDGPU/memory-legalizer-load.ll
llvm/test/CodeGen/AMDGPU/memory_clause.ll
llvm/test/CodeGen/AMDGPU/merge-store-crash.ll
llvm/test/CodeGen/AMDGPU/postra-bundle-memops.mir [new file with mode: 0644]
llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll
llvm/test/CodeGen/AMDGPU/saddo.ll
llvm/test/CodeGen/AMDGPU/salu-to-valu.ll
llvm/test/CodeGen/AMDGPU/scratch-simple.ll
llvm/test/CodeGen/AMDGPU/select.f16.ll
llvm/test/CodeGen/AMDGPU/shl.ll
llvm/test/CodeGen/AMDGPU/shl.v2i16.ll
llvm/test/CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll
llvm/test/CodeGen/AMDGPU/sign_extend.ll
llvm/test/CodeGen/AMDGPU/sminmax.v2i16.ll
llvm/test/CodeGen/AMDGPU/store-weird-sizes.ll
llvm/test/CodeGen/AMDGPU/v_mac_f16.ll
llvm/test/CodeGen/AMDGPU/v_madak_f16.ll